emnlp emnlp2012 emnlp2012-24 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. [sent-2, score-0.316]
2 We present a novel, formal statement of the representation learning task. [sent-3, score-0.202]
3 We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. [sent-4, score-0.309]
4 Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. [sent-5, score-0.432]
5 , 2007) One major cause for poor performance on out-of-domain texts is the traditional representation used by supervised NLP systems (Ben-David et al. [sent-10, score-0.228]
6 Recently, several authors have found that learning new features based on distributional similarity can significantly improve domain adaptation (Blitzer et al. [sent-18, score-0.389]
7 This framework is attractive for several reasons: experimentally, learned features can yield significant improvements over standard supervised models on out-of-domain tests. [sent-22, score-0.145]
8 Traditional representations still hold one significant advantage over representation-learning, however: because features are hand-crafted, these representations can readily incorporate the linguistic or domain expert knowledge that leads to state-of-the-art in-domain performance. [sent-25, score-0.775]
9 To address this shortcoming, we introduce representation-learning techniques that incorporate a domain expert’s preferences over the learned features. [sent-27, score-0.232]
10 For example, out of the set of all possible distributional-similarity features, we might prefer those that help predict the labels in a labeled training data set. [sent-28, score-0.129]
11 To capture this preference, we might bias a representation-learning algorithm towards features with low joint entropy with the labels in the training data. [sent-29, score-0.506]
12 This particular biased form of ... [sent-30, score-0.147]
13 We present a novel formal statement of representation learning, and demonstrate that it is computationally intractable in general. [sent-33, score-0.302]
14 It is therefore critical for representation learning to be flexible enough to incorporate the intuitions and knowledge of human experts, to guide the search for representations efficiently and effectively. [sent-34, score-0.428]
15 , 2010), we present an architecture for learning representations for sequence-labeling tasks that allows for biases. [sent-36, score-0.307]
16 In addition to a bias towards task-specific representations, we investigate a bias towards representations that have similar features across domains, to improve domain-independence; and a bias towards multi-dimensional representations, where different dimensions are independent of one another. [sent-37, score-0.92]
17 In this paper, we focus on incorporating the biases with HMM-type representations (Hidden Markov Model). [sent-38, score-0.538]
18 However, this technique can also be applied to other graphical model-based representations with little modification. [sent-39, score-0.267]
19 Our experiments show that on two different domain-adaptation tasks, our biased representations improve significantly over unbiased ones. [sent-40, score-0.549]
20 In a part-of-speech tagging experiment, our best model provides a 25% relative reduction in error over a state-of-the-art Chinese POS tagger, and a 19% relative reduction in error over an unbiased representation from previous work. [sent-41, score-0.47]
21 Section 3 introduces our framework for learning biased representations. [sent-43, score-0.147]
22 Section 4 describes how we estimate parameters for the biased objective functions efficiently. [sent-44, score-0.22]
23 1 Terminology and Notation A representation is a set of features that describe data points. [sent-47, score-0.161]
24 Formally, a representation is a function R : X → Y for some suitable space Y (often R^d), which is then used as the input space for a classifier. [sent-49, score-0.078]
25 For instance, a traditional representation for POS tagging over vocabulary V would include (in part) |V| dimensions, and would map a word to a binary vector with a 1 in only one of the dimensions. [sent-50, score-0.245]
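As a concrete illustration of this traditional one-hot representation, here is a minimal Python sketch; the vocabulary and example word are hypothetical, not taken from the paper.

    import numpy as np

    def one_hot(word, vocab):
        # Map a word to a |V|-dimensional binary vector with a 1 in exactly one dimension.
        vec = np.zeros(len(vocab), dtype=int)
        vec[vocab.index(word)] = 1
        return vec

    vocab = ["innocent", "bystanders", "are", "often", "the", "victims"]
    print(one_hot("often", vocab))  # [0 0 0 1 0 0]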
26 A domain is a probability distribution D over the instance set X; R(D) denotes the induced distribution over Y. [sent-53, score-0.293]
27 In domain adaptation tasks, a learner is given samples from a source domain D_S, and is evaluated on samples from a target domain D_T. [sent-54, score-0.588]
28 (2010) give a theoretical analysis of domain adaptation which shows that the choice of representation is crucial. [sent-57, score-0.5]
29 A good choice is one that minimizes error on the training data, but equally important is that the representation must make data from the two domains look similar. [sent-58, score-0.218]
30 ’s theory provides learning bounds for domain adaptation under a fixed R. [sent-63, score-0.401]
31 The equation also highlights an important underlying tension: the best representation for the source domain would naturally include domain-specific features, and allow a hypothesis to learn domain-specific patterns. [sent-65, score-0.452]
32 By optimizing for this combined objective function, we allow the optimization method to trade off between features that are best for classifying source-domain data and features that allow generalization to new domains. [sent-68, score-0.073]
33 Naturally, the objective function in Equation 2 is completely intractable. [sent-69, score-0.073]
34 Just finding the optimal hypothesis for a fixed representation of the training data is intractable for many hypothesis classes. [sent-70, score-0.223]
35 We leverage prior knowledge to bias the representation learner towards attractive regions of the representation space R, and we develop efficient, greedy optimization techniques for learning effective representations. [sent-74, score-0.799]
36 4 Previous Work There is a long tradition of research on representations for NLP, mostly falling into one of three categories: 1) vector space models and dimensionality reduction techniques (Salton and McGill, 1983; Turney and Pantel, 2010; Sahlgren, 2005; Deerwester et al. [sent-76, score-0.411]
, 1990; Honkela, 1997); 2) using structured representations to identify clusters based on distributional similarity, and using those clusters as features (Lin and Wu, 2009; Candito and Crabbé, 2009; Huang and Yates, 2009; Ahuja and Downey, 2010; Turian et al. [sent-77, score-0.446]
, 2011); and 3) structured representations that induce multi-dimensional real-valued features (Dhillon et al. [sent-79, score-0.31]
39 To our knowledge, we are the first to apply semi-supervised representation learning techniques for structured NLP tasks. [sent-83, score-0.204]
40 Most previous work on domain adaptation has focused on the case where some labeled data is available in both the source and target domains (Daumé III, 2007; Jiang and Zhai, 2007; Daumé III et al. [sent-84, score-0.511]
41 A few authors have considered domain adaptation with no labeled data from the target domain (Blitzer et al. [sent-89, score-0.591]
42 We demonstrate empirically that incorporating biases into this type ofrepresentation-learning process can significantly improve results. [sent-92, score-0.271]
43 3 Biased Representation Learning As before, let U_S and U_T be unlabeled data, and L_S be labeled data from the source domain only. [sent-93, score-0.365]
44 Previous work on representation learning with Hidden Markov Models (HMMs) (Huang and Yates, 2009) has estimated parameters θ for the HMM from unlabeled data alone, and then determined the Viterbi-optimal latent states for training and test data to produce new features for a supervised classifier. [sent-94, score-0.403]
45 The joint distribution in an HMM factors into observation and transition distributions, typically mixtures of multinomials: p(x, y | θ) = P(y_1) P(x_1 | y_1) ∏_{i≥2} P(y_i | y_{i−1}) P(x_i | y_i). Figure 1: Illustration of how the entropy bias is incorporated into HMM learning. [sent-96, score-0.432]
46 The dotted oval shows the space of desired distributions in the hidden space, which have small or zero entropy with the real labels. [sent-97, score-0.332]
47 The learning algorithm aims to maximize the log-likelihood of the unlabeled data, and to minimize the KL divergence between the real distribution, p_m, and the closest desired distribution, p_n. [sent-98, score-0.172]
48 Intuitively, this form of representation learning identifies clusters of distributionally-similar words: those words with the same Viterbi-optimal latent state. [sent-99, score-0.281]
49 The Viterbi-optimal latent states are then used as features for the supervised classifier. [sent-100, score-0.185]
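The HMM factorization and the Viterbi decoding step described above can be sketched as follows with numpy. This is a minimal illustration, not the authors' implementation; the parameter matrices log_pi, log_A, and log_B are assumed to have been estimated from unlabeled data (e.g., by EM).

    import numpy as np

    def log_joint(x, y, log_pi, log_A, log_B):
        # log p(x, y | theta) = log P(y1) + log P(x1|y1) + sum_{i>=2} log P(yi|yi-1) + log P(xi|yi)
        lp = log_pi[y[0]] + log_B[y[0], x[0]]
        for i in range(1, len(x)):
            lp += log_A[y[i - 1], y[i]] + log_B[y[i], x[i]]
        return lp

    def viterbi(x, log_pi, log_A, log_B):
        # Return the Viterbi-optimal latent state sequence, used as features for the classifier.
        K, T = log_A.shape[0], len(x)
        delta = np.full((T, K), -np.inf)
        back = np.zeros((T, K), dtype=int)
        delta[0] = log_pi + log_B[:, x[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A + log_B[:, x[t]][None, :]
            back[t] = scores.argmax(axis=0)   # best predecessor for each current state
            delta[t] = scores.max(axis=0)
        states = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            states.append(int(back[t, states[-1]]))
        return states[::-1]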
50 Our previous work (2009) has shown that the features from the learned HMM significantly improve the accuracy of POS taggers and chunkers on benchmark domain adaptation datasets. [sent-101, score-0.378]
51 Note, however, that the HMM on its own does not provide even an approximate solution to the objective function in our problem formulation (Eqn. [sent-104, score-0.073]
52 2), since it makes no attempt to find the representation that minimizes loss on labeled data. [sent-105, score-0.22]
53 To address this and other concerns, we modify the objective function for HMM training. [sent-106, score-0.073]
54 Specifically, we encode biases for representation learning by defining a set of properties φ that we believe a good representation function would minimize. [sent-107, score-0.593]
55 One possible bias is that the HMM states should be predictive of the labels in labeled training data. [sent-108, score-0.409]
56 We can encode this as a property that computes the entropy between the HMM states and the labels. [sent-109, score-0.265]
57 For example, in Figure 1, we want to learn the best HMM distribution for the sentence “Innocent bystanders are often the victims” for the POS tagging task. [sent-110, score-0.134]
58 The hidden sequence y1, y2, y3, y4, y5, y6 can have any distribution p1, p2, p3, ... [sent-111, score-0.112]
59 Therefore, by calculating the entropy between the hidden sequence and real labels, we can identify a subset of desired distributions that have low entropy, shown in the dotted oval. [sent-120, score-0.293]
60 By minimizing the KL divergence between the learned distribution and the set of desired distributions, we can find the best distribution, i.e., the one closest to the desired set. [sent-121, score-0.318]
61 Let z be the sequence of labels in L_S, and let φ(x, y, z) be a property of the completed data that we wish the learned representation to minimize, based on our prior beliefs. [sent-123, score-0.351]
62 We define the set of desired distributions {q(Y) | E_{Y∼q}[φ(x, Y, z)] ≤ ξ}. We then add penalty terms to the objective function (3) for the divergence between the HMM distribution p and the “good” distributions q, as well as for ξ: L(θ) − min_{q,ξ} [KL(q(Y) || p(Y|x, θ)) + σ|ξ|] (4) s.t. [sent-125, score-0.238]
63 E_{Y∼q}[φ(x, Y, z)] ≤ ξ (5), where KL is the Kullback-Leibler divergence, and σ is a free parameter indicating how important the bias is compared with the marginal log likelihood. [sent-127, score-0.287]
64 To incorporate multiple biases, we define a vector of properties φ, and we constrain each property φi ≤ ξi. [sent-128, score-0.081]
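A hedged sketch of the Posterior Regularization projection that this construction relies on, for an enumerable set of candidate label sequences: q(y) ∝ p(y|x, θ) exp(−λ·φ(y)), with dual variables λ fit by projected gradient ascent and bounded by σ. The paper works with sequence models and forward-backward expectations rather than explicit enumeration, so this illustrates the general recipe, not the authors' exact algorithm.

    import numpy as np

    def pr_project(log_p, phi, xi, sigma, n_steps=200, lr=0.1):
        # Project p(Y|x) toward {q : E_q[phi_i] <= xi_i} via the PR dual.
        # log_p: log p(y|x) for each enumerated sequence y (shape [n_seq])
        # phi:   property values, one row per sequence, one column per property
        lam = np.zeros(phi.shape[1])                    # dual variables, one per property
        for _ in range(n_steps):
            logits = log_p - phi @ lam                  # unnormalized log q(y)
            q = np.exp(logits - logits.max())
            q /= q.sum()
            grad = q @ phi - xi                         # dual gradient: E_q[phi] - xi
            lam = np.clip(lam + lr * grad, 0.0, sigma)  # keep lambda nonnegative, bounded by sigma
        return q, lam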
65 1 A Bias for Task-specific Representations Current representation learning techniques are unsupervised, so they will generate the exact same representation for different tasks. [sent-134, score-0.322]
66 In response, we propose to bias our representation learning such that the learned representations are optimized for a specific task. [sent-139, score-0.706]
67 In particular, we propose a property that measures how difficult it is to predict the labels in training data, given the learned latent states. [sent-140, score-0.267]
68 Our entropy property uses the conditional entropy of the labels given the latent state as the measure of unpredictability: φ_entropy(y, z) = −Σ_i P̃(y_i, z_i) log P̃(z_i | y_i) (6), where P̃ is the empirical probability and i indicates the ith position in the data. [sent-141, score-0.514]
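A minimal sketch of this property: Eq. 6 sums over positions, while the code below sums over distinct (state, label) pairs weighted by their empirical probability, i.e., the standard conditional entropy H(z|y), which is the reading of the property assumed here.

    import numpy as np
    from collections import Counter

    def entropy_property(states, labels):
        # H(z | y): conditional entropy of labels z given hidden states y,
        # estimated from empirical co-occurrence counts over all positions.
        n = len(states)
        joint = Counter(zip(states, labels))
        state_marginal = Counter(states)
        h = 0.0
        for (y, z), c in joint.items():
            p_yz = c / n                       # empirical P~(y, z)
            p_z_given_y = c / state_marginal[y]
            h -= p_yz * np.log(p_z_given_y)
        return h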
69 Unlike previous formulations for supervised and semisupervised dimensionality reduction (Zhang et al. [sent-144, score-0.224]
70 2, we devise a biased objective to provide an explicit mechanism for minimizing the distance between the source and target domain. [sent-149, score-0.393]
71 As before, we construct a property of the completed data: φ_distance(y) = d_1(P̃_S, P̃_T), where P̃_S(Y) is the empirical distribution over latent state values estimated from source-domain latent states, and similarly for P̃_T(Y). [sent-150, score-0.285]
72 Minimizing this property will bias the representation towards features that appear approximately as often in the source domain as in the target domain. [sent-153, score-0.848]
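A sketch of the domain-distance property, under the assumption that d_1 denotes the L1 distance between the two empirical state distributions (the precise distance function is not spelled out in this excerpt):

    import numpy as np

    def distance_property(source_states, target_states, num_states):
        # phi_distance = || P~_S - P~_T ||_1 over latent state values.
        p_s = np.bincount(source_states, minlength=num_states) / len(source_states)
        p_t = np.bincount(target_states, minlength=num_states) / len(target_states)
        return np.abs(p_s - p_t).sum()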
73 We refer to the model trained with a bias of minimizing φ_distance as HMM+D, and the model with both φ_distance and φ_entropy biases as HMM+D+E. [sent-154, score-0.574]
74 Factorial HMMs (FHMMs) (Ghahramani and Jordan, 1997) can learn multidimensional models, but inference and learning are complex and computationally expensive even in supervised settings. [sent-158, score-0.177]
75 Our previous work (2010) created a multi-dimensional representation called an “I-HMM” by training several HMM layers independently; we showed that by finding several latent categories for each word, this representation can provide useful and domain-independent features for supervised learners. [sent-159, score-0.566]
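A sketch of how such independently trained layers can be stacked into a multi-dimensional feature representation; train_hmm is a hypothetical helper that returns (log_pi, log_A, log_B) for one layer (e.g., EM run with a different random seed per layer), and viterbi is the decoder sketched earlier.

    def multilayer_features(sentences, num_layers, train_hmm, viterbi):
        # Train each layer independently, then use each layer's Viterbi state
        # for a token as one dimension of that token's feature vector.
        layers = [train_hmm(sentences, seed=layer) for layer in range(num_layers)]
        features = []
        for sent in sentences:
            per_layer = [viterbi(sent, *params) for params in layers]
            features.append(list(zip(*per_layer)))  # one tuple of states per token
        return features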
76 In this work, we also learn a similar multi-dimensional model (I-HMM+D+E), but within each layer we add in the two biases described above. [sent-160, score-0.36]
77 As a result, the layers may produce similar or equivalent features describing the dominant aspect of distributional similarity in the data, but miss weaker features that are still important in the data. [sent-162, score-0.15]
78 To encourage learning a truly multi-dimensional representation, we add a bias towards I-HMM models in which each layer is different from all previous layers. [sent-163, score-0.382]
79 We define an entropy-based predictability property that measures how predictable each previous layer is, given the current one. [sent-164, score-0.17]
80 Formally, let y_i^l denote the hidden state at the ith position in layer l of the model. [sent-165, score-0.251]
81 For a given layer l, this property measures the conditional entropy of y^m given y^l, summed over layers m < l, and subtracts this from the maximum possible entropy: φ^l_predict(y) = MAX + Σ_{i; m<l} P̃(y_i^l, y_i^m) log P̃(y_i^m | y_i^l). [sent-166, score-0.332]
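A sketch of this predictability property from empirical counts; MAX is taken here to be log of the number of states per previous layer, which is an assumption about the constant, not a detail recoverable from this excerpt.

    import numpy as np
    from collections import Counter

    def predictability_property(current_layer, previous_layers, num_states):
        # phi_predict^l = MAX - sum_{m < l} H(y^m | y^l), estimated empirically.
        # Minimizing it pushes each previous layer to be unpredictable from layer l.
        n = len(current_layer)
        total = 0.0
        for prev in previous_layers:
            joint = Counter(zip(current_layer, prev))
            cur_marginal = Counter(current_layer)
            h = 0.0
            for (yl, ym), c in joint.items():
                h -= (c / n) * np.log(c / cur_marginal[yl])
            total += np.log(num_states) - h
        return total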
82 Figure 4: Greater distance between domains correlates with worse target-domain tagging accuracy. [sent-169, score-0.194]
83 We use the same setting of free parameters from our POS tagging experiments. [sent-174, score-0.132]
84 Our best biased representation P-HMM+D+E outperformed the unbiased HMM representation by 3. [sent-176, score-0.604]
85 The domain-distance and multi-dimensional biases help most, while the task-specific bias helps somewhat, but only when the domain-distance bias is included. [sent-179, score-0.147]
86 , 2011) have performed a detailed comparison between these types of systems, so we concentrate on comparisons between biased and unbiased representations here. [sent-192, score-0.549]
87 In this section, we test whether the task-specific bias (entropy bias) actually learns something task-specific. [sent-195, score-0.239]
88 We learn the entropy-biased representations for two tasks on the same set of sentences, labeled differently for the two tasks: English POS tagging and Named Entity Recognition. [sent-196, score-0.45]
89 Then we switch the representations to see whether they will help or hurt the performance on the other task. [sent-197, score-0.267]
90 Table 4: Results of POS tagging and Named Entity recognition tasks with different representations. [sent-204, score-0.124]
91 With the entropy-biased representation, the system performs better on the task that the bias is trained for, but worse on the other task. [sent-205, score-0.239]
92 When the entropy bias uses labels from the same task as the classifier, the performance is improved: about 1. [sent-210, score-0.452]
93 Switching the representations for the tasks actually hurts the performance compared with the unbiased representation. [sent-213, score-0.442]
94 The results suggest that the entropy bias does indeed yield a task-specific representation. [sent-214, score-0.382]
95 6 Conclusion and Future Work We introduce three types of biases into representation learning for sequence labeling using the PR framework. [sent-215, score-0.432]
96 Our experiments on POS tagging and NER indicate that domain-independent biases and multidimensional biases significantly improve the representations, while the task-specific bias improves performance on out-of-domain data if it is combined with the domain-independent bias. [sent-216, score-0.937]
97 Our results indicate the power of representation learning in building domain-agnostic classifiers, but also the complexity of the task and the limitations of current techniques, as even the best models still fall significantly short of in-domain performance. [sent-217, score-0.161]
98 Distributional representations for handling sparsity in supervised sequence labeling. [sent-308, score-0.334]
99 Language models as representations for weakly supervised nlp tasks. [sent-316, score-0.382]
100 Morphological features help POS tagging of unknown words across language varieties. [sent-380, score-0.173]
wordName wordTfidf (topN-words)
[('hmm', 0.317), ('biases', 0.271), ('representations', 0.267), ('bias', 0.239), ('domain', 0.193), ('representation', 0.161), ('yates', 0.156), ('biased', 0.147), ('adaptation', 0.146), ('entropy', 0.143), ('unbiased', 0.135), ('turian', 0.13), ('ner', 0.107), ('blitzer', 0.106), ('layers', 0.1), ('yil', 0.1), ('huang', 0.094), ('layer', 0.089), ('pos', 0.089), ('signaling', 0.086), ('ahuja', 0.086), ('dhillon', 0.086), ('tagging', 0.084), ('property', 0.081), ('latent', 0.077), ('objective', 0.073), ('multidimensional', 0.072), ('kl', 0.071), ('divergence', 0.071), ('labels', 0.07), ('aii', 0.067), ('candito', 0.067), ('crabb', 0.067), ('dsl', 0.067), ('emami', 0.067), ('fhmms', 0.067), ('yim', 0.067), ('supervised', 0.067), ('wsj', 0.064), ('minimizing', 0.064), ('fei', 0.064), ('hidden', 0.062), ('intractable', 0.062), ('bounds', 0.062), ('daum', 0.06), ('dimensionality', 0.06), ('ds', 0.06), ('labeled', 0.059), ('danlp', 0.058), ('factorial', 0.058), ('mansour', 0.058), ('morin', 0.058), ('yi', 0.057), ('unlabeled', 0.057), ('domains', 0.057), ('ls', 0.056), ('source', 0.056), ('towards', 0.054), ('ut', 0.054), ('distance', 0.053), ('semisupervised', 0.052), ('alexander', 0.051), ('distribution', 0.05), ('dt', 0.05), ('distributional', 0.05), ('nlp', 0.048), ('expert', 0.048), ('hmms', 0.048), ('salton', 0.048), ('deerwester', 0.048), ('ey', 0.048), ('free', 0.048), ('neural', 0.048), ('koby', 0.047), ('fernando', 0.046), ('frustratingly', 0.045), ('pradhan', 0.045), ('arun', 0.045), ('doug', 0.045), ('reduction', 0.045), ('distributions', 0.044), ('desired', 0.044), ('structured', 0.043), ('ex', 0.043), ('kulesza', 0.043), ('clusters', 0.043), ('equation', 0.042), ('states', 0.041), ('pm', 0.041), ('statement', 0.041), ('nns', 0.041), ('ghahramani', 0.041), ('dimensions', 0.041), ('regularization', 0.041), ('tasks', 0.04), ('space', 0.039), ('learned', 0.039), ('indexing', 0.039), ('attractive', 0.039), ('computationally', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
2 0.1685888 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rose
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multidomain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multidomain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multidomain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
3 0.14288691 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
Author: Michael J. Paul
Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.
4 0.12812535 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
5 0.12524177 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
Author: Jason Naradowsky ; Sebastian Riedel ; David Smith
Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.
7 0.10750264 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
8 0.10325542 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
9 0.10168789 84 emnlp-2012-Linking Named Entities to Any Database
10 0.10164656 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
11 0.1006885 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
12 0.099575117 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
13 0.09860871 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
14 0.097351387 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
15 0.095603786 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features
16 0.094756156 81 emnlp-2012-Learning to Map into a Universal POS Tagset
17 0.091020495 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
18 0.090875365 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
19 0.089570045 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
20 0.085705221 6 emnlp-2012-A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
topicId topicWeight
[(0, 0.284), (1, -0.013), (2, 0.133), (3, 0.039), (4, -0.021), (5, 0.099), (6, 0.036), (7, 0.032), (8, 0.256), (9, -0.176), (10, -0.075), (11, 0.195), (12, -0.002), (13, -0.133), (14, 0.032), (15, 0.084), (16, -0.104), (17, -0.008), (18, -0.105), (19, 0.01), (20, 0.145), (21, -0.121), (22, 0.134), (23, -0.188), (24, 0.067), (25, -0.069), (26, -0.06), (27, -0.016), (28, -0.027), (29, -0.066), (30, 0.12), (31, 0.014), (32, 0.093), (33, -0.057), (34, -0.014), (35, 0.041), (36, 0.036), (37, -0.01), (38, -0.063), (39, 0.045), (40, -0.065), (41, 0.076), (42, 0.031), (43, -0.013), (44, -0.015), (45, -0.058), (46, -0.071), (47, -0.033), (48, -0.062), (49, -0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.97641706 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
2 0.76615351 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rose
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multidomain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multidomain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multidomain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
Author: Natalia Ponomareva ; Mike Thelwall
Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-ofthe-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graphbased algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.
4 0.56444126 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
Author: Jian Bo Yang ; Qi Mao ; Qiao Liang Xiang ; Ivor Wai-Hung Tsang ; Kian Ming Adam Chai ; Hai Leong Chieu
Abstract: We propose an adaptive ensemble method to adapt coreference resolution across domains. This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base models; (3) it can automatically adjust the active ensemble members during prediction. With simplification, this method can be used in the traditional withindomain case, while still retaining the above features. To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) have the above features as an ensemble method. Empirically, we show the benefits of (i) on the six domains of the ACE 2005 data set in domain adaptation set- ting, and of (ii) on both the MUC-6 and the ACE 2005 data sets in within-domain setting.
5 0.52772766 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
Author: Michael J. Paul
Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.
6 0.49578816 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
7 0.45361105 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
8 0.44660586 84 emnlp-2012-Linking Named Entities to Any Database
9 0.43393898 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
10 0.40498468 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability
11 0.40182701 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
12 0.39192972 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
13 0.38775501 6 emnlp-2012-A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
14 0.38513836 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
15 0.37572852 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
16 0.3545638 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
17 0.34524676 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
18 0.33167735 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
19 0.32582736 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
20 0.31900769 81 emnlp-2012-Learning to Map into a Universal POS Tagset
topicId topicWeight
[(2, 0.02), (16, 0.04), (25, 0.032), (34, 0.128), (35, 0.244), (45, 0.028), (60, 0.119), (63, 0.038), (64, 0.029), (65, 0.038), (70, 0.024), (73, 0.018), (74, 0.051), (76, 0.072), (80, 0.022), (86, 0.027), (95, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.80533546 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
2 0.65173995 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rose
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multidomain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multidomain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multidomain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
3 0.61999869 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
4 0.61693329 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
5 0.61592329 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks. 1
6 0.61511701 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
7 0.60997623 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
8 0.60985076 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
9 0.60924309 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
10 0.60839278 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
11 0.60798317 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
12 0.60781771 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
13 0.60735732 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
14 0.60676622 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
15 0.60666394 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
16 0.60579938 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
17 0.60522693 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
18 0.60495538 111 emnlp-2012-Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
19 0.60476094 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs
20 0.60324246 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic