emnlp emnlp2012 emnlp2012-92 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rosé
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multi-domain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
Reference: text
sentIndex sentText sentNum sentScore
1 First, many multi-domain learning algorithms resemble ensemble learning algorithms. [sent-8, score-0.255]
2 (1) Are multi-domain learning improvements the result of ensemble learning effects? [sent-9, score-0.231]
3 Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multi-domain settings have domain-specific class label biases. [sent-10, score-0.407]
4 1 Introduction Research efforts in recent years have demonstrated the importance of domains in statistical natural language processing. [sent-13, score-0.209]
5 A mismatch between training and test domains can negatively impact system accuracy as it violates a core assumption in many machine learning algorithms: that data points are independent and identically distributed (i.i.d.). [sent-14, score-0.231]
6 As a result, numerous domain adaptation methods (Chelba and Acero, 2004; Daumé III and Marcu, 2006; Blitzer et al. [sent-18, score-0.255]
7 , 2007) target settings with a training set from one domain and a test set from another. [sent-19, score-0.23]
8 In this case, training a single model obscures domain distinctions, and separating the dataset by domains reduces training data. [sent-29, score-0.448]
9 Instead, multi-domain learning (MDL) can take advantage of these domain labels to improve learning (Daumé III, 2007; Dredze and Crammer, 2008; Arnold et al. [sent-30, score-0.228]
10 The key question of this paper is: when do domains matter? [sent-38, score-0.231]
11 First, we explore the question of whether domain distinctions are used by existing MDL algorithms in meaningful ways. [sent-40, score-0.294]
12 While differences in feature behaviors between domains will hurt performance (Blitzer et al. [sent-41, score-0.298]
13 Second, one simple way in which domains can change is the distribution of the prior over the labels. [sent-50, score-0.209]
14 2 Multi-Domain Learning In the multi-domain learning (MDL) setting, examples are accompanied by both a class label and a domain indicator. [sent-57, score-0.301]
15 This aspect of domain difference is typically the focus of unsupervised domain adaptation (Blitzer et al. [sent-70, score-0.449]
16 The key idea behind many MDL algorithms is to target one or both of these properties of domain difference to improve performance. [sent-78, score-0.253]
17 (2008) consider the problem of learning how to combine different domain-specific classifiers such that behaviors common to several domains can be captured by a shared classifier, while domain-specific behavior is still captured by the individual classifiers. [sent-85, score-0.324]
18 The key assumption is that instances from two different domains are half as related to each other as two instances from the same domain. [sent-91, score-0.209]
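As a rough illustration of how such a relatedness assumption can be encoded, the sketch below builds an interaction matrix with 1 on the diagonal and 1/2 off it, and uses it to scale a linear kernel between (instance, domain) pairs. This is a schematic rendering of the idea, not the exact formulation of Cavallanti et al.; the function names and the linear base kernel are assumptions.

```python
import numpy as np

def interaction_matrix(num_domains):
    """Encode the assumption that a cross-domain pair of instances is
    half as related as a same-domain pair (diagonal 1, off-diagonal 1/2)."""
    A = np.full((num_domains, num_domains), 0.5)
    np.fill_diagonal(A, 1.0)
    return A

def multitask_kernel(x1, d1, x2, d2, A):
    """Similarity of (instance, domain) pairs: a plain dot product
    scaled by the assumed relatedness of the two domains."""
    return A[d1, d2] * float(np.dot(x1, x2))
```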
19 Zhang and Yeung (2010) derive a convex formulation for adaptively learning domain relationships. [sent-95, score-0.217]
20 We select this dataset because, unlike the AMAZON data, CONVOTE can be divided into domains in several ways based on different metadata attributes available with the dataset. [sent-119, score-0.289]
21 We consider two types of domain divisions: the bill identifier and the political party of the speaker. [sent-120, score-0.487]
22 Division based on the bill creates domain differences in that each bill has its own topic. [sent-121, score-0.388]
23 FEDA Frustratingly easy domain adaptation (FEDA) (Daumé III, 2007; Daumé III et al. [sent-139, score-0.255]
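FEDA itself is simple enough to sketch: every feature is duplicated into a shared copy and a domain-specific copy, and any standard linear learner is then trained on the augmented representation. A minimal version follows; the `shared::`/`domain::` prefix scheme is only an illustrative naming convention, not the paper's.

```python
def feda_augment(features, domain):
    """Feature augmentation in the style of Daumé III (2007): each
    feature gets a shared copy and a domain-specific copy, letting a
    linear model split weight between cross-domain and
    domain-specific behavior."""
    augmented = {}
    for name, value in features.items():
        augmented["shared::" + name] = value
        augmented[domain + "::" + name] = value
    return augmented

# feda_augment({"great": 1, "plot": 1}, "dvd")
# -> {"shared::great": 1, "dvd::great": 1,
#     "shared::plot": 1, "dvd::plot": 1}
```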
24 This method has similar benefits to FEDA in terms of classifier combination, but also attempts to model domain relationships. [sent-161, score-0.247]
25 To evaluate multi-domain learning methods, we consider a common baseline: ignoring the domain distinctions and learning a single classifier over all the data. [sent-174, score-0.316]
26 This reflects single-domain learning, in which no domain knowledge is used, and indicates baseline performance for all experiments. [sent-175, score-0.225]
27 While some earlier research has included a separate one-classifier-per-domain baseline, it almost always performs worse, since splitting the domains provides much less data to each classifier (Dredze et al. [sent-176, score-0.509]
28 Support Vector Machines A single SVM run over all the training data, ignoring domain labels. [sent-180, score-0.224]
29 Logistic Regression (LR) A single logistic regression model run over all the training data, ignoring domain labels. [sent-200, score-0.245]
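A minimal sketch of this pooled baseline, using scikit-learn as a stand-in for a LIBLINEAR-based setup (the toy feature dictionaries and labels are invented for illustration):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy pooled training set drawn from two domains; the domain identity
# of each example is simply discarded.
examples = [{"great": 1, "plot": 1}, {"boring": 1, "plot": 1},
            {"great": 1, "battery": 1}, {"broke": 1, "battery": 1}]
labels = [1, 0, 1, 0]

vec = DictVectorizer()
X = vec.fit_transform(examples)
clf = LogisticRegression(solver="liblinear")  # LIBLINEAR-style solver
clf.fit(X, labels)
```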
30 4.1 Ensemble Learning Question: Are MDL improvements the result of ensemble learning effects? [sent-207, score-0.231]
31 The key idea behind ensemble learning, that of combining a diverse array of models, has been applied to settings in which data preprocessing is used to create many different classifiers. [sent-211, score-0.224]
32 In fact, the top performing and only successful entry to the 2007 CoNLL shared task on domain adaptation for dependency parsing was a straightforward implementation of ensemble learning by creating variants of parsers (Sagae and Tsujii, 2007). [sent-214, score-0.423]
33 However, given this justification, are improvements from MDL simply the result of standard ensemble learning effects, or are these methods really learning something about domain behavior? [sent-227, score-0.425]
34 As we will do in each empirical experiment, we propose a contrarian hypothesis: Hypothesis: Knowledge of domains is irrelevant for MDL. [sent-229, score-0.237]
35 We will then withhold knowledge of the true domains from these algorithms and instead provide them with random “pseudo-domains,” and then evaluate the change in their behavior. [sent-233, score-0.296]
36 The question is whether we can obtain similar benefits by ignoring domain labels and relying strictly on an ensemble learning motivation (instance bagging). [sent-234, score-0.448]
37 For the “Random Domain” setting, we randomly shuffle the domain labels within a given class label within each fold, thus maintaining the same number of examples for each domain label, and also retaining the same class distribution within each randomized domain. [sent-236, score-0.732]
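A sketch of this "Random Domain" shuffle, assuming each example is a dict with "label" and "domain" keys (a representation chosen here for illustration): permuting domain labels only among examples that share a class label preserves both per-domain sizes and each pseudo-domain's class distribution.

```python
import random
from collections import defaultdict

def shuffle_domains_within_class(examples, seed=0):
    """Permute domain labels among examples of the same class, keeping
    the number of examples per domain and the class distribution inside
    each pseudo-domain unchanged."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, ex in enumerate(examples):
        by_class[ex["label"]].append(idx)
    shuffled = [dict(ex) for ex in examples]
    for indices in by_class.values():
        domains = [examples[i]["domain"] for i in indices]
        rng.shuffle(domains)
        for i, dom in zip(indices, domains):
            shuffled[i]["domain"] = dom
    return shuffled
```

Repeating this with different seeds corresponds to the multiple randomized trials per cross-validation run described below.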
38 Following the standard practice in previous work, for this experiment we use a balanced number of examples from each domain and a balanced number of positive and negative labels (no class bias). [sent-238, score-0.433]
39 In the “Random Domain” setting, since we are randomizing the domain labels, we increase the number of trials. [sent-240, score-0.264]
40 We repeat each cross-validation experiment 5 times with different randomization of the domain labels each time. [sent-241, score-0.228]
41 The first row shows absolute (average) accuracy for a single classifier trained on all data, ignoring domain distinctions. [sent-243, score-0.277]
42 First, we note for the well-studied AMAZON dataset that our results with true domains are consistent with the previous literature. [sent-245, score-0.327]
43 The main comparison to make in Table 1 is between having knowledge of true domains or not. [sent-254, score-0.257]
44 Ignoring the significance test results for now, overall the results indicate that knowing the true domains is useful for MDL algorithms. [sent-256, score-0.278]
45 Randomizing the domains does not work better than knowing true domains in any case. [sent-257, score-0.487]
46 significantly better only for the AMAZON dataset. And interestingly, exactly in the same case, randomly shuffling the domains also gives significant improvements compared to the baseline, showing that there is an ensemble learning effect in operation for MDR-L2 and MDR-KL on the AMAZON dataset. [sent-266, score-0.465]
47 For FEDA, randomizing the domains significantly hurts its performance on the AMAZON data, as is the case for MDR-KL on the PARTY data. [sent-267, score-0.304]
48 Therefore, while our contrarian hypothesis about the irrelevance of domains is not completely true, it is indeed the case that some MDL methods benefit from the ensemble learning effect. [sent-268, score-0.405]
49 A second observation to be made from these results is that, while all empirical research on MDL takes the definition of domains as a given, the question of how to split a dataset into domains given various metadata attributes is still open. [sent-269, score-0.52]
50 For example, in our experiments, in general, using the political party as a domain distinction gives us more improvements over the corresponding baseline approach. [sent-270, score-0.476]
51 We compare true and randomized domains in Table 6, after presenting the second set of experimental results. [sent-275, score-0.305]
52 Even in our datasets, the original versions demonstrated class bias: Amazon product reviews are generally positive, votes on bills are rarely tied, and political parties vote in blocs. [sent-281, score-0.333]
53 While it is common to evaluate learning methods on balanced data, and then adjust for imbalanced real-world datasets, it is unclear what effect domain-specific class bias will have on MDL methods. [sent-282, score-0.322]
54 These datasets with varying class bias across domains were used for the experiments described in §4.2. [sent-288, score-0.548]
55 Consider two domains, where each domain is biased towards the opposite label. [sent-289, score-0.218]
56 In this case, domain-specific parameters may simply be capturing the bias towards the class label, uniformly increasing the weight of features predictive of the dominant class. [sent-290, score-0.301]
57 Similarly, methods that learn domain similarity may be learning class bias similarity. [sent-291, score-0.467]
58 First, if capturing domain-specific class bias is the source of improvement, there are much simpler methods for learning that can be just as effective. [sent-293, score-0.273]
59 Second, if class bias accounted for most of the improvement in learning, it suggests that such settings could be amenable to unsupervised adaptation of the bias parameters. [sent-295, score-0.536]
60 We create four class-biased versions of the folds (cb1 . . . cb4), where each domain has 100 examples per fold and each domain has a different balance between positive and negative classes. [sent-302, score-0.445]
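A sketch of how such class-biased folds can be assembled; the exact positive/negative ratios for cb1–cb4 are not given in this excerpt, so the fractions below are placeholders:

```python
import random

def make_biased_fold(pos_pool, neg_pool, pos_fraction, fold_size=100, seed=0):
    """Build one domain's fold with a chosen class balance, e.g.
    pos_fraction=0.7 yields 70 positive and 30 negative examples."""
    rng = random.Random(seed)
    n_pos = int(round(pos_fraction * fold_size))
    fold = rng.sample(pos_pool, n_pos) + rng.sample(neg_pool, fold_size - n_pos)
    rng.shuffle(fold)
    return fold

# Four class-biased variants for one domain (the ratios are assumed):
pos_pool = [("pos", i) for i in range(500)]
neg_pool = [("neg", i) for i in range(500)]
variants = {"cb%d" % (i + 1): make_biased_fold(pos_pool, neg_pool, f, seed=i)
            for i, f in enumerate([0.8, 0.65, 0.35, 0.2])}
```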
61 In this setting, we augment the baseline classifier (which ignores domain labels) with a new feature that indicates the domain label. [sent-307, score-0.472]
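The DOM-ID augmentation reduces to a one-line feature change; the sketch below shows it, with the feature-naming convention chosen for illustration:

```python
def add_domain_id(features, domain):
    """DOM-ID baseline: the only domain knowledge the classifier sees is
    a single indicator feature naming the domain, which is enough to
    absorb a domain-specific class bias and nothing more."""
    out = dict(features)
    out["domain_id::" + domain] = 1
    return out
```

Training the same pooled baseline learner on this augmented representation gives the DOM-ID numbers that the MDL methods are compared against.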
62 This shows that MDL results can be highly influenced by systematic differences in class bias across domains. [sent-314, score-0.297]
63 Note that there is also a significant negative influence of class bias on MTRL for the AMAZON data. [sent-315, score-0.273]
64 A comparison of the MDL results on true domains to the DOM-ID baseline gives us an idea of how much MDL benefits purely from class bias differences across domains. [sent-316, score-0.605]
65 We see that in most cases, about half of the improvement seen in MDL is accounted for by a simple baseline of using the domain identifier as a feature, and all but one of the improvements from DOM-ID are significant. [sent-317, score-0.308]
66 Similar to the setup where we evaluate the ensemble learning effect, we have a setting of using randomized domains. [sent-318, score-0.285]
67 This suggests that in a real-world scenario, where a difference in class bias across domains is quite likely, it is useful to consider DOM-ID as a simple baseline that gives good empirical performance. [sent-323, score-0.513]
68 Each “Random Domain” result in Table 3 is an average over 20 cross-validation runs (5 randomized trials for each of the four class-biased trials cb1 . . . cb4). [sent-326, score-0.275]
69 This setup combines the effects of ensemble learning and bias difference across domains. [sent-330, score-0.375]
70 As seen in the table, for MDL algorithms the results are consistently better as compared to knowing the true domains for the AMAZON dataset. [sent-331, score-0.296]
71 For the other datasets, the performance after randomizing the domains is still significantly better than the baseline. [sent-332, score-0.304]
72 This evaluation on randomized domains further strengthens the conclusion that differences in bias across domains play an important role, even in the case of noisy domains. [sent-333, score-0.704]
73 Looking at the performance of DOM-ID with randomized domains, we see that in all cases the DOM-ID baseline performs better with randomized domains. [sent-334, score-0.223]
74 For the BILL and PARTY datasets, similar folds with consistent bias were created (the number of examples used was different). [sent-336, score-0.236]
75 These datasets with consistent class bias across domains were used for the experiments described in §4. [sent-337, score-0.573]
76 We refer to this as the case of consistent class bias across domains. [sent-344, score-0.298]
77 The distribution of classes within each domain within each fold is shown in Table 4. [sent-345, score-0.251]
78 This is likely due to the higher baseline performance in the consistent class bias case. [sent-349, score-0.329]
79 For the AMAZON dataset, we believe that the domain distinctions are less meaningful, and hence forcing MTRL to learn the relationships results in lower performance. [sent-351, score-0.233]
80 For the PARTY dataset, in the case of a class-biased setup, knowing the party is highly predictive of the vote (in the original CONVOTE dataset, Democrats mostly vote “no” and Republicans mostly vote “yes”), and this is rightly exploited by MTRL. [sent-352, score-0.313]
81 For the three sets of results reported earlier, we evaluated whether using true domains as compared to randomized domains gives significantly better, significantly worse or equal performance. [sent-358, score-0.636]
82 As the table shows, for the first set of results where the class labels were balanced (overall, as well as within each domain), using true domains was significantly better mostly only for the AMAZON dataset. [sent-361, score-0.472]
83 FEDA-SVM was the only approach that was consistently better with true domains across all datasets. [sent-362, score-0.257]
84 For the second set of results (Table 3) where the class bias varied across the different domains, using true domains was either no different from using randomized domains, or it was significantly worse. [sent-364, score-0.651]
85 In particular, it was consistently significantly worse to use true domains on the AMAZON dataset. [sent-365, score-0.306]
86 This calls into question the utility of domains on the AMAZON dataset in the context of MDL in a domain-specific class bias scenario. [sent-366, score-0.555]
87 Since randomizing the domains works better for all of the MDL methods on AMAZON, it suggests that an ensemble learning effect is primarily responsible for the significant improvements seen on the AMAZON data, when evaluated in a domain-specific class bias setting. [sent-367, score-0.783]
88 Finally, for the case of consistent class bias across domains, the trend is similar to the case of no class bias: using true domains is useful. [sent-368, score-0.828]
89 This table further supports the conclusion that domain-specific class bias highly influences multi-domain learning. [sent-369, score-0.273]
90 Differences emerge between the AMAZON and CONVOTE datasets in terms of the ensemble learning hypothesis. [sent-375, score-0.234]
91 The choice of splits affects these analyses. Table 5: A comparison between MDL methods with data that have a consistent class bias across domains. [sent-386, score-0.298]
92 Similar to the setup where we evaluate the ensemble learning effect, we have a setting of using randomized domains. [sent-387, score-0.305]
93 Table 6: The table shows the datasets (AM:AMAZON, BI:BILL, PA:PARTY) for which a given MDL method using true domain information was significantly better, significantly worse, or not significantly different (equal) as compared to using randomized domain information with the same MDL method. [sent-395, score-0.673]
94 If we can gain improvements simply by randomly creating new domains (“Random Domain” setting in our experiments), then there may be better ways to take advantage of the provided metadata for MDL. [sent-399, score-0.328]
95 Experiments with domain-specific class biases revealed that a significant part of the improvements could be achieved by adding domain-specific bias features. [sent-401, score-0.368]
96 Limiting the multi-domain improvements to a small set of parameters raises an interesting question: can these parameters be adapted to a new domain without labeled data? [sent-402, score-0.257]
97 Traditionally, domain adaptation without target domain labeled data has focused on learning the behavior of new features; beliefs about existing feature behaviors could not be corrected without new training data. [sent-403, score-0.538]
98 However, by collapsing the adaptation into a single bias parameter, we may be able to learn how to adjust this parameter in a fully unsupervised way. [sent-404, score-0.227]
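The excerpt poses this as an open question; as one plausible instantiation (an assumption of ours, not the authors' method), the decision threshold of a trained model could be shifted so that the fraction of positive predictions on unlabeled target-domain data matches an estimated class prior:

```python
import numpy as np

def adjust_bias_to_prior(scores, target_pos_rate):
    """Shift the bias/threshold so that roughly `target_pos_rate` of the
    unlabeled target-domain examples receive a positive prediction.
    `scores` is a NumPy array of raw classifier scores."""
    threshold = np.quantile(scores, 1.0 - target_pos_rate)
    return scores - threshold  # positive where the adjusted score > 0
```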
99 This would open the door to improvements in this challenging setting for real-world problems where class bias was a significant factor. [sent-405, score-0.357]
100 Dependency parsing and domain adaptation with LR models and parser ensembles. [sent-501, score-0.314]
wordName wordTfidf (topN-words)
[('mdl', 0.729), ('domains', 0.209), ('domain', 0.194), ('amazon', 0.193), ('ensemble', 0.168), ('bias', 0.166), ('mtrl', 0.153), ('party', 0.151), ('dredze', 0.129), ('convote', 0.111), ('feda', 0.111), ('class', 0.107), ('daum', 0.1), ('randomized', 0.096), ('bill', 0.085), ('blitzer', 0.075), ('randomizing', 0.07), ('yeung', 0.07), ('datasets', 0.066), ('bagging', 0.066), ('behaviors', 0.065), ('improvements', 0.063), ('adaptation', 0.061), ('saha', 0.06), ('lr', 0.059), ('iii', 0.059), ('fold', 0.057), ('classifier', 0.053), ('balanced', 0.049), ('crammer', 0.048), ('multidomain', 0.048), ('true', 0.048), ('vote', 0.047), ('dataset', 0.045), ('folds', 0.045), ('koby', 0.045), ('reviews', 0.045), ('svm', 0.045), ('mdr', 0.042), ('hal', 0.041), ('algorithms', 0.039), ('distinctions', 0.039), ('political', 0.037), ('settings', 0.036), ('metadata', 0.035), ('labels', 0.034), ('fernando', 0.033), ('avishek', 0.032), ('biases', 0.032), ('baseline', 0.031), ('liblinear', 0.031), ('ignoring', 0.03), ('option', 0.03), ('paired', 0.028), ('questions', 0.028), ('frustratingly', 0.028), ('domainspecific', 0.028), ('cavallanti', 0.028), ('chelba', 0.028), ('contrarian', 0.028), ('maclin', 0.028), ('parties', 0.028), ('republicans', 0.028), ('matter', 0.028), ('kulesza', 0.027), ('classifiers', 0.026), ('product', 0.026), ('id', 0.025), ('consistent', 0.025), ('significantly', 0.025), ('biased', 0.024), ('behavior', 0.024), ('trials', 0.024), ('mansour', 0.024), ('congress', 0.024), ('democrats', 0.024), ('controversial', 0.024), ('differences', 0.024), ('worse', 0.024), ('bills', 0.023), ('convex', 0.023), ('alex', 0.023), ('carnegie', 0.022), ('cmu', 0.022), ('pittsburgh', 0.022), ('question', 0.022), ('violates', 0.022), ('ddi', 0.022), ('arnold', 0.022), ('knowing', 0.021), ('mellon', 0.021), ('logistic', 0.021), ('effects', 0.021), ('setting', 0.021), ('traditionally', 0.021), ('idea', 0.02), ('setup', 0.02), ('multitask', 0.02), ('identifier', 0.02), ('votes', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rosé
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multi-domain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
2 0.17622299 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
Author: Jian Bo Yang ; Qi Mao ; Qiao Liang Xiang ; Ivor Wai-Hung Tsang ; Kian Ming Adam Chai ; Hai Leong Chieu
Abstract: We propose an adaptive ensemble method to adapt coreference resolution across domains. This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base models; (3) it can automatically adjust the active ensemble members during prediction. With simplification, this method can be used in the traditional withindomain case, while still retaining the above features. To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) have the above features as an ensemble method. Empirically, we show the benefits of (i) on the six domains of the ACE 2005 data set in domain adaptation set- ting, and of (ii) on both the MUC-6 and the ACE 2005 data sets in within-domain setting.
3 0.1685888 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
4 0.11899964 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
Author: Natalia Ponomareva ; Mike Thelwall
Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-ofthe-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graphbased algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.
5 0.083759844 6 emnlp-2012-A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
Author: Stefano Faralli ; Roberto Navigli
Abstract: We present a new minimally-supervised framework for performing domain-driven Word Sense Disambiguation (WSD). Glossaries for several domains are iteratively acquired from the Web by means of a bootstrapping technique. The acquired glosses are then used as the sense inventory for fullyunsupervised domain WSD. Our experiments, on new and gold-standard datasets, show that our wide-coverage framework enables highperformance results on dozens of domains at a coarse and fine-grained level.
6 0.069172449 84 emnlp-2012-Linking Named Entities to Any Database
7 0.068567425 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution
8 0.047910787 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
9 0.047225907 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
10 0.044248544 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
11 0.043759171 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
12 0.042027354 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
13 0.041358478 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
14 0.040935814 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
15 0.039565895 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
16 0.038776662 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
17 0.036152452 120 emnlp-2012-Streaming Analysis of Discourse Participants
18 0.034464471 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
19 0.031967111 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
20 0.031874828 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
topicId topicWeight
[(0, 0.148), (1, 0.061), (2, 0.041), (3, 0.06), (4, 0.021), (5, -0.015), (6, -0.02), (7, 0.023), (8, 0.218), (9, -0.125), (10, 0.048), (11, 0.151), (12, -0.042), (13, 0.018), (14, 0.185), (15, 0.182), (16, -0.129), (17, 0.005), (18, -0.09), (19, 0.005), (20, 0.094), (21, -0.209), (22, 0.254), (23, -0.183), (24, -0.012), (25, -0.198), (26, 0.012), (27, 0.041), (28, -0.071), (29, 0.003), (30, 0.001), (31, -0.081), (32, 0.078), (33, -0.086), (34, -0.043), (35, -0.065), (36, -0.024), (37, 0.085), (38, -0.065), (39, 0.015), (40, 0.114), (41, 0.135), (42, -0.02), (43, -0.097), (44, 0.003), (45, -0.024), (46, -0.077), (47, -0.073), (48, -0.009), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.96943957 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rosé
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multi-domain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
2 0.75517994 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
Author: Jian Bo Yang ; Qi Mao ; Qiao Liang Xiang ; Ivor Wai-Hung Tsang ; Kian Ming Adam Chai ; Hai Leong Chieu
Abstract: We propose an adaptive ensemble method to adapt coreference resolution across domains. This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base models; (3) it can automatically adjust the active ensemble members during prediction. With simplification, this method can be used in the traditional withindomain case, while still retaining the above features. To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) have the above features as an ensemble method. Empirically, we show the benefits of (i) on the six domains of the ACE 2005 data set in domain adaptation set- ting, and of (ii) on both the MUC-6 and the ACE 2005 data sets in within-domain setting.
3 0.69382066 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
4 0.65456861 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
Author: Natalia Ponomareva ; Mike Thelwall
Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-ofthe-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graphbased algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.
5 0.44731045 84 emnlp-2012-Linking Named Entities to Any Database
Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates
Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
6 0.41343373 6 emnlp-2012-A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
7 0.25926805 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution
8 0.25713408 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
9 0.24522904 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
10 0.22537988 9 emnlp-2012-A Sequence Labelling Approach to Quote Attribution
11 0.21899974 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation
12 0.2118438 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
13 0.20919576 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid
14 0.20555308 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
15 0.18990606 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
16 0.18951733 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
17 0.18899821 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
18 0.183412 61 emnlp-2012-Grounded Models of Semantic Representation
19 0.17490661 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
20 0.17310181 120 emnlp-2012-Streaming Analysis of Discourse Participants
topicId topicWeight
[(2, 0.017), (7, 0.221), (16, 0.027), (25, 0.024), (34, 0.08), (35, 0.037), (60, 0.119), (63, 0.051), (64, 0.057), (65, 0.031), (70, 0.029), (73, 0.01), (74, 0.044), (76, 0.059), (80, 0.012), (86, 0.051), (95, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.76789689 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
Author: Mahesh Joshi ; Mark Dredze ; William W. Cohen ; Carolyn Rosé
Abstract: We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced class label setting, although in practice many multi-domain settings have domain-specific class label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.
2 0.63150918 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
3 0.62151855 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Author: Dan Garrette ; Jason Baldridge
Abstract: Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
4 0.62103468 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky
Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.
5 0.60948384 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
6 0.60870403 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
7 0.60585833 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
8 0.60100675 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
9 0.59897786 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
10 0.59804446 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
11 0.59731972 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
12 0.59686464 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
13 0.59432149 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
14 0.59364206 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation
15 0.59336805 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
16 0.59301335 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
17 0.59246969 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
18 0.59147334 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
19 0.59122241 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
20 0.58815938 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews