emnlp emnlp2010 emnlp2010-104 knowledge-graph by maker-knowledge-mining

104 emnlp-2010-The Necessity of Combining Adaptation Methods


Source: pdf

Author: Ming-Wei Chang ; Michael Connor ; Dan Roth

Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both the Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds a shared representation between domains often impacts the choice of adaptation algorithm that makes use of target labeled data.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Problems stemming from domain adaptation continue to plague the statistical natural language processing community. [sent-2, score-0.688]

2 We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. [sent-5, score-0.648]

3 We find that the combined approach often outperforms the individual adaptation approaches. [sent-6, score-0.601]

4 By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both the Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. [sent-7, score-1.803]

5 Second, we also show that applying an adaptation algorithm that finds a shared representation between domains often impacts the choice of adaptation algorithm that makes use of target labeled data. [sent-8, score-1.768]

6 1 Introduction. While recent advances in statistical modeling for natural language processing are exciting, the problem of domain adaptation remains a big challenge. [sent-9, score-0.688]

7 Several general purpose algorithms have been proposed to address the domain adaptation problem (e.g., Blitzer et al., 2006). [sent-16, score-0.739]

8 In general, we can separate existing adaptation algorithms into two categories. Focuses on P(X): this type of adaptation algorithm attempts to resolve the difference between the feature space statistics of the two domains. [sent-19, score-1.253]

9 While many different techniques have been proposed, the common goal of these algorithms is to find a better shared representation that brings the source domain and the target domain closer. [sent-20, score-0.548]

10 Focuses on P(Y |X): these adaptation algorithms assume that there exists a small amount of labeled data for the target domain. [sent-24, score-1.049]

11 Instead of training two weight vectors independently (one for source and the other for the target domain), these algorithms try to relate the source and target weight vectors. [sent-25, score-0.536]

12 It is important to give the definition of an adaptation framework. [sent-30, score-0.601]

13 An adaptation framework is specified by the data/resources used and a specific learning algorithm. [sent-31, score-0.689]

14 For example, a framework that used only source labeled examples and one that used both source and target labeled examples should be considered as two different frameworks, even though they might use exactly the same training algorithm. [sent-32, score-0.837]

15 Note that the goal of a good adaptation framework is to perform well on the target domain, and quite often we only need to change the data/resources used to improve performance without changing the training algorithm. [sent-33, score-0.921]

16 We refer to frameworks that do not use target labeled data and focus on P(X) as Unlabeled Adaptation Frameworks, and to frameworks that use algorithms that focus on P(Y |X) as Labeled Adaptation Frameworks. [sent-34, score-0.957]

17 The major difference between unlabeled adaptation frameworks and labeled adaptation frameworks is the use of target labeled examples. [sent-35, score-2.542]

18 Unlabeled adaptation frameworks do not use target labeled examples,1 while the labeled adaptation frameworks make use of target labeled examples. [sent-36, score-2.747]

19 Under this definition, we consider that a model trained on both source and target labeled examples (later referred to as S+T) is a labeled adaptation framework. [sent-37, score-1.295]

20 It is important to combine the labeled and unlabeled adaptation frameworks for two reasons: • Mutual Benefit: we analyze these two types of frameworks and find that they address different adaptation issues. [sent-38, score-1.929]

21 Different representations will impact how much a labeled adaptation algorithm can transfer information between domains. [sent-41, score-0.874]

22 Therefore, in order to have a clear picture of what the best labeled adaptation framework is, it is necessary to analyze these two domain adaptation frameworks together. [sent-42, score-1.855]

23-24 In this paper, we assume we have both a small amount of target labeled data and a large amount of unlabeled data, so that we can perform both unlabeled and labeled adaptation. (Footnote 1: note that we still use labeled data from the source domain in an unlabeled adaptation framework.) [sent-43, score-1.432] [sent-44, score-0.737]

25 The goal of our paper is to point out the necessity of applying these two adaptation frameworks together. [sent-45, score-0.933]

26 To the best of our knowledge, this is the first paper that both theoretically and empirically analyzes the interdependence between the impacts of labeled and unlabeled adaptation frameworks. [sent-46, score-1.035]

27 • Demonstrate the complex interaction between unlabeled and labeled approaches (Section 4): we construct artificial experiments that demonstrate how applying unlabeled adaptation may impact the behavior of two labeled adaptation approaches. [sent-50, score-1.749]

28 Our approach not only achieves state-of-the-art results on these two tasks but also reveals something surprising: finding a better shared representation often makes a simple source+target approach the best adaptation framework in practice. [sent-53, score-0.876]

29 2 Two Adaptation Aspects: A Review. Why do we need two types of adaptation frameworks? [sent-54, score-0.601]

30 First, unlabeled adaptation frameworks are necessary since many features only exist in one domain. [sent-55, score-1.067]

31 On the other hand, labeled adaptation frameworks are also required because we would like to take advantage of target labeled data. [sent-57, score-1.51]

32 While these two aspects of adaptation both saw significant progress in the past few years, little analysis has been done on the interaction between these two types of algorithms.2 [sent-60, score-0.683]

33 In order to have a deep analysis, it is necessary to choose specific adaptation algorithms for each aspect of adaptation. [sent-61, score-1.253]

34 While we mainly conduct analysis on the algorithms we picked, we would like to point out that the necessity of combining these two types of adaptation algorithms has been largely ignored in the community. [sent-62, score-0.756]

35 As our example adaptation algorithms, we selected the following. Labeled adaptation: the FE framework. One of the most popular adaptation frameworks that requires the use of labeled target data is the “Frustratingly Easy” (FE) adaptation framework (Daumé III, 2007). [sent-63, score-2.666]

36 The FE framework can be viewed as a framework that extends the feature space, and it requires source and target labeled data to work. [sent-65, score-0.597]

37 The (t+1)-th block (the (nt+1)-th to the (nt+n)-th features) is a “specific” block, and is only active when the example comes from the t-th domain; the augmented weight vector thus lies in R^(n(m+1)). (Footnote 2: among the previously mentioned work, (Jiang and Zhai, 2007) is a special case given that it discusses both aspects of adaptation algorithms.) [sent-70, score-0.666]
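To make the FE block structure concrete, here is a minimal sketch of the feature augmentation; the function name and the dense-vector encoding are our own illustration under the description above (one shared block plus one block per domain), not code from the paper.

```python
import numpy as np

def fe_augment(x, domain_idx, num_domains):
    """Frustratingly Easy feature augmentation (Daume III, 2007).

    Maps an n-dimensional feature vector into R^(n(m+1)): the first
    block is shared across all m domains, and the (t+1)-th block is
    active only for examples from the t-th domain.
    """
    n = x.shape[0]
    z = np.zeros(n * (num_domains + 1))
    z[:n] = x                              # shared block
    start = n * (domain_idx + 1)           # t-th "specific" block
    z[start:start + n] = x
    return z

# Example with n = 3 features and m = 2 domains (0 = source, 1 = target):
x = np.array([1.0, 0.0, 2.0])
print(fe_augment(x, 0, 2))  # shared block + source-specific block active
print(fe_augment(x, 1, 2))  # shared block + target-specific block active
```

A single linear model trained on these augmented vectors then learns one shared weight block plus per-domain corrections.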

38 As will become clear in Section 3, this framework is in fact only effective when there is target labeled data, and hence belongs to the labeled adaptation frameworks. [sent-81, score-1.328]

39 Although the FE framework is quite popular in the community, there are other, even simpler labeled adaptation frameworks that allow the use of target labeled data. [sent-82, score-1.598]

40 Unlabeled adaptation: adding cluster-like features. Recall that unlabeled adaptation frameworks find features that “work” across domains. [sent-84, score-1.102]
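As an illustration of what "cluster-like features" can look like, the sketch below adds a cluster identifier alongside each lexical feature; `word2cluster` is a hypothetical word-to-cluster mapping (e.g., produced by any word-clustering tool), not a resource named by the paper.

```python
def add_cluster_features(tokens, word2cluster):
    """Augment lexical features with cluster-like features.

    Words seen only in the source or only in the target domain can
    still share a feature through their domain-independent cluster.
    """
    featurized = []
    for w in tokens:
        feats = ["w=" + w.lower()]
        cluster = word2cluster.get(w.lower())
        if cluster is not None:
            feats.append("cluster=" + cluster)  # shared across domains
        featurized.append(feats)
    return featurized

# Toy usage: "Ltd" (source-side) and "Inc" (target-side) share a cluster.
word2cluster = {"ltd": "C17", "inc": "C17"}
print(add_cluster_features(["Acme", "Inc"], word2cluster))
```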

41 Table 1: Comparison between the two general adaptation frameworks discussed in this paper. [sent-102, score-0.871]

42 Multiple previous adaptation approaches fit into one of the two frameworks. [sent-104, score-0.601]

43 While many methods for finding a better shared representation (without using labeled target data) have been proposed (e.g., Blitzer et al., 2006), we find that using straightforward clustering features is quite effective in general. [sent-106, score-0.546]

44 Note that we ignore the effect of unlabeled adaptation in this section, and focus on the analysis of the FE framework as a representative labeled adaptation framework. [sent-108, score-1.745]

45 The FE weight vector decomposes as wt = u + vt for t = 1, ..., m, where vt is the specific weight vector for the t-th domain and u is a shared weight vector across all domains. [sent-120, score-0.457]

46 The FE framework was in fact originally designed for the problem of multitask learning, so in the following we propose a simple mistake-bound analysis based on the multitask setting, where we calculate the mistakes on all domains.5 [sent-138, score-0.45]

47 In the experiments, we empirically confirm that the analysis holds for the adaptation setting. [sent-140, score-0.622]

48 Then, assuming ||x|| ≤ R for every example x, the number of mistakes made with online perceptron training (Novikoff, 1963) can be bounded. (Footnote 5: in the adaptation setting, one generally cares only about the performance on the target domain.) [sent-154, score-0.724]

49 Before showing the bound analysis, note that the framework proposed by (Evgeniou and Pontil, 2004; Finkel and Manning, 2009) is a generalization of these three frameworks (FE, S+T, and the baseline).6 [sent-178, score-0.439]
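For readers unfamiliar with that generalization, one common way to write an Evgeniou-Pontil style joint objective is sketched below; the trade-off weights lambda_u and lambda_v are our notation, and the reading of the extreme settings is the standard one for this model family rather than a formula quoted from this paper.

```latex
\min_{u,\,v_1,\dots,v_m}\;
  \lambda_u \lVert u \rVert^2
  + \lambda_v \sum_{t=1}^{m} \lVert v_t \rVert^2
  + \sum_{t=1}^{m} \sum_{i=1}^{N_t}
      \ell\bigl((u + v_t)^\top x_i^{(t)},\, y_i^{(t)}\bigr)
% lambda_v -> infinity forces v_t = 0: one shared model (S+T-like)
% lambda_u -> infinity forces u = 0: independent per-domain models (the baseline)
% intermediate values interpolate between the two, which is where FE sits
```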

50 Following the assumptions and Theorem 1, the mistake bound for the FE framework is 4(2 − cos(w1, w2)) R^2 a^2 / (3µ^2) (Eq. 4). This line of analysis leads to interesting bound comparisons for two cases. [sent-192, score-0.548]
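Plugging the two extreme task similarities into the quoted bound makes the comparison explicit; this is a direct evaluation of Eq. (4), nothing more.

```latex
M_{\mathrm{FE}} \;\le\; \frac{4\bigl(2-\cos(w_1,w_2)\bigr)R^2 a^2}{3\mu^2}
\qquad\Longrightarrow\qquad
\cos(w_1,w_2)=1:\ \frac{4R^2a^2}{3\mu^2},
\qquad
\cos(w_1,w_2)=-1:\ \frac{4R^2a^2}{\mu^2}
% identical tasks give a bound three times smaller than maximally
% dissimilar tasks, which is why FE helps only when the tasks are close
```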

51 Data Generation. In the following artificial experiments, we study domain adaptation by generating training and test data for two tasks, source and target, where we can control the difference between task definitions. [sent-214, score-0.831]

52 The general procedure can be divided into two steps: 1) generating weight vectors z1 and z2 (for source and target respectively), and 2) randomly generating labeled instances for training and testing using z1 and z2. [sent-215, score-0.536]
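The two-step procedure can be sketched as follows; the paper's exact sampling distributions are not given here, so the Gaussian draws and the rotation trick used to control the source-target cosine are our assumptions.

```python
import numpy as np

def make_task_pair(dim, cosine, n_train, rng):
    """Step 1: weight vectors z1, z2 with a controlled cosine between them.
    Step 2: random instances labeled by sign(z . x) for each task."""
    z1 = rng.standard_normal(dim)
    z1 /= np.linalg.norm(z1)
    # Unit vector orthogonal to z1, then rotate by the target angle.
    r = rng.standard_normal(dim)
    r -= r.dot(z1) * z1
    r /= np.linalg.norm(r)
    theta = np.arccos(cosine)
    z2 = np.cos(theta) * z1 + np.sin(theta) * r  # cos(z1, z2) == cosine

    def sample(z, n):
        X = rng.standard_normal((n, dim))
        y = np.sign(X @ z)                       # linear labeling rule
        return X, y

    return sample(z1, n_train), sample(z2, n_train)

rng = np.random.default_rng(0)
(src_X, src_y), (tgt_X, tgt_y) = make_task_pair(50, 0.8, 1000, rng)
```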

53 Note that the FE labeled adaptation framework beats TGT once the task cosine passes approximately 0. [sent-236, score-1.046]

54 Note that while the experiments are based on the adaptation setting, the results match our analysis based on the multitask setting in Section 3. [sent-239, score-0.69]

55 Here we construct a simple example to show that even a simple representation shift can change the behavior of the labeled adaptation framework. [sent-244, score-1.0]

56 Figure 1: Artificial experiment comparing labeled adaptation performance vs. original cosine. Panels: (a) Basic Similarity; (b) Shared Features.

57 For the FE adaptation algorithm to work, the tasks need to be close (cosine > 0. [sent-249, score-0.643]

58 Figure (b) shows results for experiment 3, when shared features are added to the base weight vectors as used in experiment 1. [sent-251, score-0.425]

59 Both labeled adaptation algorithms effectively use the shared features to improve over just training on target. [sent-253, score-1.082]

60 In this experiment we want to explore which adaptation algorithm performs best before these features are applied. [sent-259, score-0.688]

61 The simple S+T adaptation framework is best able to exploit the set of shared features, so it performs best over the whole space of similarity in this setting. [sent-265, score-0.877]

62 With this experiment we have demonstrated that there is a need to consider labeled and unlabeled adaptation frameworks together. [sent-272, score-1.084]

63 Experiment 3, Shared Features. Goal: a good unlabeled adaptation framework should try to find features that “work” across domains. [sent-274, score-0.885]

64 However, it is not clear how these newly added features will impact the behavior of the labeled adaptation frameworks. [sent-275, score-0.979]

65 In this experiment, we show that the new shared features will bring the domains together, and hence make S+T a very strong adaptation framework. [sent-276, score-0.813]

66 These shared weights correspond to the introduction of features that appear in both domains and have the same meaning relative to the tasks, the ideal result of unlabeled adaptation methods. [sent-278, score-0.974]

67 Figure 1(b) shows the performance of the labeled adaptation algorithms once shared features had been added. [sent-283, score-1.082]

68 This shows that labeled and unlabeled adaptation are not independent. [sent-289, score-1.035]

69 Therefore, it is important to combine these two aspects to see the real contribution of each adaptation framework. [sent-290, score-0.629]

70 In these three artificial experiments we have demonstrated cases where either FE or S+T is the best algorithm, before and after representation changes like those created by unlabeled adaptation are imposed. [sent-291, score-0.847]

71 5 Real World Experiments. In Section 4, we have shown through artificial data experiments that labeled and unlabeled adaptation algorithms are not independent. [sent-294, score-1.122]

72 For the labeled adaptation algorithms, we have the following options: • TGT: only uses the target labeled training dataset. [sent-296, score-1.24]

73 • S+T: uses both source and target labeled datasets to train a single model with all labeled data directly. [sent-301, score-0.421]
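As a concrete reading of the TGT and S+T options, the sketch below trains a linear model either on the target set alone or on the concatenation of both sets; scikit-learn's Perceptron is only a stand-in for whatever linear learner is actually used.

```python
import numpy as np
from sklearn.linear_model import Perceptron

def train_tgt(tgt_X, tgt_y):
    """TGT: train only on the target labeled data."""
    return Perceptron().fit(tgt_X, tgt_y)

def train_s_plus_t(src_X, src_y, tgt_X, tgt_y):
    """S+T: train one model directly on all labeled data."""
    X = np.vstack([src_X, tgt_X])
    y = np.concatenate([src_y, tgt_y])
    return Perceptron().fit(X, y)
```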

74 Both the unlabeled adaptation algorithm (adding cluster features) and the labeled adaptation algorithm (using source labeled data) help the performance significantly. [sent-317, score-2.076]

75 Moreover, adding cluster-like features also changes the behavior of the labeled adaptation algorithms. [sent-318, score-1.016]

76 The source domain is from the CoNLL03 shared task (Tjong Kim Sang and De Meulder, 2003) and the target domain is from the MUC7 dataset. [sent-326, score-0.444]

77 The goal of this adaptation system is to maximize the performance on the test data of the MUC7 dataset, using CoNLL training data and (some) MUC7 labeled data. [sent-327, score-0.904]

78 As an unlabeled adaptation method to address feature sparsity, we add cluster-like features based on the gazetteers and word clustering resources used in (Ratinov and Roth, 2009) to bridge the source and target domains. [sent-328, score-0.982]

79 Moreover, target labeled data and the cluster-like shared representation are mutually beneficial in all cases. [sent-335, score-0.511]

80 Importantly, adding cluster-like features changes the behavior of the labeled adaptation algorithms. [sent-336, score-1.016]

81 When the cluster-like features are not added, the FE+ algorithm is in general the best labeled adaptation framework. [sent-337, score-0.909]

82 This result agrees with the results shown in (Finkel and Manning, 2009), where the authors show that FE+ is the best labeled adaptation framework in their settings. [sent-338, score-0.962]

83 This matches our analysis in Section 4: resolving feature sparsity will change the behavior of labeled adaptation frameworks. [sent-340, score-1.026]

84 For example, (Ratinov and Roth, 2009) only use the cluster-like features to address the feature sparsity problem, and (Finkel and Manning, 2009) only use target labeled data without using gazetteers and word-cluster information. [sent-343, score-0.468]

85 Preposition Sense Disambiguation. We also test the combination of unlabeled and labeled adaptation on the task of Preposition Sense Disambiguation. [sent-345, score-0.513]

86 Previous systems often only use one class of adaptation algorithms. [sent-367, score-0.601]

87 Using both adaptation aspects makes our system perform significantly better than FM09 and RR09. [sent-368, score-0.629]

88 Note that both adding cluster features and adding source labeled data help the performance significantly. [sent-384, score-0.498]

89 Moreover, adding clusters also changes the behavior of the labeled adaptation algorithms. [sent-385, score-1.012]

90 As the unlabeled adaptation approach, we augment all word-based features with cluster information from separately generated hierarchical Brown clusters (Brown et al., 1992). [sent-386, score-1.001]
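A common way to exploit hierarchical Brown clusters is to add bit-string prefixes of each word's cluster path at several depths, so coarser and finer groupings are both available to the learner; the prefix lengths below are illustrative, and `brown_paths` is a hypothetical word-to-bit-string mapping, not the paper's actual resource.

```python
def brown_cluster_features(word, brown_paths, prefix_lengths=(4, 6, 10)):
    """Augment a word-based feature with hierarchical cluster features.

    Brown clustering assigns each word a bit string; prefixes of that
    string correspond to progressively coarser clusters."""
    feats = ["w=" + word.lower()]
    path = brown_paths.get(word.lower())
    if path is not None:
        for k in prefix_lengths:
            feats.append("bc%d=%s" % (k, path[:k]))
    return feats

# Toy usage with a made-up bit string.
print(brown_cluster_features("about", {"about": "0111010110"}))
```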

91 First, both labeled and unlabeled adaptation improve the system. [sent-390, score-1.035]

92 When only 10% of the target labeled data is used, the inclusion of the source labeled data helps significantly. [sent-391, score-0.694]

93 When there is more labeled data, labeled and unlabeled adaptation have a similar impact. [sent-392, score-0.786]

94 Again, using unlabeled adaptation changes the behavior of the labeled adaptation algorithms. [sent-393, score-0.662]

95 With both labeled and unlabeled adaptation, our system is significantly better. [sent-396, score-0.434]

96 Note that after adding cluster features and source labeled data with the S+T approach, our system outperforms the state-of-the-art system proposed in (Dahlmeier et al.). [sent-401, score-0.461]

97 6 Conclusion. In this paper, we point out the necessity of combining labeled and unlabeled adaptation algorithms. [sent-403, score-1.035]

98 More importantly, through artificial data experiments we found that applying unlabeled adaptation algorithms may change the behavior of labeled adaptation algorithms as representations change, and hence affect the choice of labeled adaptation algorithm. [sent-405, score-2.714]

99 Experiments with real-world datasets confirmed that combinations of both adaptation methods provide the best results, often allowing the use of simple labeled adaptation approaches. [sent-406, score-1.475]

100 In the future, we hope to develop a joint algorithm which addresses both labeled and unlabeled adaptation at the same time. [sent-407, score-1.035]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('adaptation', 0.601), ('fe', 0.433), ('labeled', 0.273), ('frameworks', 0.27), ('unlabeled', 0.161), ('shared', 0.122), ('mistake', 0.095), ('target', 0.093), ('preposition', 0.089), ('framework', 0.088), ('domain', 0.087), ('ratinov', 0.085), ('ner', 0.084), ('cosine', 0.084), ('bound', 0.081), ('evgeniou', 0.079), ('tgt', 0.079), ('adaption', 0.079), ('weight', 0.074), ('multitask', 0.068), ('pontil', 0.063), ('psd', 0.063), ('cluster', 0.061), ('finkel', 0.061), ('daum', 0.058), ('cos', 0.056), ('source', 0.055), ('domains', 0.055), ('wm', 0.054), ('dahlmeier', 0.054), ('theorem', 0.053), ('experiment', 0.052), ('algorithms', 0.051), ('novikoff', 0.047), ('behavior', 0.044), ('regularization', 0.044), ('bounds', 0.042), ('tasks', 0.042), ('vectors', 0.041), ('roth', 0.041), ('bounded', 0.04), ('iii', 0.038), ('adding', 0.037), ('shift', 0.037), ('block', 0.037), ('gazetteers', 0.037), ('dissimilar', 0.036), ('artificial', 0.036), ('features', 0.035), ('vector', 0.035), ('rn', 0.034), ('frustratingly', 0.034), ('named', 0.033), ('tx', 0.033), ('blitzer', 0.033), ('interaction', 0.033), ('runner', 0.032), ('tratz', 0.032), ('wttx', 0.032), ('zs', 0.032), ('necessity', 0.032), ('similarity', 0.031), ('assume', 0.031), ('clusters', 0.031), ('perceptron', 0.03), ('sparsity', 0.03), ('vt', 0.03), ('goal', 0.03), ('becomes', 0.03), ('manning', 0.03), ('entity', 0.03), ('mistakes', 0.029), ('koo', 0.028), ('xm', 0.028), ('sense', 0.028), ('aspects', 0.028), ('underline', 0.027), ('hsieh', 0.027), ('chelba', 0.027), ('maxx', 0.027), ('changes', 0.026), ('theoretical', 0.026), ('added', 0.026), ('zhai', 0.026), ('brown', 0.026), ('ando', 0.024), ('jiang', 0.024), ('representation', 0.023), ('analyze', 0.023), ('base', 0.023), ('disambiguation', 0.023), ('sang', 0.023), ('tjong', 0.023), ('semeval', 0.023), ('change', 0.022), ('together', 0.021), ('tong', 0.021), ('urbana', 0.021), ('angle', 0.021), ('analysis', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 104 emnlp-2010-The Necessity of Combining Adaptation Methods

Author: Ming-Wei Chang ; Michael Connor ; Dan Roth

Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both the Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds a shared representation between domains often impacts the choice of adaptation algorithm that makes use of target labeled data.

2 0.21944202 39 emnlp-2010-EMNLP 044

Author: George Foster

Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.

3 0.1880945 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

Author: Mark Dredze ; Tim Oates ; Christine Piatko

Abstract: Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention – detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method uses A-distance, a metric for detecting shifts in data streams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.

4 0.15040284 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

5 0.13180596 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

Author: Lei Shi ; Rada Mihalcea ; Mingjun Tian

Abstract: In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semisupervised learning, and adapt the translated model to better fit the data distribution of the target language.

6 0.11003923 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

7 0.098148823 37 emnlp-2010-Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks

8 0.093208194 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

9 0.08472959 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

10 0.08179839 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

11 0.077457517 54 emnlp-2010-Generating Confusion Sets for Context-Sensitive Error Correction

12 0.071674727 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

13 0.063443378 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

14 0.061738066 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

15 0.054568999 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

16 0.049897414 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

17 0.049089503 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

18 0.047452323 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

19 0.047405589 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

20 0.044943053 77 emnlp-2010-Measuring Distributional Similarity in Context


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.207), (1, 0.097), (2, -0.042), (3, 0.053), (4, -0.096), (5, 0.04), (6, 0.252), (7, 0.208), (8, 0.033), (9, 0.246), (10, 0.151), (11, 0.183), (12, -0.122), (13, 0.135), (14, 0.056), (15, -0.041), (16, -0.081), (17, -0.128), (18, -0.04), (19, -0.008), (20, 0.116), (21, -0.037), (22, 0.082), (23, 0.083), (24, -0.073), (25, 0.029), (26, 0.042), (27, 0.105), (28, 0.129), (29, 0.238), (30, -0.068), (31, 0.016), (32, -0.189), (33, 0.025), (34, -0.172), (35, 0.026), (36, -0.064), (37, 0.088), (38, -0.023), (39, 0.067), (40, -0.04), (41, -0.091), (42, -0.018), (43, -0.042), (44, 0.085), (45, 0.045), (46, 0.034), (47, 0.025), (48, 0.056), (49, -0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9824568 104 emnlp-2010-The Necessity of Combining Adaptation Methods

Author: Ming-Wei Chang ; Michael Connor ; Dan Roth

Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both the Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds a shared representation between domains often impacts the choice of adaptation algorithm that makes use of target labeled data.

2 0.73298335 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

Author: Mark Dredze ; Tim Oates ; Christine Piatko

Abstract: Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention – detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method uses A-distance, a metric for detecting shifts in data streams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.

3 0.70249921 39 emnlp-2010-EMNLP 044

Author: George Foster

Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.

4 0.48958126 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

Author: Amarnag Subramanya ; Slav Petrov ; Fernando Pereira

Abstract: We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar ngrams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the target domain, but no additional labeled data. The similarity graph is used during training to smooth the state posteriors on the target domain. Standard inference can be used at test time. Our approach is able to scale to very large problems and yields significantly improved target domain accuracy.

5 0.3859143 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

6 0.33565733 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

7 0.33547053 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

8 0.30040625 37 emnlp-2010-Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks

9 0.27767339 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

10 0.26373389 54 emnlp-2010-Generating Confusion Sets for Context-Sensitive Error Correction

11 0.2596052 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

12 0.25475428 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

13 0.23219782 77 emnlp-2010-Measuring Distributional Similarity in Context

14 0.22600664 30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models

15 0.21143837 62 emnlp-2010-Improving Mention Detection Robustness to Noisy Input

16 0.20644668 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

17 0.19073299 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

18 0.18943618 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

19 0.18901494 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs

20 0.18560472 84 emnlp-2010-NLP on Spoken Documents Without ASR


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.014), (5, 0.016), (10, 0.021), (12, 0.038), (29, 0.118), (30, 0.025), (32, 0.016), (52, 0.065), (56, 0.069), (62, 0.042), (66, 0.177), (72, 0.039), (76, 0.021), (79, 0.015), (87, 0.033), (90, 0.198)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8639971 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

Author: Roi Reichart ; Ari Rappoport

Abstract: We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results on its paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm’s effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.

same-paper 2 0.83416164 104 emnlp-2010-The Necessity of Combining Adaptation Methods

Author: Ming-Wei Chang ; Michael Connor ; Dan Roth

Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both the Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds a shared representation between domains often impacts the choice of adaptation algorithm that makes use of target labeled data.

3 0.75885552 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

4 0.74941397 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

5 0.74357319 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; David Stallard ; Prem Natarajan

Abstract: Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible training corpus by choosing informative, nonredundant source sentences from an available candidate pool for manual translation. We present a novel, discriminative sample selection strategy that preferentially selects batches of candidate sentences with constructs that lead to erroneous translations on a held-out development set. The proposed strategy supports a built-in diversity mechanism that reduces redundancy in the selected batches. Simulation experiments on English-to-Pashto and Spanish-to-English translation tasks demonstrate the superiority of the proposed approach to a number of competing techniques, such as random selection, dissimilarity-based selection, as well as a recently proposed semisupervised active learning strategy.

6 0.74295413 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

7 0.73992991 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

8 0.73919296 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

9 0.73576301 2 emnlp-2010-A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model

10 0.73462558 109 emnlp-2010-Translingual Document Representations from Discriminative Projections

11 0.73421955 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

12 0.73041743 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

13 0.72921211 63 emnlp-2010-Improving Translation via Targeted Paraphrasing

14 0.7285043 114 emnlp-2010-Unsupervised Parse Selection for HPSG

15 0.72848195 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

16 0.72788179 84 emnlp-2010-NLP on Spoken Documents Without ASR

17 0.72695559 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

18 0.72653157 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

19 0.72610813 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

20 0.72596675 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation