nips nips2011 nips2011-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. [sent-7, score-0.846]
2 In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. [sent-8, score-0.962]
3 In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. [sent-9, score-0.962]
4 Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. [sent-12, score-0.898]
5 Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance. [sent-15, score-0.279]
6 1 Introduction Domain adaptation addresses the problem of generalizing from a source distribution for which we have ample labeled training data to a target distribution for which we have few or no labels [3, 14, 28]. [sent-16, score-0.924]
7 Domain adaptation is of practical importance in many areas of applied machine learning, ranging from computational biology [17] to natural language processing [11, 19] to computer vision [23]. [sent-17, score-0.236]
8 In this work, we focus primarily on domain adaptation problems that are characterized by missing features. [sent-18, score-0.38]
9 In this situation, most domain adaptation algorithms seek to eliminate the difference between source and target distributions, either by re-weighting source instances [14, 18] or learning a new feature representation [6, 28]. [sent-22, score-1.08]
10 Our method seeks to slowly adapt its training set from the source to the target domain, using ideas from co-training. [sent-24, score-0.509]
11 We accomplish this in two ways: First, we train on our own output in rounds, where at each round, we include in our training data the target instances we are most confident of. [sent-25, score-0.399]
12 Second, we select a subset of shared source and target features based on their compatibility. [sent-26, score-0.521]
13 Unlike most previous work on selecting features for domain adaptation, compatibility is measured across the training set and the unlabeled set, instead of across the two domains. [sent-27, score-0.465]
14 As more target instances are added to the training set, target-specific features become compatible across the two sets and are therefore included in the predictor. [sent-28, score-0.758]
15 By allowing us to slowly change our training data from source to target, CODA has an advantage over representation-learning algorithms [6, 28], since they must decide a priori what the best representation is. [sent-33, score-0.25]
16 In contrast, each iteration of CODA can choose exactly those few target features which can be related to the current (source and pseudo-labeled target) training set. [sent-34, score-0.425]
17 [4] CODA improves the state-of-the-art across widely varying amounts of target labeled data in 65 out of 84 settings. [sent-36, score-0.496]
18 The source data is fully labeled: DS = {(x1, y1), ...}. [sent-38, score-0.335]
19 The target data is sampled from PT(X, Y) and is divided into a labeled part DT^l = {(x1, y1), ...} and an unlabeled part DT^u. [sent-42, score-0.452]
20 If trained on data sampled from PS(X, Y), logistic regression models the distribution PS(Y|X) [13] through Ph(Y = y | X = x; w) = (1 + e^(−y·w⊤x))^(−1). [sent-59, score-0.257]
21 In this paper, our goal is to adapt this classifier to the target distribution PT(Y|X). [sent-60, score-0.279]
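To make the notation concrete, here is a minimal NumPy sketch of this model and the certainty measure used by the rote-learning procedure below; the function names are ours, not the paper's.

```python
import numpy as np

def p_label(w, x, y):
    """Ph(Y = y | X = x; w) = 1 / (1 + exp(-y * w.x)) for y in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))

def predict(w, x):
    """Hard prediction y_hat = sign(h(x)), ties broken toward +1."""
    return 1 if np.dot(w, x) >= 0 else -1

def certainty(w, x):
    """Confidence Ph(Y = y_hat | X = x; w) of the predicted label."""
    return p_label(w, x, predict(w, x))
```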
22 3 Method In this section, we begin with a semi-supervised approach and describe the rote-learning procedure to automatically annotate target domain inputs. [sent-61, score-0.468]
23 The algorithm maintains and grows a training set that is iteratively adapted to the target domain. [sent-62, score-0.327]
24 In logistic regression, if ŷ = sign(h(x)) is the prediction for an input x, the probability Ph(Y = ŷ | X = x; w) is a natural metric of certainty (as h(x) can be interpreted as the probability that x has label +1), but other methods [22] can be used. [sent-70, score-0.197]
25 During training one maintains a labeled training set L and an unlabeled test set U, initialized as L = DS ∪ DT^l and U = DT^u. [sent-72, score-0.372]
26 The c most confident predictions on U are moved to L for the next iteration, labeled by the prediction of sign(hw ). [sent-74, score-0.224]
27 Move up to c confident inputs xi from U to L, labeled as sign(h(xi)). [sent-80, score-0.247]
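A hedged sketch of one rote-learning round follows; train_logreg stands in for whatever regularized logistic-regression fit is used per round (an assumption, not the paper's full CODA objective).

```python
import numpy as np

def self_train(train_logreg, L_X, L_y, U_X, c=50, rounds=20):
    """Repeatedly fit a classifier on L, then move the c most confident
    unlabeled inputs from U into L, pseudo-labeled by sign(h(x))."""
    L_X, L_y, U_X = list(L_X), list(L_y), list(U_X)
    for _ in range(rounds):
        if not U_X:
            break
        w = train_logreg(np.array(L_X), np.array(L_y))
        scores = np.array(U_X) @ w                    # h_w(x) for each x in U
        conf = 1.0 / (1.0 + np.exp(-np.abs(scores)))  # Ph(Y = y_hat | x; w)
        top = np.argsort(-conf)[:c]                   # c most confident inputs
        for i in sorted(top, reverse=True):           # descending keeps indices valid
            L_X.append(U_X.pop(i))
            L_y.append(1 if scores[i] >= 0 else -1)   # pseudo-label sign(h(x))
    return np.array(L_X), np.array(L_y)
```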
28 In domain adaptation, the training data is no longer representative of the test data. [sent-83, score-0.26]
29 Here, the bigram feature “must read” is indicative of a positive opinion within the source (“books”) domain, but rarely appears in the target (“dvd”) domain. [sent-86, score-0.532]
30 For example, after some iterations, the classifier can pick features that are never present in the source domain, but which have entered L through the rote-learning procedure. [sent-92, score-0.242]
31 A feature that switches polarization across domains (and is therefore "malicious") has a score Δ_{L,U,w}(α) > 1 (in the extreme case, if it equals the label in L and the inverted label in U, its score would be 2). [sent-109, score-0.25]
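The paper's exact score Δ_{L,U,w}(α) is defined in equations not reproduced here; purely as an illustrative stand-in for the idea that a feature is suspect when its agreement with the (pseudo-)labels flips between L and U, one could compute:

```python
import numpy as np

def polarization_gap(f_L, y_L, f_U, y_U):
    """Illustrative stand-in, not the paper's exact score: compare a
    feature's average label-weighted activation on the labeled set L with
    that on the pseudo-labeled set U; a large gap marks a feature whose
    polarity flips across the two sets."""
    agree_L = np.mean(f_L * y_L)   # feature-label agreement on L
    agree_U = np.mean(f_U * y_U)   # feature-(pseudo-)label agreement on U
    return abs(agree_L - agree_U)
```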
32 As we have very few labeled inputs from the target domain in the early iterations, stronger regularization is imposed so that only features shared across the two domains are used. [sent-116, score-0.899]
33 As more and more inputs from the target domain are included in the training set, we gradually decrease the regularization to accommodate target-specific features. [sent-117, score-0.895]
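A minimal sketch of such a schedule, assuming a linear decay with a floor; the rule and constants are illustrative choices, not taken from the paper.

```python
def l1_strength(base_lambda, n_target_in_L, n_target_total, floor=0.1):
    """Relax the l1 penalty as more target inputs join the training set L,
    so that target-specific features can enter the predictor."""
    frac = n_target_in_L / max(n_target_total, 1)
    return base_lambda * max(floor, 1.0 - frac)
```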
34 We want to add inputs xi that contain additional features, which were not used to obtain the prediction hw (xi ) and would enrich the training set L. [sent-127, score-0.25]
35 If the exact labels of the inputs in U were known, a good active learning [26] strategy would be to move inputs to L on which the current classifier hw is most uncertain. [sent-128, score-0.343]
36 Define cu(x) as a confidence indicator function (for some confidence threshold τ > 0): cu(x) = 1 if p(ŷ | x; u) > τ, and cu(x) = 0 otherwise (7); cv is defined analogously. [sent-162, score-0.322]
37 Then the ε-expanding condition translates to Σ_{x∈U} [cu(x) c̄v(x) + c̄u(x) cv(x)] ≥ ε · min( Σ_{x∈U} cu(x) cv(x), Σ_{x∈U} c̄u(x) c̄v(x) ) (8) for some ε > 0. [sent-163, score-0.408]
38 Here, c̄u(x) = 1 − cu(x) indicates that classifier hu is not confident about input x. [sent-164, score-0.289]
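Conditions (7) and (8) are easy to check numerically; a sketch with array names of our choosing, where pu and pv hold each view's confidence p(ŷ | x) for every x in U:

```python
import numpy as np

def confident(p_hat, tau):
    """Indicator c(x) from (7): 1 if the view's confidence exceeds tau."""
    return (p_hat > tau).astype(float)

def epsilon_expanding(pu, pv, tau, eps):
    """Check the epsilon-expanding condition (8) over the unlabeled set U."""
    cu, cv = confident(pu, tau), confident(pv, tau)
    cu_bar, cv_bar = 1.0 - cu, 1.0 - cv
    one_view = np.sum(cu * cv_bar + cu_bar * cv)  # exactly one view confident
    both = np.sum(cu * cv)                        # both views confident
    neither = np.sum(cu_bar * cv_bar)             # neither view confident
    return one_view >= eps * min(both, neither)
```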
39 This representation enables us to train two logistic regression classifiers, both with small loss on the labeled data set, while satisfying two constraints to ensure feature decomposition and ε-expandability. [sent-168, score-0.466]
40 4 Results We evaluate our algorithm together with several other domain adaptation algorithms on the “Amazon reviews” benchmark data sets [6]. [sent-176, score-0.4]
41 The data from four domains results in 12 directed adaptation tasks (e.g. …). [sent-182, score-0.322]
42 Each domain adaptation task consists of 2,000 labeled source inputs and around 4,000 unlabeled target test inputs (varying slightly between tasks). [sent-185, score-1.285]
43 We let the amount of labeled target data vary from 0 to 1600. [sent-186, score-0.452]
44 For each setting with target labels we ran 10 experiments with different, randomly chosen, labeled instances. [sent-187, score-0.479]
45 To reduce the dimensionality, we only use features that appear at least 10 times in a particular domain adaptation task (with approximately 40,000 features remaining). [sent-194, score-0.54]
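A small sketch of this preprocessing step, reading "appear at least 10 times" as document frequency within the task (our assumption):

```python
import numpy as np

def frequent_feature_mask(X, min_count=10):
    """Boolean mask over columns of the task's input-feature matrix X,
    keeping features that occur in at least min_count inputs."""
    return np.asarray((X > 0).sum(axis=0)).ravel() >= min_count

# usage sketch: X_filtered = X[:, frequent_feature_mask(X)]
```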
46 Figure 1: Relative test-error reduction over logistic regression, averaged across all 12 domain adaptation tasks, as a function of the target training set size (x-axis: number of target labeled data, 0-1600; curves: Logistic Regression, Coupled, EasyAdapt, EasyAdapt++, CODA). [sent-211, score-1.339]
48 Right: A comparison of CODA with four state-of-the-art domain adaptation algorithms. [sent-217, score-0.38]
49 CODA leads to particularly strong improvements under little target supervision. [sent-218, score-0.279]
50 As a first experiment, we compare the three algorithms from Section 3 and logistic regression as a baseline. [sent-219, score-0.212]
51 For logistic regression, we ignore the difference between the source and target distributions, and train a classifier on the union of both labeled data sets. [sent-221, score-0.774]
52 Our second baseline augments logistic regression with self-training, as described in Section 3. [sent-224, score-0.212]
53 We start with the set of labeled instances from the source and target domains, and gradually add confident predictions on the unlabeled target domain to the training set (without regularization). [sent-226, score-1.245]
54 The left plot in Figure 1 shows the relative classification errors of these four algorithms, averaged over all 12 domain adaptation tasks, under varying amounts of target labels. [sent-232, score-0.747]
55 A second trend is that the relative improvement over logistic regression shrinks as more labeled target data becomes available. [sent-235, score-0.683]
56 This is not surprising, as with sufficient target labels the task turns into a classical supervised learning problem and the source data becomes irrelevant. [sent-236, score-0.508]
57 As a second experiment, we compare CODA against three state-of-the-art domain adaptation algorithms. [sent-237, score-0.38]
58 Coupled subspaces, as described in [6], does not utilize labeled target data and its result is depicted as a single point. [sent-241, score-0.452]
59 Figure 3 shows the individual results on all the 12 adaptation tasks with absolute classification error rates. [sent-243, score-0.262]
60 The error bars show the standard deviation across the 10 runs with different labeled instances. [sent-244, score-0.211]
61 EasyAdapt and EasyAdapt++ both consistently improve over logistic regression once sufficient target data is available. [sent-245, score-0.511]
62 It is noteworthy that, on average, CODA outperforms the other algorithms in almost all settings when 800 or fewer labeled target points are present. [sent-246, score-0.432]
63 With 1600 labeled target points all algorithms perform similarly to the baseline, and the additional source data is irrelevant. [sent-247, score-0.614]
64 Concerning computational requirements, it is fair to say that CODA is significantly slower than the other algorithms, as each of its iterations is of comparable complexity to a full run of logistic regression or EasyAdapt. [sent-249, score-0.23]
65 Figure 2: The ratio of the average number of used features between source and target inputs (9), tracked throughout the CODA optimization (x-axis: iterations, 0-100; y-axis: ratio of used features, source/target; panels for 0 and 400 target labels visible). [sent-265, score-0.645]
68 The three plots show the same statistic at different amounts of target labels. [sent-266, score-0.306]
69 Initially, an input from the source domain has on average 10-35% more classifier-used features than a target input. [sent-267, score-0.686]
70 The graph shows the geometric mean across all adaptation tasks. [sent-269, score-0.253]
71 With no target data available (left plot), the early spike in source dominance is more pronounced and decreases when more target labels are available (middle and right plot). [sent-270, score-0.805]
72 In typical domain adaptation settings this is generally not a problem, as training sets tend to be small. [sent-271, score-0.428]
73 We can denote the ratio of the average number of used features in labeled training inputs to that in unlabeled target inputs as r(w) = ( (1/|DS|) Σ_{xs∈DS} δ(w)⊤δ(xs) ) / ( (1/|DT^u|) Σ_{xt∈DT^u} δ(w)⊤δ(xt) ). (9) [sent-276, score-0.874]
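A sketch of this statistic, reading δ(·) as the elementwise nonzero indicator (an assumption about the notation):

```python
import numpy as np

def used_feature_ratio(w, X_src, X_tgt):
    """r(w) in the spirit of (9): the average number of features per source
    input that are nonzero in both the input and the weight vector, divided
    by the same average over target inputs."""
    def avg_used(X):
        return np.mean(((X != 0) & (w != 0)).sum(axis=1))
    return avg_used(X_src) / avg_used(X_tgt)
```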
74 The three plots show the same statistic under varying amounts of target labels. [sent-278, score-0.323]
75 For example, in the case with zero labeled target data, during early iterations the average source input contains 20-35% more used features than target inputs. [sent-280, score-0.997]
76 This source-heavy feature distribution changes and eventually turns into a target-heavy distribution as the classifier adapts to the target domain. [sent-281, score-0.342]
77 As a second trend, we observe that with more target labels (right plot), this spike in source features is much less pronounced, whereas the final target-heavy ratio is unchanged but is reached earlier. [sent-282, score-0.616]
78 This indicates that as the target labels increase, the classifier makes less use of the source data and relies sooner and more directly on the target signal. [sent-283, score-0.787]
79 5 Related Work and Discussion Domain adaptation algorithms that do not use labeled target domain data are sometimes called unsupervised adaptation algorithms. [sent-284, score-1.047]
80 One such approach [5] learns a shared representation under which the source and target distributions are closer than under the ambient feature space [28]. [sent-287, score-0.504]
81 those with unique target features), but they have the advantage over both CODA and representation learning of learning asymptotically optimal target predictors from only … (footnote 6: We used a straightforward Matlab implementation.) [sent-293, score-0.558]
82 [Figure 3 residue: per-task error curves (x-axis: number of target labeled data, 0-1600; y-axis: test error); visible panel titles: Books −> Electronics, Kitchen −> Electronics, Dvd −> Kitchen, Books −> Kitchen, Kitchen −> Dvd, Dvd −> Electronics, Electronics −> Dvd, Books −> Dvd, Electronics −> Kitchen; curves: Logistic Regression, Coupled, EasyAdapt, EasyAdapt++, CODA.]
93 Figure 3: The individual results on all domain adaptation tasks under varying amounts of labeled target data. [sent-362, score-1.335]
94 All settings with existing labeled target data were averaged over 10 runs (with randomly selected labeled instances). [sent-364, score-0.605]
95 In natural language processing, a final type of very successful algorithm self-trains on its own target predictions to automatically annotate new target domain features [19]. [sent-368, score-0.877]
96 The final set of domain adaptation algorithms, which we compared against but did not describe, consists of those that actively seek to minimize the labeling divergence between domains using multi-task techniques [1, 8, 9, 12, 21, 27]. [sent-371, score-0.457]
97 Most prominently, Daumé [11] trains separate source and target models, but regularizes these models to be close to one another. [sent-372, score-0.441]
98 The EasyAdapt++ variant of this algorithm, which we compared against, generalizes this to the semi-supervised setting by making the assumption that for unlabeled target instances, the tasks should be similar. [sent-373, score-0.402]
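For context, EasyAdapt is commonly described as the feature augmentation that maps a source input x to (x, x, 0) and a target input to (x, 0, x), so a single linear model can learn shared plus domain-specific weights; a minimal sketch with our naming:

```python
import numpy as np

def easyadapt_augment(x, is_target):
    """EasyAdapt feature map: the first copy of x carries the shared signal,
    the second (source) or third (target) copy absorbs domain-specific effects."""
    zeros = np.zeros_like(x)
    parts = [x, zeros, x] if is_target else [x, x, zeros]
    return np.concatenate(parts)
```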
99 Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. [sent-400, score-0.283]
100 Domain adaptation of conditional probability models via feature subsetting. [sent-535, score-0.278]
wordName wordTfidf (topN-words)
[('coda', 0.514), ('easyadapt', 0.49), ('target', 0.279), ('adaptation', 0.215), ('domain', 0.165), ('source', 0.162), ('labeled', 0.153), ('logistic', 0.142), ('cu', 0.136), ('seda', 0.126), ('blitzer', 0.114), ('coupled', 0.112), ('books', 0.108), ('dent', 0.106), ('er', 0.097), ('unlabeled', 0.096), ('inputs', 0.094), ('dvd', 0.089), ('hw', 0.089), ('kitchen', 0.087), ('classi', 0.087), ('features', 0.08), ('electronics', 0.076), ('regression', 0.07), ('sentiment', 0.068), ('linguistics', 0.064), ('feature', 0.063), ('domains', 0.06), ('ph', 0.051), ('cv', 0.05), ('ps', 0.049), ('pcc', 0.048), ('heavy', 0.048), ('training', 0.048), ('labels', 0.047), ('ers', 0.047), ('dt', 0.042), ('con', 0.041), ('views', 0.041), ('across', 0.038), ('weinberger', 0.038), ('label', 0.036), ('instances', 0.034), ('association', 0.032), ('appliances', 0.032), ('cotraining', 0.032), ('pt', 0.031), ('ds', 0.031), ('ratio', 0.03), ('regularization', 0.03), ('exclusive', 0.029), ('predictions', 0.029), ('bigram', 0.028), ('dvds', 0.028), ('test', 0.027), ('amounts', 0.027), ('tasks', 0.027), ('republic', 0.026), ('trained', 0.025), ('iterations', 0.025), ('plot', 0.025), ('mutually', 0.024), ('stars', 0.024), ('annotate', 0.024), ('czech', 0.024), ('prague', 0.024), ('violated', 0.023), ('moved', 0.023), ('argminw', 0.023), ('louis', 0.023), ('daume', 0.023), ('dence', 0.022), ('selection', 0.021), ('language', 0.021), ('balcan', 0.021), ('pseudo', 0.021), ('predictive', 0.021), ('data', 0.02), ('error', 0.02), ('gaps', 0.02), ('blum', 0.02), ('slowly', 0.02), ('sign', 0.019), ('prediction', 0.019), ('relative', 0.019), ('move', 0.019), ('iteration', 0.018), ('train', 0.018), ('gure', 0.018), ('meeting', 0.018), ('pronounced', 0.018), ('chen', 0.018), ('split', 0.017), ('varying', 0.017), ('hu', 0.017), ('management', 0.017), ('subspaces', 0.017), ('inverted', 0.017), ('opposite', 0.017), ('minimize', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.34204254 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
3 0.12230819 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
4 0.089716256 219 nips-2011-Predicting response time and error rates in visual search
Author: Bo Chen, Vidhya Navalpakkam, Pietro Perona
Abstract: A model of human visual search is proposed. It predicts both response time (RT) and error rates (ER) as a function of image parameters such as target contrast and clutter. The model is an ideal observer, in that it optimizes the Bayes ratio of target present vs target absent. The ratio is computed on the firing pattern of V1/V2 neurons, modeled by Poisson distributions. The optimal mechanism for integrating information over time is shown to be a ‘soft max’ of diffusions, computed over the visual field by ‘hypercolumns’ of neurons that share the same receptive field and have different response properties to image features. An approximation of the optimal Bayesian observer, based on integrating local decisions, rather than diffusions, is also derived; it is shown experimentally to produce very similar predictions to the optimal observer in common psychophysics conditions. A psychophysics experiment is proposed that may discriminate which mechanism is used in the human brain. Figure 1: Visual search. (A) Clutter and camouflage make visual search difficult. (B,C) Psychologists and neuroscientists build synthetic displays to study visual search. In (B) the target ‘pops out’ (∆θ = 45°), while in (C) the target requires more time to be detected (∆θ = 10°) [1].
5 0.086531378 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments. 1
6 0.075871788 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
7 0.071124144 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
8 0.06717582 19 nips-2011-Active Classification based on Value of Classifier
9 0.063810222 258 nips-2011-Sparse Bayesian Multi-Task Learning
10 0.06281881 28 nips-2011-Agnostic Selective Classification
11 0.06024237 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.059599038 261 nips-2011-Sparse Filtering
13 0.059208754 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
14 0.053873498 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
15 0.053826008 4 nips-2011-A Convergence Analysis of Log-Linear Training
16 0.05320235 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
17 0.051747978 244 nips-2011-Selecting Receptive Fields in Deep Networks
18 0.050383318 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
19 0.049282946 54 nips-2011-Co-regularized Multi-view Spectral Clustering
20 0.049115654 180 nips-2011-Multiple Instance Filtering
topicId topicWeight
[(0, 0.161), (1, 0.039), (2, -0.033), (3, 0.015), (4, 0.014), (5, 0.038), (6, 0.06), (7, -0.117), (8, -0.116), (9, -0.009), (10, -0.089), (11, 0.013), (12, 0.069), (13, 0.114), (14, -0.002), (15, -0.065), (16, -0.119), (17, -0.084), (18, 0.21), (19, -0.025), (20, -0.187), (21, 0.256), (22, -0.07), (23, -0.121), (24, 0.0), (25, -0.009), (26, -0.057), (27, 0.027), (28, -0.098), (29, -0.052), (30, 0.086), (31, -0.116), (32, -0.057), (33, -0.036), (34, 0.048), (35, 0.047), (36, 0.036), (37, 0.158), (38, -0.043), (39, 0.064), (40, 0.045), (41, -0.028), (42, -0.073), (43, -0.069), (44, -0.026), (45, 0.049), (46, -0.165), (47, -0.026), (48, 0.065), (49, 0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.9668889 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.93528897 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
3 0.67391378 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
4 0.60971057 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
Author: Luca Oneto, Davide Anguita, Alessandro Ghio, Sandro Ridella
Abstract: We derive here new generalization bounds, based on Rademacher Complexity theory, for model selection and error estimation of linear (kernel) classifiers, which exploit the availability of unlabeled samples. In particular, two results are obtained: the first one shows that, using the unlabeled samples, the confidence term of the conventional bound can be reduced by a factor of three; the second one shows that the unlabeled samples can be used to obtain much tighter bounds, by building localized versions of the hypothesis class containing the optimal classifier. 1
5 0.59086603 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments. 1
6 0.57487053 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
7 0.55941367 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
8 0.46005484 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
9 0.45862952 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
10 0.44131458 219 nips-2011-Predicting response time and error rates in visual search
11 0.42712167 33 nips-2011-An Exact Algorithm for F-Measure Maximization
12 0.41311446 277 nips-2011-Submodular Multi-Label Learning
13 0.39199269 19 nips-2011-Active Classification based on Value of Classifier
14 0.38968801 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
15 0.38464549 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
16 0.36671981 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
17 0.35848275 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
18 0.35179794 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
19 0.33847997 28 nips-2011-Agnostic Selective Classification
20 0.3311339 181 nips-2011-Multiple Instance Learning on Structured Data
topicId topicWeight
[(0, 0.093), (4, 0.047), (20, 0.033), (26, 0.027), (31, 0.072), (32, 0.226), (33, 0.032), (43, 0.057), (45, 0.141), (57, 0.037), (74, 0.033), (83, 0.031), (84, 0.014), (99, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.78997022 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.68503261 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
3 0.6797691 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
4 0.65859306 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
Author: Gilles Blanchard, Gyemin Lee, Clayton Scott
Abstract: We consider the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distributionfree, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented. 1
5 0.65423834 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
6 0.65364289 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
7 0.64939171 80 nips-2011-Efficient Online Learning via Randomized Rounding
8 0.64892668 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
9 0.6481092 22 nips-2011-Active Ranking using Pairwise Comparisons
10 0.64748806 29 nips-2011-Algorithms and hardness results for parallel large margin learning
11 0.64689213 291 nips-2011-Transfer from Multiple MDPs
12 0.64682156 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
13 0.64630246 265 nips-2011-Sparse recovery by thresholded non-negative least squares
14 0.64502656 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
15 0.64472198 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
16 0.64399379 231 nips-2011-Randomized Algorithms for Comparison-based Search
17 0.64387637 202 nips-2011-On the Universality of Online Mirror Descent
18 0.64329231 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
19 0.6432637 199 nips-2011-On fast approximate submodular minimization
20 0.64143741 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity