nips nips2011 nips2011-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. [sent-7, score-0.846]
2 In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. [sent-8, score-0.962]
3 In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. [sent-9, score-0.962]
4 Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. [sent-12, score-0.898]
5 Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance. [sent-15, score-0.279]
6 1 Introduction Domain adaptation addresses the problem of generalizing from a source distribution for which we have ample labeled training data to a target distribution for which we have few or no labels [3, 14, 28]. [sent-16, score-0.924]
7 Domain adaptation is of practical importance in many areas of applied machine learning, ranging from computational biology [17] to natural language processing [11, 19] to computer vision [23]. [sent-17, score-0.236]
8 In this work, we focus primarily on domain adaptation problems that are characterized by missing features. [sent-18, score-0.38]
9 In this situation, most domain adaptation algorithms seek to eliminate the difference between source and target distributions, either by re-weighting source instances [14, 18] or learning a new feature representation [6, 28]. [sent-22, score-1.08]
10 Our method seeks to slowly adapt its training set from the source to the target domain, using ideas from co-training. [sent-24, score-0.509]
11 We accomplish this in two ways: First, we train on our own output in rounds, where at each round, we include in our training data the target instances we are most confident of. [sent-25, score-0.399]
12 Second, we select a subset of shared source and target features based on their compatibility. [sent-26, score-0.521]
13 Unlike most previous work on selecting features for domain adaptation, compatibility is measured across the training set and the unlabeled set, instead of across the two domains. [sent-27, score-0.465]
14 As more target instances are added to the training set, target-specific features become compatible across the two sets and are therefore included in the predictor. [sent-28, score-0.758]
15 By allowing us to slowly change our training data from source to target, CODA has an advantage over representation-learning algorithms [6, 28], since they must decide a priori what the best representation is. [sent-33, score-0.25]
16 In contrast, each iteration of CODA can choose exactly those few target features which can be related to the current (source and pseudo-labeled target) training set. [sent-34, score-0.425]
17 [4] CODA improves the state-of-the-art across widely varying amounts of target labeled data in 65 out of 84 settings. [sent-36, score-0.496]
18 The source data is fully labeled: DS = {(x1, y1), ...}. [sent-38, score-0.335]
19 The target data is sampled from PT(X, Y) and is divided into a labeled part DT^l = {(x1, y1), ...} and an unlabeled part DT^u. [sent-42, score-0.452]
20 If trained on data sampled from PS(X, Y), logistic regression models the distribution PS(Y|X) [13] through Ph(Y = y | X = x; w) = (1 + e^(−y·w⊤x))^(−1). [sent-59, score-0.257]
21 In this paper, our goal is to adapt this classifier to the target distribution PT(Y|X). [sent-60, score-0.279]
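To make the notation concrete, here is a minimal NumPy sketch of this model and the certainty measure used by the rote-learning procedure below; the function names are ours, not the paper's.

```python
import numpy as np

def p_label(w, x, y):
    """Ph(Y = y | X = x; w) = 1 / (1 + exp(-y * w.x)) for y in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))

def predict(w, x):
    """Hard prediction y_hat = sign(h(x)), ties broken toward +1."""
    return 1 if np.dot(w, x) >= 0 else -1

def certainty(w, x):
    """Confidence Ph(Y = y_hat | X = x; w) of the predicted label."""
    return p_label(w, x, predict(w, x))
```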
22 3 Method In this section, we begin with a semi-supervised approach and describe the rote-learning procedure to automatically annotate target domain inputs. [sent-61, score-0.468]
23 The algorithm maintains and grows a training set that is iteratively adapted to the target domain. [sent-62, score-0.327]
24 In logistic regression, if ŷ = sign(h(x)) is the prediction for an input x, the probability Ph(Y = ŷ | X = x; w) is a natural metric of certainty (as h(x) can be interpreted as the probability that x has label +1), but other methods [22] can be used. [sent-70, score-0.197]
25 During training one maintains a labeled training set L and an unlabeled test set U, initialized as L = DS ∪ DT^l and U = DT^u. [sent-72, score-0.372]
26 The c most confident predictions on U are moved to L for the next iteration, labeled by the prediction of sign(hw ). [sent-74, score-0.224]
27 Move up to c confident inputs xi from U to L, labeled as sign(h(xi)). [sent-80, score-0.247]
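A hedged sketch of one rote-learning round follows; train_logreg stands in for whatever regularized logistic-regression fit is used per round (an assumption, not the paper's full CODA objective).

```python
import numpy as np

def self_train(train_logreg, L_X, L_y, U_X, c=50, rounds=20):
    """Repeatedly fit a classifier on L, then move the c most confident
    unlabeled inputs from U into L, pseudo-labeled by sign(h(x))."""
    L_X, L_y, U_X = list(L_X), list(L_y), list(U_X)
    for _ in range(rounds):
        if not U_X:
            break
        w = train_logreg(np.array(L_X), np.array(L_y))
        scores = np.array(U_X) @ w                    # h_w(x) for each x in U
        conf = 1.0 / (1.0 + np.exp(-np.abs(scores)))  # Ph(Y = y_hat | x; w)
        top = np.argsort(-conf)[:c]                   # c most confident inputs
        for i in sorted(top, reverse=True):           # descending keeps indices valid
            L_X.append(U_X.pop(i))
            L_y.append(1 if scores[i] >= 0 else -1)   # pseudo-label sign(h(x))
    return np.array(L_X), np.array(L_y)
```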
28 In domain adaptation, the training data is no longer representative of the test data. [sent-83, score-0.26]
29 Here, the bigram feature “must read” is indicative of a positive opinion within the source (“books”) domain, but rarely appears in the target (“dvd”) domain. [sent-86, score-0.532]
30 For example, after some iterations, the classifier can pick features that are never present in the source domain, but which have entered L through the rote-learning procedure. [sent-92, score-0.242]
31 A feature that switches polarization across domains (and is therefore "malicious") has a score Δ_{L,U,w}(α) > 1 (in the extreme case, if it equals the label in L and the inverted label in U, its score would be 2). [sent-109, score-0.25]
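The paper's exact score Δ_{L,U,w}(α) is defined in equations not reproduced here; purely as an illustrative stand-in for the idea that a feature is suspect when its agreement with the (pseudo-)labels flips between L and U, one could compute:

```python
import numpy as np

def polarization_gap(f_L, y_L, f_U, y_U):
    """Illustrative stand-in, not the paper's exact score: compare a
    feature's average label-weighted activation on the labeled set L with
    that on the pseudo-labeled set U; a large gap marks a feature whose
    polarity flips across the two sets."""
    agree_L = np.mean(f_L * y_L)   # feature-label agreement on L
    agree_U = np.mean(f_U * y_U)   # feature-(pseudo-)label agreement on U
    return abs(agree_L - agree_U)
```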
32 As we have very few labeled inputs from the target domain in the early iterations, stronger regularization is imposed so that only features shared across the two domains are used. [sent-116, score-0.899]
33 As more and more inputs from the target domain are included in the training set, we gradually decrease the regularization to accommodate target-specific features. [sent-117, score-0.895]
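A minimal sketch of such a schedule, assuming a linear decay with a floor; the rule and constants are illustrative choices, not taken from the paper.

```python
def l1_strength(base_lambda, n_target_in_L, n_target_total, floor=0.1):
    """Relax the l1 penalty as more target inputs join the training set L,
    so that target-specific features can enter the predictor."""
    frac = n_target_in_L / max(n_target_total, 1)
    return base_lambda * max(floor, 1.0 - frac)
```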
34 We want to add inputs xi that contain additional features, which were not used to obtain the prediction hw (xi ) and would enrich the training set L. [sent-127, score-0.25]
35 If the exact labels of the inputs in U were known, a good active learning [26] strategy would be to move inputs to L on which the current classifier hw is most uncertain. [sent-128, score-0.343]
36 Define cu(x) as a confidence indicator function (for some confidence threshold τ > 0): cu(x) = 1 if p(ŷ | x; u) > τ, and cu(x) = 0 otherwise (7); cv is defined analogously. [sent-162, score-0.322]
37 Then the ε-expanding condition translates to Σ_{x∈U} [cu(x) c̄v(x) + c̄u(x) cv(x)] ≥ ε · min( Σ_{x∈U} cu(x) cv(x), Σ_{x∈U} c̄u(x) c̄v(x) ) (8) for some ε > 0. [sent-163, score-0.408]
38 Here, c̄u(x) = 1 − cu(x) indicates that classifier hu is not confident about input x. [sent-164, score-0.289]
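Conditions (7) and (8) are easy to check numerically; a sketch with array names of our choosing, where pu and pv hold each view's confidence p(ŷ | x) for every x in U:

```python
import numpy as np

def confident(p_hat, tau):
    """Indicator c(x) from (7): 1 if the view's confidence exceeds tau."""
    return (p_hat > tau).astype(float)

def epsilon_expanding(pu, pv, tau, eps):
    """Check the epsilon-expanding condition (8) over the unlabeled set U."""
    cu, cv = confident(pu, tau), confident(pv, tau)
    cu_bar, cv_bar = 1.0 - cu, 1.0 - cv
    one_view = np.sum(cu * cv_bar + cu_bar * cv)  # exactly one view confident
    both = np.sum(cu * cv)                        # both views confident
    neither = np.sum(cu_bar * cv_bar)             # neither view confident
    return one_view >= eps * min(both, neither)
```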
39 This representation enables us to train two logistic regression classifiers, both with small loss on the labeled data set, while satisfying two constraints to ensure feature decomposition and ε-expandability. [sent-168, score-0.466]
40 4 Results We evaluate our algorithm together with several other domain adaptation algorithms on the “Amazon reviews” benchmark data sets [6]. [sent-176, score-0.4]
41 The data from four domains results in 12 directed adaptation tasks (e.g. …). [sent-182, score-0.322]
42 Each domain adaptation task consists of 2,000 labeled source inputs and around 4,000 unlabeled target test inputs (varying slightly between tasks). [sent-185, score-1.285]
43 We let the amount of labeled target data vary from 0 to 1600. [sent-186, score-0.452]
44 For each setting with target labels we ran 10 experiments with different, randomly chosen, labeled instances. [sent-187, score-0.479]
45 To reduce the dimensionality, we only use features that appear at least 10 times in a particular domain adaptation task (with approximately 40,000 features remaining). [sent-194, score-0.54]
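A small sketch of this preprocessing step, reading "appear at least 10 times" as document frequency within the task (our assumption):

```python
import numpy as np

def frequent_feature_mask(X, min_count=10):
    """Boolean mask over columns of the task's input-feature matrix X,
    keeping features that occur in at least min_count inputs."""
    return np.asarray((X > 0).sum(axis=0)).ravel() >= min_count

# usage sketch: X_filtered = X[:, frequent_feature_mask(X)]
```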
46 Figure 1: Relative test-error reduction over logistic regression, averaged across all 12 domain adaptation tasks, as a function of the target training set size (x-axis: number of target labeled data, 0-1600; curves: Logistic Regression, Coupled, EasyAdapt, EasyAdapt++, CODA). [sent-211, score-1.339]
48 Right: A comparison of CODA with four state-of-the-art domain adaptation algorithms. [sent-217, score-0.38]
49 CODA leads to particularly strong improvements under little target supervision. [sent-218, score-0.279]
50 As a first experiment, we compare the three algorithms from Section 3 and logistic regression as a baseline. [sent-219, score-0.212]
51 For logistic regression, we ignore the difference between the source and target distributions, and train a classifier on the union of both labeled data sets. [sent-221, score-0.774]
52 Our second baseline augments logistic regression with self-training, as described in Section 3. [sent-224, score-0.212]
53 We start with the set of labeled instances from the source and target domains, and gradually add confident predictions on the unlabeled target domain to the training set (without regularization). [sent-226, score-1.245]
54 The left plot in Figure 1 shows the relative classification errors of these four algorithms, averaged over all 12 domain adaptation tasks, under varying amounts of target labels. [sent-232, score-0.747]
55 A second trend is that the relative improvement over logistic regression shrinks as more labeled target data becomes available. [sent-235, score-0.683]
56 This is not surprising, as with sufficient target labels the task turns into a classical supervised learning problem and the source data becomes irrelevant. [sent-236, score-0.508]
57 As a second experiment, we compare CODA against three state-of-the-art domain adaptation algorithms. [sent-237, score-0.38]
58 Coupled subspaces, as described in [6], does not utilize labeled target data and its result is depicted as a single point. [sent-241, score-0.452]
59 Figure 3 shows the individual results on all the 12 adaptation tasks with absolute classification error rates. [sent-243, score-0.262]
60 The error bars show the standard deviation across the 10 runs with different labeled instances. [sent-244, score-0.211]
61 EasyAdapt and EasyAdapt++ both consistently improve over logistic regression once sufficient target data is available. [sent-245, score-0.511]
62 It is noteworthy that, on average, CODA outperforms the other algorithms in almost all settings when 800 or fewer labeled target points are present. [sent-246, score-0.432]
63 With 1600 labeled target points all algorithms perform similarly to the baseline, and the additional source data is irrelevant. [sent-247, score-0.614]
64 Concerning computational requirements, it is fair to say that CODA is significantly slower than the other algorithms, as each of its iterations is of comparable complexity to a full run of logistic regression or EasyAdapt. [sent-249, score-0.23]
65 Figure 2: The ratio of the average number of used features between source and target inputs (9), tracked throughout the CODA optimization (x-axis: iterations, 0-100; y-axis: ratio of used features, source/target; panels for 0 and 400 target labels visible). [sent-265, score-0.645]
68 The three plots show the same statistic at different amounts of target labels. [sent-266, score-0.306]
69 Initially, an input from the source domain has on average 10-35% more classifier-used features than a target input. [sent-267, score-0.686]
70 The graph shows the geometric mean across all adaptation tasks. [sent-269, score-0.253]
71 With no target data available (left plot), the early spike in source dominance is more pronounced and decreases when more target labels are available (middle and right plot). [sent-270, score-0.805]
72 In typical domain adaptation settings this is generally not a problem, as training sets tend to be small. [sent-271, score-0.428]
73 We can denote the ratio of the average number of used features in labeled training inputs to that in unlabeled target inputs as r(w) = ( (1/|DS|) Σ_{xs∈DS} δ(w)⊤δ(xs) ) / ( (1/|DT^u|) Σ_{xt∈DT^u} δ(w)⊤δ(xt) ). (9) [sent-276, score-0.874]
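A sketch of this statistic, reading δ(·) as the elementwise nonzero indicator (an assumption about the notation):

```python
import numpy as np

def used_feature_ratio(w, X_src, X_tgt):
    """r(w) in the spirit of (9): the average number of features per source
    input that are nonzero in both the input and the weight vector, divided
    by the same average over target inputs."""
    def avg_used(X):
        return np.mean(((X != 0) & (w != 0)).sum(axis=1))
    return avg_used(X_src) / avg_used(X_tgt)
```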
74 The three plots show the same statistic under varying amounts of target labels. [sent-278, score-0.323]
75 For example, in the case with zero labeled target data, during early iterations the average source input contains 20-35% more used features than target inputs. [sent-280, score-0.997]
76 This source-heavy feature distribution changes and eventually turns into a target-heavy distribution as the classifier adapts to the target domain. [sent-281, score-0.342]
77 As a second trend, we observe that with more target labels (right plot), this spike in source features is much less pronounced, whereas the final target-heavy ratio is unchanged but is reached earlier. [sent-282, score-0.616]
78 This indicates that as the target labels increase, the classifier makes less use of the source data and relies sooner and more directly on the target signal. [sent-283, score-0.787]
79 5 Related Work and Discussion Domain adaptation algorithms that do not use labeled target domain data are sometimes called unsupervised adaptation algorithms. [sent-284, score-1.047]
80 One such approach [5] learns a shared representation under which the source and target distributions are closer than under the ambient feature space [28]. [sent-287, score-0.504]
81 those with unique target features), but they have the advantage over both CODA and representation learning of learning asymptotically optimal target predictors from only … (footnote 6: We used a straightforward Matlab implementation.) [sent-293, score-0.558]
82 [Figure 3 residue: per-task error curves (x-axis: number of target labeled data, 0-1600; y-axis: test error); visible panel titles: Books −> Electronics, Kitchen −> Electronics, Dvd −> Kitchen, Books −> Kitchen, Kitchen −> Dvd, Dvd −> Electronics, Electronics −> Dvd, Books −> Dvd, Electronics −> Kitchen; curves: Logistic Regression, Coupled, EasyAdapt, EasyAdapt++, CODA.]
93 Figure 3: The individual results on all domain adaptation tasks under varying amounts of labeled target data. [sent-362, score-1.335]
94 All settings with existing labeled target data were averaged over 10 runs (with randomly selected labeled instances). [sent-364, score-0.605]
95 In natural language processing, a final type of very successful algorithm self-trains on its own target predictions to automatically annotate new target domain features [19]. [sent-368, score-0.877]
96 The final set of domain adaptation algorithms, which we compared against but did not describe, consists of those that actively seek to minimize the labeling divergence between domains using multi-task techniques [1, 8, 9, 12, 21, 27]. [sent-371, score-0.457]
97 Most prominently, Daumé [11] trains separate source and target models, but regularizes these models to be close to one another. [sent-372, score-0.441]
98 The EasyAdapt++ variant of this algorithm, which we compared against, generalizes this to the semi-supervised setting by making the assumption that for unlabeled target instances, the tasks should be similar. [sent-373, score-0.402]
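For context, EasyAdapt is commonly described as the feature augmentation that maps a source input x to (x, x, 0) and a target input to (x, 0, x), so a single linear model can learn shared plus domain-specific weights; a minimal sketch with our naming:

```python
import numpy as np

def easyadapt_augment(x, is_target):
    """EasyAdapt feature map: the first copy of x carries the shared signal,
    the second (source) or third (target) copy absorbs domain-specific effects."""
    zeros = np.zeros_like(x)
    parts = [x, zeros, x] if is_target else [x, x, zeros]
    return np.concatenate(parts)
```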
99 Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. [sent-400, score-0.283]
100 Domain adaptation of conditional probability models via feature subsetting. [sent-535, score-0.278]
wordName wordTfidf (topN-words)
[('coda', 0.514), ('easyadapt', 0.49), ('target', 0.279), ('adaptation', 0.215), ('domain', 0.165), ('source', 0.162), ('labeled', 0.153), ('logistic', 0.142), ('cu', 0.136), ('seda', 0.126), ('blitzer', 0.114), ('coupled', 0.112), ('books', 0.108), ('dent', 0.106), ('er', 0.097), ('unlabeled', 0.096), ('inputs', 0.094), ('dvd', 0.089), ('hw', 0.089), ('kitchen', 0.087), ('classi', 0.087), ('features', 0.08), ('electronics', 0.076), ('regression', 0.07), ('sentiment', 0.068), ('linguistics', 0.064), ('feature', 0.063), ('domains', 0.06), ('ph', 0.051), ('cv', 0.05), ('ps', 0.049), ('pcc', 0.048), ('heavy', 0.048), ('training', 0.048), ('labels', 0.047), ('ers', 0.047), ('dt', 0.042), ('con', 0.041), ('views', 0.041), ('across', 0.038), ('weinberger', 0.038), ('label', 0.036), ('instances', 0.034), ('association', 0.032), ('appliances', 0.032), ('cotraining', 0.032), ('pt', 0.031), ('ds', 0.031), ('ratio', 0.03), ('regularization', 0.03), ('exclusive', 0.029), ('predictions', 0.029), ('bigram', 0.028), ('dvds', 0.028), ('test', 0.027), ('amounts', 0.027), ('tasks', 0.027), ('republic', 0.026), ('trained', 0.025), ('iterations', 0.025), ('plot', 0.025), ('mutually', 0.024), ('stars', 0.024), ('annotate', 0.024), ('czech', 0.024), ('prague', 0.024), ('violated', 0.023), ('moved', 0.023), ('argminw', 0.023), ('louis', 0.023), ('daume', 0.023), ('dence', 0.022), ('selection', 0.021), ('language', 0.021), ('balcan', 0.021), ('pseudo', 0.021), ('predictive', 0.021), ('data', 0.02), ('error', 0.02), ('gaps', 0.02), ('blum', 0.02), ('slowly', 0.02), ('sign', 0.019), ('prediction', 0.019), ('relative', 0.019), ('move', 0.019), ('iteration', 0.018), ('train', 0.018), ('gure', 0.018), ('meeting', 0.018), ('pronounced', 0.018), ('chen', 0.018), ('split', 0.017), ('varying', 0.017), ('hu', 0.017), ('management', 0.017), ('subspaces', 0.017), ('inverted', 0.017), ('opposite', 0.017), ('minimize', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.34204254 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
3 0.12230819 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
4 0.089716256 219 nips-2011-Predicting response time and error rates in visual search
Author: Bo Chen, Vidhya Navalpakkam, Pietro Perona
Abstract: A model of human visual search is proposed. It predicts both response time (RT) and error rates (ER) as a function of image parameters such as target contrast and clutter. The model is an ideal observer, in that it optimizes the Bayes ratio of target present vs target absent. The ratio is computed on the firing pattern of V1/V2 neurons, modeled by Poisson distributions. The optimal mechanism for integrating information over time is shown to be a ‘soft max’ of diffusions, computed over the visual field by ‘hypercolumns’ of neurons that share the same receptive field and have different response properties to image features. An approximation of the optimal Bayesian observer, based on integrating local decisions, rather than diffusions, is also derived; it is shown experimentally to produce very similar predictions to the optimal observer in common psychophysics conditions. A psychophysics experiment is proposed that may discriminate which mechanism is used in the human brain. Figure 1: Visual search. (A) Clutter and camouflage make visual search difficult. (B,C) Psychologists and neuroscientists build synthetic displays to study visual search. In (B) the target ‘pops out’ (∆θ = 45°), while in (C) the target requires more time to be detected (∆θ = 10°) [1].
5 0.086531378 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments. 1
6 0.075871788 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
7 0.071124144 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
8 0.06717582 19 nips-2011-Active Classification based on Value of Classifier
9 0.063810222 258 nips-2011-Sparse Bayesian Multi-Task Learning
10 0.06281881 28 nips-2011-Agnostic Selective Classification
11 0.06024237 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.059599038 261 nips-2011-Sparse Filtering
13 0.059208754 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
14 0.053873498 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
15 0.053826008 4 nips-2011-A Convergence Analysis of Log-Linear Training
16 0.05320235 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
17 0.051747978 244 nips-2011-Selecting Receptive Fields in Deep Networks
18 0.050383318 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
19 0.049282946 54 nips-2011-Co-regularized Multi-view Spectral Clustering
20 0.049115654 180 nips-2011-Multiple Instance Filtering
topicId topicWeight
[(0, 0.161), (1, 0.039), (2, -0.033), (3, 0.015), (4, 0.014), (5, 0.038), (6, 0.06), (7, -0.117), (8, -0.116), (9, -0.009), (10, -0.089), (11, 0.013), (12, 0.069), (13, 0.114), (14, -0.002), (15, -0.065), (16, -0.119), (17, -0.084), (18, 0.21), (19, -0.025), (20, -0.187), (21, 0.256), (22, -0.07), (23, -0.121), (24, 0.0), (25, -0.009), (26, -0.057), (27, 0.027), (28, -0.098), (29, -0.052), (30, 0.086), (31, -0.116), (32, -0.057), (33, -0.036), (34, 0.048), (35, 0.047), (36, 0.036), (37, 0.158), (38, -0.043), (39, 0.064), (40, 0.045), (41, -0.028), (42, -0.073), (43, -0.069), (44, -0.026), (45, 0.049), (46, -0.165), (47, -0.026), (48, 0.065), (49, 0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.9668889 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.93528897 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
3 0.67391378 291 nips-2011-Transfer from Multiple MDPs
Author: Alessandro Lazaric, Marcello Restelli
Abstract: Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms. A simple and effective approach is to transfer samples from source tasks and include them in the training set used to solve a target task. In this paper, we investigate the theoretical properties of this transfer method and we introduce novel algorithms adapting the transfer process on the basis of the similarity between source and target tasks. Finally, we report illustrative experimental results in a continuous chain problem.
4 0.60971057 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
Author: Luca Oneto, Davide Anguita, Alessandro Ghio, Sandro Ridella
Abstract: We derive here new generalization bounds, based on Rademacher Complexity theory, for model selection and error estimation of linear (kernel) classifiers, which exploit the availability of unlabeled samples. In particular, two results are obtained: the first one shows that, using the unlabeled samples, the confidence term of the conventional bound can be reduced by a factor of three; the second one shows that the unlabeled samples can be used to obtain much tighter bounds, by building localized versions of the hypothesis class containing the optimal classifier. 1
5 0.59086603 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments. 1
6 0.57487053 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
7 0.55941367 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
8 0.46005484 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
9 0.45862952 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
10 0.44131458 219 nips-2011-Predicting response time and error rates in visual search
11 0.42712167 33 nips-2011-An Exact Algorithm for F-Measure Maximization
12 0.41311446 277 nips-2011-Submodular Multi-Label Learning
13 0.39199269 19 nips-2011-Active Classification based on Value of Classifier
14 0.38968801 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
15 0.38464549 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
16 0.36671981 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
17 0.35848275 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
18 0.35179794 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
19 0.33847997 28 nips-2011-Agnostic Selective Classification
20 0.3311339 181 nips-2011-Multiple Instance Learning on Structured Data
topicId topicWeight
[(0, 0.093), (4, 0.047), (20, 0.033), (26, 0.027), (31, 0.072), (32, 0.226), (33, 0.032), (43, 0.057), (45, 0.141), (57, 0.037), (74, 0.033), (83, 0.031), (84, 0.014), (99, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.78997022 53 nips-2011-Co-Training for Domain Adaptation
Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.
2 0.68503261 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
3 0.6797691 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
Author: Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, Jieping Ye
Abstract: Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Oftentimes we have very few or no labeled data from the test or target distribution but may have plenty of labeled data from multiple related sources with different distributions. The difference in distributions may be both in marginal and conditional probabilities. Most of the existing domain adaptation work focuses on the marginal probability distribution difference between the domains, assuming that the conditional probabilities are similar. However, in many real-world applications, conditional probability distribution differences are as commonplace as marginal probability differences. In this paper we propose a two-stage domain adaptation methodology which combines weighted data from multiple sources based on marginal probability differences (first stage) as well as conditional probability differences (second stage), with the target domain data. The weights for minimizing the marginal probability differences are estimated independently, while the weights for minimizing conditional probability differences are computed simultaneously by exploiting the potential interaction among multiple sources. We also provide a theoretical analysis on the generalization performance of the proposed multi-source domain adaptation formulation using the weighted Rademacher complexity measure. Empirical comparisons with existing state-of-the-art domain adaptation methods using three real-world datasets demonstrate the effectiveness of the proposed approach.
4 0.65859306 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
Author: Gilles Blanchard, Gyemin Lee, Clayton Scott
Abstract: We consider the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distributionfree, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented. 1
5 0.65423834 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
6 0.65364289 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
7 0.64939171 80 nips-2011-Efficient Online Learning via Randomized Rounding
8 0.64892668 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
9 0.6481092 22 nips-2011-Active Ranking using Pairwise Comparisons
10 0.64748806 29 nips-2011-Algorithms and hardness results for parallel large margin learning
11 0.64689213 291 nips-2011-Transfer from Multiple MDPs
12 0.64682156 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
13 0.64630246 265 nips-2011-Sparse recovery by thresholded non-negative least squares
14 0.64502656 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
15 0.64472198 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
16 0.64399379 231 nips-2011-Randomized Algorithms for Comparison-based Search
17 0.64387637 202 nips-2011-On the Universality of Online Mirror Descent
18 0.64329231 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
19 0.6432637 199 nips-2011-On fast approximate submodular minimization
20 0.64143741 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity