nips nips2008 nips2008-241 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
Reference: text
sentIndex sentText sentNum sentScore
1 While the unlabeled samples reflect the target distribution, the labeled samples may be biased. [sent-5, score-0.545]
2 This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. [sent-6, score-0.421]
3 We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. [sent-8, score-0.898]
4 Some of the multiple tasks will likely relate to one another, but one cannot assume that the tasks share a joint conditional distribution of the class label given the input variables. [sent-11, score-0.417]
5 A common method for learning under covariate shift (marginal shift) is to weight the biased training examples by the test-to-training ratio p_test(x) / p_train(x) to match the marginal distribution of the test data [1]. [sent-13, score-0.731]
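As an illustration of this reweighting idea (not the procedure of [1] itself), the following Python sketch estimates the test-to-training ratio discriminatively with a train-vs-test classifier; the function name, the use of scikit-learn, and the mean-one normalization are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_train, X_test):
    """Resampling weights proportional to p_test(x) / p_train(x), estimated
    with a probabilistic classifier that separates training from test inputs."""
    X = np.vstack([X_train, X_test])
    s = np.concatenate([np.ones(len(X_train)), np.zeros(len(X_test))])  # 1 = training side
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    p_train_given_x = np.clip(clf.predict_proba(X_train)[:, 1], 1e-6, 1.0)
    w = 1.0 / p_train_given_x - 1.0   # = p(s=0|x)/p(s=1|x), proportional to p_test(x)/p_train(x)
    return w * (len(w) / w.sum())     # normalize to mean one
```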
6 Similar to the idea of learning under marginal shift by weighting the training examples, [8] devise a method for learning under joint shift of covariates and labels over multiple tasks that is based on instance-specific rescaling factors. [sent-17, score-0.704]
7 We generalize this idea to a setting where not only the joint distributions between tasks may differ but also the training and test distribution within each task. [sent-18, score-0.505]
8 Our work is motivated by the targeted advertising problem for which the goal is to predict sociodemographic features (such as gender, age, or marital status) of web users, based on their surfing history. [sent-19, score-0.541]
9 When sociodemographic attributes can be identified, delivering advertisements to users outside the target segment can be avoided. [sent-21, score-0.552]
10 However, for many campaigns – including campaigns for products that are not typically purchased on the web – the sole goal is to deliver the advertisement to customers in the target segment. [sent-23, score-0.491]
11 Each of several tasks z is characterized by an unknown joint distribution p_test(x, y|z) = p_test(x|z) p(y|x, z) over features x and labels y given the task z. [sent-29, score-0.587]
12 The joint distributions of different tasks may differ arbitrarily but usually some tasks have similar distributions. [sent-30, score-0.441]
13 The test data for task z are governed by p_test(x|z). [sent-36, score-0.437]
14 The training data for task z is drawn from a joint distribution p_train(x, y|z) = p_train(x|z) p(y|x, z) that may differ from the test distribution in terms of the marginal distribution p_train(x|z). [sent-42, score-0.938]
15 The training and test marginals may differ arbitrarily, as long as each x with positive p_test(x|z) also has a positive p_train(x|z). [sent-43, score-0.439]
16 The conditional distribution p(y|x, z) of test and training data is identical for a given task z, but conditionals can differ arbitrarily between tasks. [sent-45, score-0.49]
17 The entire training set over all tasks is governed by the mixed density p_train(z) p_train(x, y|z). [sent-46, score-0.696]
18 The feature vector x encodes the web surfing behavior of a user of web portal z (task). [sent-53, score-0.49]
19 For a small number of users the sociodemographic target label y (e.g., gender) is available, collected via a questionnaire. [sent-54, score-0.552]
20 For new portals the number of such labeled training instances is initially small. [sent-57, score-0.544]
21 The sociodemographic labels for all users of all portals are to be predicted. [sent-58, score-0.55]
22 The joint distribution p_test(x, y|z) can be different between portals since they attract specific populations of users. [sent-59, score-0.386]
23 The training distribution differs from the test distribution because the response to the web surveys is not uniform with respect to the test distribution. [sent-60, score-0.552]
24 Since the completion of surveys cannot be enforced, it is intrinsically impossible to obtain labeled samples that are governed by the test distribution. [sent-61, score-0.507]
25 One reference strategy is to learn individual models for each target task z by minimizing an appropriate loss function on the portion L_z = {(x_i, y_i, z_i) ∈ L : z_i = z} of the labeled data. [sent-63, score-0.511]
26 In addition, it minimizes the loss with respect to p_train(x, y|z); the minimum of this optimization problem will not generally coincide with the minimal loss on p_test(x, y|z). [sent-65, score-0.438]
27 The training sample may deviate arbitrarily from the target distribution p_test(x, y|z). [sent-67, score-0.579]
28 3 Transfer Learning by Distribution Matching. In learning a classifier f_t(x) for target task t, we seek to minimize the loss function with respect to p_test(x, y|t) = p(x, y|t, s = −1). [sent-70, score-0.522]
29 Our approach is to create a task-specific resampling weight r_t(x, y) for each element of the pool of examples. [sent-73, score-0.724]
30 The resampling weights match the pool distribution to the target distribution p(x, y|t, s = −1). [sent-74, score-0.783]
31 The resampled pool is governed by the correct target distribution, but is larger than the labeled sample of the target task. [sent-75, score-1.078]
32 The expected weighted loss with respect to the mixture distribution that governs the pool equals the loss with respect to the target distribution p(x, y|t, s = −1). [sent-77, score-0.811]
33 Equation 5 rearranges some terms and Equation 6 is the expected loss over the distribution of all tasks weighted by r_t(x, y). [sent-82, score-0.752]
34 This amounts to minimizing the expected loss with respect to the target distribution p(x, y|t, s = −1). [sent-84, score-0.382]
35 The resampling weights of Equation 2 have an intuitive interpretation: The first fraction accounts for the difference in the joint distributions across tasks, and the second fraction accounts for the covariate shift within the target task. [sent-85, score-0.691]
36 Equation 5 leaves us with the problem of estimating the product of two density ratios, r_t(x, y) = [p(x, y|t, s=1) / Σ_z p(z|s=1) p(x, y|z, s=1)] · [p(x|t, s=−1) / p(x|t, s=1)]. [sent-86, score-0.421]
37 We reformulate the first density ratio r_t^1(x, y) = p(x, y|t, s=1) / Σ_z p(z|s=1) p(x, y|z, s=1) in terms of a conditional model p(t|x, y, s = 1). [sent-91, score-0.463]
38 This conditional has the following intuitive meaning: Given that an instance (x, y) has been drawn at random from the pool distribution Σ_z p(z|s = 1) p(x, y|z, s = 1) over all tasks (including target task t), the probability that (x, y) originates from p(x, y|t, s = 1) is p(t|x, y, s = 1). [sent-92, score-0.818]
39 The right hand side of Equation 8 can be evaluated based on a model p(t|x, y, s = 1) that discriminates labeled instances of the target task against labeled instances of the pool of examples for all non-target tasks. [sent-97, score-1.106]
40 Similar to the first density ratio, the second density ratio r_t^2(x) = p(x|t, s=−1) / p(x|t, s=1) can be expressed using a conditional model p(s = 1|x, t). [sent-98, score-0.518]
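To make the two-factor weight easier to read, the following LaTeX block reconstructs, from the surrounding sentences, the matching identity and the discriminative forms of both factors; it is a sketch of what the text calls Equations 2, 8, and 10, pieced together from the text rather than copied verbatim from the paper.

```latex
% Matching identity and two-factor weight, reconstructed from the surrounding text;
% \ell denotes the target loss function.
\mathbb{E}_{(x,y)\sim p(x,y\mid t,s=-1)}\bigl[\ell(f(x,t),y)\bigr]
  \;=\;
\mathbb{E}_{(x,y)\sim \sum_z p(z\mid s=1)\,p(x,y\mid z,s=1)}\bigl[r_t(x,y)\,\ell(f(x,t),y)\bigr]

r_t(x,y)
  \;=\;
\underbrace{\frac{p(x,y\mid t,s=1)}{\sum_z p(z\mid s=1)\,p(x,y\mid z,s=1)}}_{r_t^1(x,y)\;\propto\;p(t\mid x,y,s=1)}
\;\cdot\;
\underbrace{\frac{p(x\mid t,s=-1)}{p(x\mid t,s=1)}}_{r_t^2(x)\;\propto\;\frac{1}{p(s=1\mid x,t)}-1}
```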
41 2 Estimation of Discriminative Density Ratios. For estimation of r_t^1(x, y) we model p(t|x, y, s = 1) of Equation 8 with a logistic regression model p(t|x, y, s = 1, u_t) = 1 / (1 + exp(−u_t^T Φ(x, y))) over model parameters u_t, using a problem-specific feature mapping Φ(x, y). [sent-104, score-0.801]
42 Empirically, we observe that a separate binary logistic regression model (as described above) for each task yields more accurate results with the drawback of a slightly increased overall training time. [sent-110, score-0.373]
43 Optimization Problem 1 For task t: over parameters u_t, maximize Σ_{(x,y)∈L_t} log p(t|x, y, s = 1, u_t) + Σ_{(x,y)∈L\L_t} log(1 − p(t|x, y, s = 1, u_t)) − u_t^T u_t / (2σ_u^2). [sent-111, score-0.728]
44 The estimated vector u_t leads to the first part of the weighting factor, r_t^1(x, y) ∝ p(t|x, y, s = 1, u_t), according to Equation 8. [sent-113, score-0.701]
45 r_t^1(x, y) is normalized so that the weighted empirical distribution over the pool L sums to one: (1/|L|) Σ_{(x,y)∈L} r_t^1(x, y) = 1. [sent-114, score-1.082]
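A minimal Python sketch of this step follows; the joint feature map Phi(x, y) shown here (label plus label-input interactions) and the scikit-learn logistic regression are stand-ins for the paper's problem-specific choices, so treat it as an assumption-laden illustration of Optimization Problem 1 rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def joint_features(X, y):
    """Hypothetical Phi(x, y): stack the label and label-input interactions
    onto x (one simple choice; the paper's problem-specific Phi may differ)."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    return np.hstack([X, y, X * y])

def first_factor_weights(X_pool, y_pool, task_pool, target_task, C=1.0):
    """Normalized r_t^1(x, y) ∝ p(t | x, y, s=1) for every pooled labeled example."""
    Phi = joint_features(X_pool, y_pool)
    is_target = (np.asarray(task_pool) == target_task).astype(int)  # 1 iff example is from task t
    clf = LogisticRegression(C=C, max_iter=1000).fit(Phi, is_target)
    r1 = clf.predict_proba(Phi)[:, 1]        # estimate of p(t | x, y, s=1)
    return r1 * (len(r1) / r1.sum())         # (1/|L|) * sum of weights = 1
```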
46 According to Equation 10, the density ratio r_t^2(x) = p(x|t, s=−1) / p(x|t, s=1) ∝ 1/p(s=1|x, t) − 1 can be inferred from p(s = 1|x, t), which is the likelihood that a given x for task t originates from the training distribution, as opposed to the test distribution. [sent-115, score-0.821]
47 A model of p(s = 1|x, t) can be obtained by discriminating a sample governed by p(x|t, s = 1) against a sample governed by p(x|t, s = −1) using a probabilistic classifier. [sent-116, score-0.378]
48 The labeled pool L over all training examples, weighted by r_t^1(x, y), can serve as a sample governed by p(x|t, s = 1); the labels y can be ignored for this step. [sent-118, score-1.295]
49 We model p(s = 1|x, v_t) of Equation 10 with a regularized logistic regression on target variable s with parameters v_t (Optimization Problem 2). [sent-120, score-0.653]
50 Labeled examples L are weighted by the estimated first factor r_t^1(x, y) using the outcome of Optimization Problem 1. [sent-121, score-0.547]
51 Optimization Problem 2 For task t: over parameters v_t, maximize Σ_{(x,y)∈L} r_t^1(x, y) log p(s = 1|x, v_t) + Σ_{x∈T_t} log p(s = −1|x, v_t) − v_t^T v_t / (2σ_v^2). [sent-122, score-1.079]
52 r_t^2(x) is normalized so that the final weighted empirical distribution over the pool sums to one: (1/|L|) Σ_{(x,y)∈L} r_t^1(x, y) r_t^2(x) = 1. [sent-124, score-1.082]
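Analogously, a hedged sketch of Optimization Problem 2: a logistic regression with example weights discriminates the r_t^1-weighted labeled pool (s = 1) against the unlabeled test sample T_t of the target task (s = −1), and r_t^2 is read off as 1/p(s = 1|x, t) − 1; the variable names and the scikit-learn solver are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def second_factor_weights(X_pool, r1, X_target_unlabeled, C=1.0):
    """Normalized combined weights r_t^1(x, y) * r_t^2(x) for the labeled pool."""
    X = np.vstack([X_pool, X_target_unlabeled])
    s = np.concatenate([np.ones(len(X_pool)),                 # s = 1: training side
                        np.zeros(len(X_target_unlabeled))])   # s = -1: test side
    sample_weight = np.concatenate([r1, np.ones(len(X_target_unlabeled))])
    clf = LogisticRegression(C=C, max_iter=1000)
    clf.fit(X, s, sample_weight=sample_weight)                # weighted pool vs. T_t
    p_s1 = np.clip(clf.predict_proba(X_pool)[:, 1], 1e-6, 1.0)  # estimate of p(s=1 | x, t)
    r2 = 1.0 / p_s1 - 1.0
    r = r1 * r2
    return r * (len(r) / r.sum())   # (1/|L|) * sum of r_t^1 r_t^2 = 1
```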
53 3 Weighted Empirical Loss and Target Model. The learning procedure first determines resampling weights r_t(x, y) = r_t^1(x, y) r_t^2(x) by solving Optimization Problems 1 and 2. [sent-126, score-0.953]
54 These weights can now be used to reweight the labeled pool over all tasks and train the target model for task t. [sent-127, score-1.085]
55 E_{(x,y)∼L}[ r_t^1(x, y) r_t^2(x) ℓ(f(x, t), y) ] + w_t^T w_t / (2σ_w^2)  (11)  Optimization Problem 3 minimizes Equation 11, the weighted regularized loss over the training data, using a standard Gaussian log-prior with variance σ_w^2 on the parameters w_t. [sent-130, score-0.899]
56 Optimization Problem 3 For task t: over parameters w_t, minimize (1/|L|) Σ_{(x,y)∈L} r_t^1(x, y) r_t^2(x) ℓ(f(x, w_t), y) + w_t^T w_t / (2σ_w^2). [sent-132, score-0.758]
57 In order to train target models for all tasks, instances of Optimization Problems 1 to 3 are solved for each task. [sent-133, score-0.383]
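For completeness, a sketch of this final step (Optimization Problem 3) together with a hypothetical driver that chains the three steps for one task; the logistic target loss matches the loss used in the experiments described later, but the function names are illustrative only.

```python
from sklearn.linear_model import LogisticRegression

def train_target_model(X_pool, y_pool, weights, C=1.0):
    """Weighted, L2-regularized logistic regression as the target model f_t."""
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_pool, y_pool, sample_weight=weights)
    return model

# Hypothetical end-to-end driver for one target task t:
#   r1  = first_factor_weights(X_pool, y_pool, task_pool, t)
#   r   = second_factor_weights(X_pool, r1, X_unlabeled_t)
#   f_t = train_target_model(X_pool, y_pool, r)
```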
58 4 Targeted Advertising. We study the benefit of distribution matching and other reference methods on targeted advertising for four web portals. [sent-134, score-0.673]
59 A small proportion of users is asked to fill out a web questionnaire that collects sociodemographic user profiles. [sent-140, score-0.577]
60 The completion of the questionnaire cannot be enforced and it is therefore not possible to obtain labeled data that is governed by the test distribution of all users that surf the target portal. [sent-143, score-0.988]
61 We add the correct proportion (25%) of users who have taken the survey, and thereby construct a sample that is governed by an approximation of the test distribution. [sent-146, score-0.43]
62 Table 1: Portal statistics: number of accepted, partially rejected, and test examples (mix of all partial reject (=75%) and 25% accept); ratio of male users in training (accept) and test set. [sent-149, score-0.729]
63 portal: family | TV channel | news 1 | news 2; # accept: 8073 | 8848 | 3051 | 2247; # partial reject: 2035 | 1192 | 149 | 143; # test: 2713 | 1589 | 199 | 191; % male training: 53. [sent-150, score-0.774]
64 We compare distribution matching on labeled and unlabeled data (Optimization Problems 1 to 3), and distribution matching only on labeled data (by setting r_t^2(x) = 1 in Optimization Problem 3), to the following reference models. [sent-158, score-1.357]
65 The first baseline is a one-size-fits-all model that directly trains a logistic regression on L (setting r_t^1(x, y) r_t^2(x) = 1 in Optimization Problem 3). [sent-159, score-0.561]
66 The second baseline is a logistic regression trained only on L_t, the training examples of the target task. [sent-160, score-0.596]
67 Training only on the reweighted target task data and correcting for marginal shift with respect to the unlabeled test data is the third baseline [4]. [sent-161, score-0.735]
68 Training a logistic regression with their feature mapping over training examples from all tasks is equivalent to a joint MAP estimation of all model parameters and the mean of the Gaussian prior. [sent-164, score-0.552]
69 We evaluate the methods using all training examples from non-target tasks and different numbers of training examples of the target task. [sent-165, score-0.815]
70 From all available accept examples of the target task we randomly select a certain number (0-1600) of training examples. [sent-166, score-0.631]
71 From the remaining accept examples of the target task we randomly select an appropriate number and add them to all partial reject examples of the target task so that the evaluation set has the right proportions as described above. [sent-167, score-1.057]
72 We use a logistic loss as the target loss of distribution matching in Optimization Problem 3 and all reference methods. [sent-169, score-0.783]
73 • The inner loop temporarily tunes σ_w by cross-validation on rescaled L_¬t merged with the rescaled current training folds of L_t. [sent-178, score-0.407]
74 In each loop Optimization Problem 3 is solved with fixed r_t^2(x) = 1. [sent-180, score-0.433]
75 Optimization Problem 3 is solved for each outer loop with the temporary σ_w and with r_t^2(x) = 1. [sent-182, score-0.511]
76 Test data T_t of the target task as well as the weighted pool L (weighted by r_t^1(x, y), based on the previously tuned σ_u) are split into ten folds. [sent-187, score-1.085]
77 With the nine training folds of the test data and the nine training folds of the weighted pool L, Optimization Problem 2 is solved. [sent-188, score-0.838]
[Figure 1: accuracy curves for the four portals (family, TV channel, news 1, news 2) over 0, 25, 50, 100, 200, 400, 800, and 1600 training examples for the target portal; the plotted methods include distribution matching on labeled and unlabeled data, distribution matching on labeled data, hierarchical Bayes, one-size-fits-all on the pool of labeled data, and training only on labeled data of the target task.]
Figure 1: Accuracy over different numbers of training examples for the target portal.
81 Error bars indicate the standard error of the differences to distribution matching on labeled data. [sent-210, score-0.407]
82 σ_v is chosen to maximize the log-likelihood Σ_{(x,y)∈L_tune} r_t^1(x, y) log p(s = 1|x, v_t) + Σ_{x∈T_t^tune} log p(s = −1|x, v_t) on the tuning folds of the test data and the weighted pool (denoted by L_tune and T_t^tune) over all ten cross-validation loops. [sent-211, score-1.158]
83 We follow [1] and smooth the estimated weights by r_t^2(x)^λ before including them into Optimization Problem 3. [sent-214, score-0.43]
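A small sketch of this smoothing step, assuming the two factors from the earlier sketches and an exponent argument named lam (the naming is an assumption):

```python
import numpy as np

def smoothed_weights(r1, r2, lam):
    """Combine both factors after smoothing the covariate-shift factor by an
    exponent lam (lam = 0 ignores covariate shift, lam = 1 keeps it unchanged)."""
    w = r1 * np.power(r2, lam)
    return w * (len(w) / w.sum())   # renormalize to mean one
```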
84 Without looking at the test data of the target task we tune η on the non-target tasks so that the accuracy of the distribution matching method is maximized. [sent-216, score-0.848]
85 Finally, σ_w is tuned by cross-validation on L rescaled by r_t^1(x, y) r_t^2(x) (based on the previously tuned parameters σ_u and σ_v). [sent-220, score-0.545]
86 Figure 1 displays the accuracies over different numbers of labeled data for the four different target portals. [sent-222, score-0.487]
87 The error bars are the standard errors of the differences to the distribution matching method on labeled data (solid blue line). [sent-223, score-0.407]
88 For the “family” and “TV channel” portals the distribution matching method on labeled and unlabeled data outperforms all other methods in almost all cases. [sent-224, score-0.708]
89 The distribution matching method on labeled data outperforms the baselines trained only on the data of the target task for all portals and all data set sizes and it is at least as good as the one-size-fits-all model in almost all cases. [sent-225, score-0.963]
90 The hierarchical Bayesian method yields low accuracies for smaller numbers of training examples but becomes comparable to the distribution matching method when training set sizes of the target portal increase. [sent-226, score-1.059]
91 The simple covariate shift model that trains only on labeled and unlabeled data of the target task does not improve over the iid model that only trains on the labeled data of the target task. [sent-227, score-1.382]
92 This indicates that the marginal shift between training and test distributions is small, or could indicate that the approximation of the reject distribution which we use in our experimentation is not sufficiently close. [sent-228, score-0.509]
93 Either reason also explains why accounting for the marginal shift in the distribution matching method does not always improve over distribution matching using only labeled data. [sent-229, score-0.77]
94 Transfer learning by distribution matching passes all examples for all tasks to the underlying logistic regressions. [sent-230, score-0.553]
95 For example, the single task baseline trains only one logistic regression on the examples of the target task. [sent-232, score-0.642]
96 This led to an algorithm that discriminatively estimates these resampling weights by training two simple conditional models. [sent-235, score-0.37]
97 After weighting the pooled examples over all tasks the target model for a specific task can be trained. [sent-236, score-0.65]
98 In our empirical study on targeted advertising, we found that distribution matching using labeled data outperforms all reference methods in almost all cases; the differences are particularly large for small sample sizes. [sent-237, score-0.66]
99 Distribution matching with labeled and unlabeled data outperforms the reference methods and distribution matching with only labeled data in two out of four portals. [sent-238, score-0.933]
100 Even with no labeled data of the target task the performance of the distribution matching method is comparable to training on 1600 examples of the target task for all portals. [sent-239, score-1.327]
wordName wordTfidf (topN-words)
[('rt', 0.366), ('target', 0.245), ('pool', 0.201), ('portal', 0.199), ('portals', 0.199), ('labeled', 0.198), ('tasks', 0.158), ('resampling', 0.157), ('users', 0.155), ('governed', 0.153), ('sociodemographic', 0.152), ('matching', 0.151), ('ut', 0.145), ('targeted', 0.142), ('advertising', 0.133), ('lt', 0.131), ('training', 0.116), ('web', 0.114), ('folds', 0.114), ('vt', 0.113), ('shift', 0.112), ('task', 0.112), ('train', 0.107), ('unlabeled', 0.102), ('tt', 0.101), ('logistic', 0.096), ('reject', 0.095), ('weighted', 0.091), ('examples', 0.09), ('optimization', 0.087), ('test', 0.086), ('transfer', 0.083), ('equation', 0.082), ('loss', 0.079), ('reference', 0.075), ('wt', 0.07), ('tuned', 0.07), ('covariate', 0.07), ('bickel', 0.068), ('accept', 0.068), ('loop', 0.067), ('questionnaires', 0.066), ('campaigns', 0.066), ('fz', 0.066), ('weights', 0.064), ('user', 0.063), ('dxdy', 0.061), ('tv', 0.061), ('gender', 0.061), ('male', 0.059), ('distribution', 0.058), ('news', 0.057), ('questionnaire', 0.057), ('density', 0.055), ('ltune', 0.051), ('sawade', 0.051), ('scheffer', 0.051), ('surfed', 0.051), ('trains', 0.05), ('regression', 0.049), ('zm', 0.046), ('weighting', 0.045), ('originates', 0.044), ('fractions', 0.044), ('differ', 0.044), ('accuracies', 0.044), ('labels', 0.044), ('joint', 0.043), ('marginal', 0.042), ('ratio', 0.042), ('densities', 0.041), ('temporary', 0.041), ('hierarchical', 0.04), ('accepted', 0.04), ('survey', 0.04), ('rescaled', 0.039), ('arbitrarily', 0.038), ('sur', 0.038), ('evgeniou', 0.038), ('tune', 0.038), ('biased', 0.038), ('regularized', 0.037), ('outer', 0.037), ('channel', 0.037), ('sample', 0.036), ('maximize', 0.036), ('collects', 0.036), ('conditionals', 0.036), ('completion', 0.036), ('correcting', 0.036), ('trust', 0.034), ('surveys', 0.034), ('topics', 0.033), ('xm', 0.033), ('discriminatively', 0.033), ('discriminative', 0.033), ('newton', 0.032), ('merged', 0.032), ('devise', 0.032), ('instances', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
2 0.13682148 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
Author: Adrian Haith, Carl P. Jackson, R. C. Miall, Sethu Vijayakumar
Abstract: Adaptation of visually guided reaching movements in novel visuomotor environments (e.g. wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational models of the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual and proprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptation as well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation will elicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment. 1
3 0.13461322 65 nips-2008-Domain Adaptation with Multiple Sources
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
4 0.13015816 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
Author: Gabriele Schweikert, Gunnar Rätsch, Christian Widmer, Bernhard Schölkopf
Abstract: We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.
5 0.12184785 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Author: Yi Zhang, Artur Dubrawski, Jeff G. Schneider
Abstract: In this paper, we address the question of what kind of knowledge is generally transferable from unlabeled text. We suggest and analyze the semantic correlation of words as a generally transferable structure of the language and propose a new method to learn this structure using an appropriately chosen latent variable model. This semantic correlation contains structural information of the language space and can be used to control the joint shrinkage of model parameters for any specific task in the same space through regularization. In an empirical study, we construct 190 different text classification tasks from a real-world benchmark, and the unlabeled documents are a mixture from all these tasks. We test the ability of various algorithms to use the mixed unlabeled text to enhance all classification tasks. Empirical results show that the proposed approach is a reliable and scalable method for semi-supervised learning, regardless of the source of unlabeled data, the specific task to be enhanced, and the prediction model used.
6 0.10793016 237 nips-2008-The Recurrent Temporal Restricted Boltzmann Machine
7 0.10777781 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
8 0.10671686 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
9 0.10333227 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
10 0.099111147 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
11 0.09804631 228 nips-2008-Support Vector Machines with a Reject Option
12 0.094448008 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
13 0.093367793 206 nips-2008-Sequential effects: Superstition or rational behavior?
14 0.092574492 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
15 0.09232638 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
16 0.091395058 169 nips-2008-Online Models for Content Optimization
17 0.088140406 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
18 0.08647801 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
19 0.084986351 62 nips-2008-Differentiable Sparse Coding
20 0.084765412 113 nips-2008-Kernelized Sorting
topicId topicWeight
[(0, -0.247), (1, -0.043), (2, -0.018), (3, -0.078), (4, -0.069), (5, 0.123), (6, 0.031), (7, 0.09), (8, 0.012), (9, 0.028), (10, 0.044), (11, 0.079), (12, -0.145), (13, -0.102), (14, -0.015), (15, -0.016), (16, -0.08), (17, 0.088), (18, -0.061), (19, 0.164), (20, 0.11), (21, 0.022), (22, -0.113), (23, -0.093), (24, -0.027), (25, 0.091), (26, 0.009), (27, 0.12), (28, 0.014), (29, 0.17), (30, -0.066), (31, -0.02), (32, -0.088), (33, 0.028), (34, 0.003), (35, 0.058), (36, -0.051), (37, -0.015), (38, 0.077), (39, 0.095), (40, 0.073), (41, -0.021), (42, 0.042), (43, -0.025), (44, -0.014), (45, -0.014), (46, -0.054), (47, 0.078), (48, -0.085), (49, 0.094)]
simIndex simValue paperId paperTitle
same-paper 1 0.96825135 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
2 0.74708039 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
Author: Gabriele Schweikert, Gunnar Rätsch, Christian Widmer, Bernhard Schölkopf
Abstract: We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.
3 0.65560007 65 nips-2008-Domain Adaptation with Multiple Sources
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
4 0.61282724 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
Author: Adrian Haith, Carl P. Jackson, R. C. Miall, Sethu Vijayakumar
Abstract: Adaptation of visually guided reaching movements in novel visuomotor environments (e.g. wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational models of the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual and proprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptation as well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation will elicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment. 1
5 0.57155639 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
Author: Doug Downey, Oren Etzioni
Abstract: Is accurate classification possible in the absence of hand-labeled data? This paper introduces the Monotonic Feature (MF) abstraction—where the probability of class membership increases monotonically with the MF’s value. The paper proves that when an MF is given, PAC learning is possible with no hand-labeled data under certain assumptions. We argue that MFs arise naturally in a broad range of textual classification applications. On the classic “20 Newsgroups” data set, a learner given an MF and unlabeled data achieves classification accuracy equal to that of a state-of-the-art semi-supervised learner relying on 160 hand-labeled examples. Even when MFs are not given as input, their presence or absence can be determined from a small amount of hand-labeled data, which yields a new semi-supervised learning method that reduces error by 15% on the 20 Newsgroups data. 1
6 0.53116554 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
7 0.52864343 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
8 0.52781034 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
9 0.52570844 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
10 0.51020753 228 nips-2008-Support Vector Machines with a Reject Option
11 0.49431041 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
12 0.49298367 237 nips-2008-The Recurrent Temporal Restricted Boltzmann Machine
13 0.47013769 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
14 0.44377476 169 nips-2008-Online Models for Content Optimization
15 0.43550763 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
16 0.43456721 41 nips-2008-Breaking Audio CAPTCHAs
17 0.4261674 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
18 0.41446131 124 nips-2008-Load and Attentional Bayes
19 0.41177729 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters
20 0.41128257 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making
topicId topicWeight
[(6, 0.061), (7, 0.048), (12, 0.016), (15, 0.013), (28, 0.116), (57, 0.047), (63, 0.013), (77, 0.024), (83, 0.57)]
simIndex simValue paperId paperTitle
same-paper 1 0.94291979 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
2 0.94224805 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
3 0.91405886 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
4 0.89775878 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
5 0.88598955 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
Author: Takafumi Kanamori, Shohei Hido, Masashi Sugiyama
Abstract: We address the problem of estimating the ratio of two probability density functions (a.k.a. the importance). The importance values can be used for various succeeding tasks such as non-stationarity adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally very efficient and numerically stable. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bound. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. 1
6 0.76376313 32 nips-2008-Bayesian Kernel Shaping for Learning Control
7 0.62102133 95 nips-2008-Grouping Contours Via a Related Image
8 0.60260946 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
9 0.59861523 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
10 0.59261405 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.58375853 194 nips-2008-Regularized Learning with Networks of Features
12 0.57920498 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
13 0.57426393 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
14 0.57191324 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
15 0.57001281 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
16 0.55593443 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
17 0.54449362 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
18 0.54187781 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
19 0.53901607 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
20 0.53397632 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features