nips nips2008 nips2008-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. [sent-8, score-0.497]
2 For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. [sent-9, score-0.492]
3 The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. [sent-10, score-0.89]
4 In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. [sent-12, score-0.829]
5 Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. [sent-13, score-1.793]
6 We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. [sent-14, score-0.949]
7 Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset. [sent-15, score-0.515]
8 A typical situation is that of domain adaptation where little or no labeled data is at one’s disposal for the target domain, but large amounts of labeled data from a source domain somewhat similar to the target, or hypotheses derived from that source, are available instead. [sent-19, score-1.162]
9 This paper studies the problem of domain adaptation with multiple sources, which has also received considerable attention in many areas such as natural language processing and speech processing. [sent-21, score-0.614]
10 For each source i ∈ [1, k], the learner receives the distribution Di of the input points corresponding to that source as well as a hypothesis hi with loss at most ǫ on that source. [sent-25, score-1.16]
11 The learner’s task consists of combining the k hypotheses hi , i ∈ [1, k], to derive a hypothesis h with small loss with respect to the target distribution. [sent-26, score-1.31]
12 The target distribution is assumed to be a mixture of the distributions Di . [sent-27, score-0.594]
13 We will discuss both the case where the mixture is known to the learner and the case where it is unknown. [sent-28, score-0.361]
14 An alternative set-up for domain adaptation with multiple sources is one where the learner is not supplied with a good hypothesis hi for each source but where instead he has access to the labeled training data for each source domain. [sent-31, score-1.421]
15 A natural solution then consists of combining the raw labeled data from each source domain to form a new sample more representative of the target distribution, and using that sample to train a learning algorithm. [sent-32, score-0.888]
16 Thirdly, combining labeled data sets requires the mixture parameters of the target distribution to be known, but it is not clear how to produce a hypothesis with a low error rate with respect to any mixture distribution. [sent-39, score-1.373]
17 Few theoretical studies have been devoted to the problem of adaptation with multiple sources. [sent-40, score-0.387]
18 [3] extended the work to give a bound on the error rate of a hypothesis derived from a weighted combination of the source data sets for the specific case of empirical risk minimization. [sent-43, score-0.586]
19 [5, 6] also addressed a problem where multiple sources are present but the nature of the problem differs from adaptation since the distribution of the input points is the same for all these sources, only the labels change due to varying amounts of noise. [sent-45, score-0.562]
20 We are not aware of a prior theoretical study of the problem of adaptation with multiple sources analyzed here. [sent-46, score-0.434]
21 The first type is simply based on convex combinations of the k hypotheses hi . [sent-49, score-0.448]
22 Namely, we give a simple example of two distributions and two matching hypotheses, each with zero error for their respective distribution, but such that any convex combination has expected absolute loss of 1/2 for the equal mixture of the distributions. [sent-51, score-0.651]
23 Namely, the weight of hypothesis hi on an input x is proportional to λi Di (x), where λ is the vector of mixture weights. [sent-54, score-0.771]
24 We will refer to this method as the distribution weighted hypothesis combination. [sent-55, score-0.429]
25 Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any mixture of the k distributions. [sent-56, score-1.395]
26 We also show that there exists a distribution weighted combining rule that has loss at most 3ǫ with respect to any consistent target function (one for which each hi has loss ǫ on Di ) and any mixture of the k distributions. [sent-57, score-1.815]
27 In some sense, our results establish that the distribution weighted hypothesis combination is the “right” combination rule, and that it also benefits from a well-founded theoretical guarantee. [sent-58, score-0.562]
28 In Section 3, we analyze the abstract case where the mixture parameters of the target distribution are known and show that the distribution weighted hypothesis combination that uses as weights these mixture coefficients achieves a loss of at most ǫ. [sent-61, score-1.489]
29 In Section 4, we give a simple method to produce an error of Θ(kǫ) that does not require the prior knowledge of the mixture parameters of the target distribution. [sent-62, score-0.511]
30 Our main results showing the existence of a combined hypothesis performing well regardless of the target mixture are given in Section 5 for the case of a fixed target function, and in Section 6 for the case of multiple target functions. [sent-63, score-1.252]
31 Section 7 reports empirical results for a multiple source adaptation problem with a real-world dataset. [sent-64, score-0.539]
32 2 2 Problem Set-Up Let X be the input space, f : X → R the target function to learn, and L : R × R → R a loss function penalizing errors with respect to f . [sent-65, score-0.456]
33 The loss of a hypothesis h with respect to a distribution D and loss function L is denoted by L(D, h, f) and defined as L(D, h, f) = E_{x∼D}[L(h(x), f(x))] = Σ_{x∈X} L(h(x), f(x)) D(x). [sent-66, score-0.627]
34 We consider an adaptation problem with k source domains and a single target domain. [sent-68, score-0.784]
35 The distribution of the target domain, DT, is assumed to be a mixture of the k source distributions Di, that is DT(x) = Σ_{i=1}^k λi Di(x), for some unknown mixture weight vector λ ∈ ∆. [sent-76, score-1.049]
36 The adaptation problem consists of combining the hypotheses hi s to derive a hypothesis with small loss on the target domain. [sent-77, score-1.336]
37 Since the target distribution DT is assumed to be a mixture, we will refer to this problem as the mixture adaptation problem. [sent-78, score-0.873]
38 A combining rule for the hypotheses takes as an input the hi s and outputs a single hypothesis h : X → R. [sent-79, score-1.054]
39 We define H to be the set of all distribution weighted combining rules. [sent-82, score-0.46]
40 Given the input to the adaptation problem we have implicit information about the target function f . [sent-83, score-0.555]
41 We define the set of consistent target functions, F , as follows, F = {g : ∀i ∈ [1, k], L(Di , hi , g) ≤ ǫ} . [sent-84, score-0.486]
42 3 Known Target Mixture Distribution In this section we assume that the parameters of the target mixture distribution are known. [sent-91, score-0.57]
43 Consider the target function f , where f (a) = 1 and f (b) = 0, and let the loss be the absolute loss. [sent-98, score-0.457]
44 The hypotheses h1 and h0 have zero expected absolute loss on the distributions Da and Db, respectively. [sent-100, score-0.424]
45 The hypothesis h(x) = (1/2)h1 (x) + (1/2)h0 (x) always outputs 1/2, and has an absolute loss of 1/2. [sent-104, score-0.456]
46 Furthermore, for any other parameter z of the linear combining rule, the expected absolute loss of h(x) = zh1 (x)+ (1 − z)h0 (x) with respect to DT is exactly 1/2. [sent-105, score-0.544]
47 There is a mixture adaptation problem with ǫ = 0 for which any linear combination rule has expected absolute loss of 1/2. [sent-108, score-1.076]
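A short worked check of this example (a sketch under the assumptions, consistent with the construction above, that Da puts all of its mass on the point a, Db on b, h1 ≡ 1 and h0 ≡ 0): any linear combination h(x) = z h1(x) + (1 − z) h0(x) is the constant function z, so for the equal mixture DT = (1/2)Da + (1/2)Db, L(DT, h, f) = (1/2)|z − f(a)| + (1/2)|z − f(b)| = (1/2)(1 − z) + (1/2)z = 1/2 for every z ∈ [0, 1]; no choice of z helps.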
48 Next we show that the distribution weighted combining rule produces a hypothesis with a low expected loss. [sent-109, score-0.84]
49 Given a mixture DT(x) = Σ_{i=1}^k λi Di(x), we consider the distribution weighted combining rule with parameter λ, which we denote by hλ. [sent-110, score-0.921]
50 Recall that hλ(x) = Σ_{i=1}^k [λi Di(x) / Σ_{j=1}^k λj Dj(x)] hi(x) = Σ_{i=1}^k [λi Di(x) / DT(x)] hi(x). [sent-111, score-0.518]
51 Using the convexity of L with respect to the first argument, the loss of hλ with respect to DT and a target f ∈ F can be bounded as follows: L(DT, hλ, f) = Σ_{x∈X} L(hλ(x), f(x)) DT(x) ≤ Σ_{x∈X} Σ_{i=1}^k λi Di(x) L(hi(x), f(x)) = Σ_{i=1}^k λi ǫi ≤ ǫ, where ǫi := L(Di, hi, f) ≤ ǫ. [sent-112, score-0.733]
52 For any mixture adaptation problem with target distribution Dλ(x) = Σ_{i=1}^k λi Di(x), the expected loss of the hypothesis hλ is at most ǫ with respect to any target function f ∈ F: L(Dλ, hλ, f) ≤ ǫ. [sent-115, score-1.544]
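A minimal Python sketch of this combining rule on a finite input space may help make the guarantee concrete (illustrative only; the two-point example, the point-mass distributions Da and Db, and the absolute loss below are assumptions chosen to mirror the construction in Section 3, not code from the paper):

import numpy as np

def weighted_rule(z, D, h):
    # Distribution weighted combining rule h_z: hypothesis h[i](x) gets weight
    # z[i] * D[i](x) / sum_j z[j] * D[j](x).
    def h_z(x):
        w = np.array([z[i] * D[i](x) for i in range(len(h))])
        s = w.sum()
        if s == 0.0:  # x has zero mass under every z_i * D_i; its value does not affect the loss
            return 0.0
        return float(sum(w[i] * h[i](x) for i in range(len(h))) / s)
    return h_z

def expected_loss(D, g, f, X, loss=lambda a, b: abs(a - b)):
    # L(D, g, f) = sum over x in X of loss(g(x), f(x)) * D(x), for a finite input space X.
    return sum(loss(g(x), f(x)) * D(x) for x in X)

# Two-point example from Section 3: Da concentrated on 'a', Db on 'b', f(a) = 1, f(b) = 0.
X = ['a', 'b']
Da = lambda x: 1.0 if x == 'a' else 0.0
Db = lambda x: 1.0 if x == 'b' else 0.0
f = lambda x: 1.0 if x == 'a' else 0.0
h1, h0 = (lambda x: 1.0), (lambda x: 0.0)   # zero loss on Da and Db respectively
DT = lambda x: 0.5 * Da(x) + 0.5 * Db(x)    # equal mixture of the two sources
h_lam = weighted_rule([0.5, 0.5], [Da, Db], [h1, h0])
print(expected_loss(DT, h_lam, f, X))       # 0.0, consistent with the epsilon-guarantee above (here epsilon = 0)
print(expected_loss(DT, lambda x: 0.5 * h1(x) + 0.5 * h0(x), f, X))   # 0.5 for the linear rule

The uniform rule hu of the next section is the same function evaluated with z = (1/k, ..., 1/k).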
53 4 Simple Adaptation Algorithms In this section we show how to construct a simple distribution weighted hypothesis that has an expected loss guarantee with respect to any mixture. [sent-116, score-0.698]
54 Thus, hu(x) = Σ_{i=1}^k [(1/k) Di(x) / Σ_{j=1}^k (1/k) Dj(x)] hi(x) = Σ_{i=1}^k [Di(x) / Σ_{j=1}^k Dj(x)] hi(x). [sent-120, score-0.637]
55 We show for hu an expected loss bound of kǫ, with respect to any mixture distribution DT and target function f ∈ F. [sent-121, score-0.93]
56 For any mixture adaptation problem the expected loss of hu is at most kǫ, for any mixture distribution DT and target function f ∈ F. [sent-124, score-1.474]
57 Unfortunately, the hypothesis hu can have an expected absolute loss as large as Ω(kǫ). [sent-127, score-0.589]
58 There is a mixture adaptation problem for which the expected absolute loss of hu is Ω(kǫ). [sent-130, score-0.973]
59 Also, for k = 2 there is an input to the mixture adaptation problem for which the expected absolute loss of hu is 2ǫ − ǫ2 . [sent-131, score-0.998]
60 5 Existence of a Good Hypothesis In this section, we will show that for any target function f ∈ F there is a distribution weighted combining rule hz that has a loss of at most ǫ with respect to any mixture DT . [sent-132, score-1.534]
61 In the first part, we will show, using a simple reduction to a zero-sum game, that one can obtain a mixture of hz s that guarantees a loss bounded by ǫ. [sent-134, score-0.627]
62 In the second part, which is the more interesting scenario, we will show that for any target function f ∈ F there is a single distribution weighted combining rule hz that has loss of at most ǫ with respect to any mixture DT . [sent-135, score-1.534]
63 1 Zero-sum game The adaptation problem can be viewed as a zero-sum game between two players, NATURE and LEARNER. [sent-138, score-0.389]
64 Let the input to the mixture adaptation problem be D1 , . [sent-139, score-0.612]
65 The player NATURE picks a distribution Di while the player LEARNER selects a distribution weighted combining rule hz ∈ H. [sent-146, score-0.94]
66 The loss when NATURE plays Di and LEARNER plays hz is L(Di , hz , f ). [sent-147, score-0.525]
67 For any target function f ∈ F and any δ > 0, there exists a function h(x) = Σ_{j=1}^m αj hzj(x), where each hzj ∈ H, such that L(DT, h, f) ≤ ǫ + δ for any mixture distribution DT(x) = Σ_{i=1}^k λi Di(x). [sent-161, score-0.613]
68 Since we can fix δ > 0 to be arbitrarily small, this implies that a linear mixture of distribution weighted combining rules can guarantee a loss of almost ǫ with respect to any product distribution. [sent-162, score-0.976]
69 2 Single distribution weighted combining rule In the previous subsection, we showed that a mixture of hypotheses in H would guarantee a loss of at most ǫ. [sent-164, score-1.243]
70 For the proof we will need that the distribution weighted combining rule hz be continuous in the parameter z. [sent-167, score-0.87]
71 To avoid this discontinuity, we will modify the definition of hz to a smoothed version hη_z, as follows. [sent-169, score-0.364]
72 Let U denote the uniform distribution over X. Then, for any η > 0 and z ∈ ∆, let hη_z : X → R be the function defined by hη_z(x) = Σ_{i=1}^k [(zi Di(x) + ηU(x)/k) / (Σ_{j=1}^k zj Dj(x) + ηU(x))] hi(x). [sent-171, score-0.522]
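Continuing the Python sketch from Section 3, the smoothed rule could be written as follows (a sketch; the finite input space X, the uniform distribution U(x) = 1/|X|, and the argument names are assumptions):

def smoothed_rule(z, eta, D, h, X):
    # h_z^eta: each numerator term z[i]*D[i](x) is shifted by eta*U(x)/k and the
    # denominator by eta*U(x), where U is the uniform distribution over the finite set X.
    k = len(h)
    def h_z_eta(x):
        U = 1.0 / len(X)
        num = [z[i] * D[i](x) + eta * U / k for i in range(k)]
        den = sum(z[j] * D[j](x) for j in range(k)) + eta * U
        return sum(num[i] * h[i](x) for i in range(k)) / den
    return h_z_eta

Since the denominator is at least ηU(x) > 0, hη_z is well defined and continuous in z even at points x where every zi Di(x) vanishes, which is exactly the continuity needed for the fixed-point argument.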
73 We first show that there exists a distribution weighted combining rule hη_z for which the losses L(Di, hη_z, f) are all nearly the same. [sent-176, score-0.68]
74 For any target function f ∈ F and any η, η′ > 0, there exists z ∈ ∆, with zi ≠ 0 for all i ∈ [1, k], such that the following holds for the distribution weighted combining rule hη_z ∈ H: L(Di, hη_z, f) = γ + η′ − η′/(zi k) ≤ γ + η′ for any 1 ≤ i ≤ k, where γ = Σ_{j=1}^k zj L(Dj, hη_z, f). [sent-178, score-1.439]
75 In addition to continuity, the perturbation to hz, namely hη_z, also helps us ensure that none of the mixture weights zi is zero in the proof of Lemma 2. [sent-188, score-0.615]
76 Note that the lemma just presented does not use the structure of the distribution weighted combining rule, but only the fact that the loss is continuous in the parameter z ∈ ∆. [sent-189, score-0.702]
77 The real crux of the argument is, as shown in the next lemma, that γ is small for a distribution weighted combining rule (while it can be very large for a linear combination rule). [sent-191, score-0.682]
78 ... ≤ Σ_{i=1}^k zi L(Di, hi, f) + ηM = Σ_{i=1}^k zi ǫi + ηM ≤ ǫ + ηM. [sent-204, score-0.503]
79 For any target function f ∈ F and any δ > 0, there exists η > 0 and z ∈ ∆ such that L(Dλ, hη_z, f) ≤ ǫ + δ for any mixture parameter λ. [sent-209, score-0.554]
80 6 Arbitrary target function The results of the previous section show that for any fixed target function there is a good distribution weighted combining rule. [sent-210, score-0.914]
81 Thus, we seek a single distribution weighted combining rule that can perform well for any f ∈ F and any mixture Dλ . [sent-212, score-0.921]
82 To obtain this bound, we show that for any f1, f2 ∈ F and any hypothesis h, the difference in loss is at most 2ǫ. [sent-214, score-0.364]
83 Then for any f, f ′ ∈ F and any mixture DT , the inequality L(DT , h, f ′ ) ≤ L(DT , h, f ) + 2ǫ holds for any hypothesis h. [sent-219, score-0.543]
84 We can bound the term L(Dλ , f, f ′ ) with a similar inequality, L(Dλ , f, f ′ ) ≤ L(Dλ , f, hλ ) + L(Dλ , f ′ , hλ ) ≤ 2ǫ, where hλ is the distribution weighted combining rule produced by choosing z = λ and using Theorem 2. [sent-223, score-0.637]
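Spelling out the chain behind this bound (a sketch; like the surrounding argument it assumes the loss L obeys the triangle inequality): L(Dλ, h, f′) ≤ L(Dλ, h, f) + L(Dλ, f, f′) ≤ L(Dλ, h, f) + L(Dλ, f, hλ) + L(Dλ, f′, hλ) ≤ L(Dλ, h, f) + 2ǫ. Combined with the guarantee L(Dλ, h, f) ≤ ǫ + δ of Section 5 for the fixed target f, this yields a loss of at most 3ǫ (up to δ) for every consistent target function f′ ∈ F.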
85 Figure 1: (a) MSE performance for a target mixture of four domains (1: books, 2: dvd, 3: electronics, 4: kitchen, 5: linear, 6: weighted). [sent-253, score-0.669]
86 7 Empirical results This section reports the results of our experiments with a distribution weighted combining rule using real-world data. [sent-255, score-0.661]
87 In our experiments, we fixed a mixture target distribution Dλ and considered the distribution weighted combining rule hz , with z = λ. [sent-256, score-1.389]
89 4 We compared our proposed weighted combining rule to the linear combining rule. [sent-267, score-0.812]
90 They show that the base hypotheses perform poorly on the mixture test set, which justifies the need for adaptation. [sent-269, score-0.508]
91 Furthermore, the distribution weighted combining rule is shown to perform at least as well as the worst in-domain performance of a base hypothesis, as expected from our bounds. [sent-270, score-0.726]
92 Finally, we observe that this real-world data experiment gives an example in which a linear combining rule performs poorly compared to the distribution weighted combining rule. [sent-271, score-0.91]
93 In other experiments, we considered the mixture of two domains, where the mixture is varied according to the parameter α ∈ {0. [sent-272, score-0.568]
94 The distribution weighted combining rule outperforms the base hypotheses as well as the linear combining rule. [sent-282, score-1.056]
95 7 Thus, our preliminary experiments suggest that the distribution weighted combining rule performs well in practice and clearly outperforms a simple linear combining rule. [sent-306, score-0.871]
96 Furthermore, using statistical language models as approximations to the distribution oracles seems to be sufficient in practice and can help produce a good distribution weighted combining rule. [sent-307, score-0.594]
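One way to realize this recipe (a sketch only; the additively smoothed unigram model and all function and parameter names below are assumptions about the setup, not the paper's exact implementation):

from collections import Counter

def unigram_lm(domain_docs, alpha=1.0):
    # Additively smoothed unigram language model, used as a stand-in for the domain
    # distribution Di; domain_docs is a list of tokenized documents from one source domain.
    counts = Counter(w for doc in domain_docs for w in doc)
    total, vocab = sum(counts.values()), len(counts) + 1
    def prob(tokens):
        p = 1.0
        for w in tokens:
            p *= (counts.get(w, 0) + alpha) / (total + alpha * vocab)
        return p
    return prob

def combined_regressor(x_tokens, lam, lms, regressors):
    # Distribution weighted combination with z = lambda; each Di(x) is approximated
    # by the corresponding unigram language model.
    w = [lam[i] * lms[i](x_tokens) for i in range(len(lms))]
    s = sum(w)
    if s == 0.0:
        return sum(r(x_tokens) for r in regressors) / len(regressors)
    return sum(wi * r(x_tokens) for wi, r in zip(w, regressors)) / s

Here each element of regressors would be a per-domain rating predictor (the paper reports results with support vector regression) and lam the target mixture weights.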
97 8 Conclusion We presented a theoretical analysis of the problem of adaptation with multiple sources. [sent-308, score-0.387]
98 Domain adaptation is an important problem that arises in a variety of modern applications where limited or no labeled data is available for the target application, and our analysis can be relevant in many such situations. [sent-309, score-0.569]
99 The theoretical guarantees proven for the distribution weighted combining rule provide it with a strong foundation. [sent-310, score-0.513]
100 Most of the results presented were based on the assumption that the target distribution is some mixture of the source distributions. [sent-312, score-0.741]
wordName wordTfidf (topN-words)
[('adaptation', 0.303), ('di', 0.291), ('mixture', 0.284), ('hi', 0.259), ('combining', 0.234), ('target', 0.227), ('hypothesis', 0.203), ('dt', 0.201), ('hz', 0.182), ('rule', 0.177), ('source', 0.171), ('weighted', 0.167), ('loss', 0.161), ('lz', 0.135), ('hypotheses', 0.133), ('zi', 0.122), ('hu', 0.119), ('brouwer', 0.11), ('dvd', 0.11), ('dz', 0.11), ('domain', 0.11), ('dj', 0.093), ('speech', 0.085), ('domains', 0.083), ('blitzer', 0.083), ('mse', 0.083), ('zj', 0.082), ('learner', 0.077), ('language', 0.075), ('kitchen', 0.075), ('crammer', 0.071), ('koby', 0.071), ('absolute', 0.069), ('jennifer', 0.063), ('distribution', 0.059), ('electronics', 0.057), ('book', 0.057), ('fernando', 0.057), ('lemma', 0.057), ('prague', 0.053), ('republic', 0.053), ('mohri', 0.053), ('base', 0.052), ('sources', 0.047), ('sentiment', 0.047), ('combination', 0.045), ('czech', 0.045), ('unigrams', 0.044), ('theoretical', 0.043), ('existence', 0.043), ('exists', 0.043), ('game', 0.043), ('respect', 0.043), ('rating', 0.043), ('theorem', 0.042), ('obeys', 0.041), ('multiple', 0.041), ('da', 0.04), ('labeled', 0.039), ('poorly', 0.039), ('mansour', 0.039), ('rostamizadeh', 0.039), ('db', 0.039), ('expected', 0.037), ('triangle', 0.036), ('dredze', 0.035), ('svr', 0.035), ('points', 0.034), ('pietra', 0.033), ('books', 0.033), ('inequality', 0.032), ('namely', 0.032), ('player', 0.031), ('convex', 0.031), ('amounts', 0.03), ('della', 0.03), ('courant', 0.03), ('guarantee', 0.028), ('kearns', 0.028), ('lj', 0.028), ('proof', 0.027), ('john', 0.027), ('derive', 0.027), ('acl', 0.027), ('fixed', 0.026), ('dk', 0.026), ('library', 0.026), ('raw', 0.025), ('combinations', 0.025), ('proceedings', 0.025), ('input', 0.025), ('holds', 0.024), ('remarkably', 0.024), ('reports', 0.024), ('distributions', 0.024), ('continuous', 0.024), ('nature', 0.023), ('consists', 0.023), ('outputs', 0.023), ('york', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 65 nips-2008-Domain Adaptation with Multiple Sources
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
2 0.20407473 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
Author: Gabriele Schweikert, Gunnar Rätsch, Christian Widmer, Bernhard Schölkopf
Abstract: We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.
3 0.18109429 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
Author: Ofer Dekel
Abstract: We present cutoff averaging, a technique for converting any conservative online learning algorithm into a batch learning algorithm. Most online-to-batch conversion techniques work well with certain types of online learning algorithms and not with others, whereas cutoff averaging explicitly tries to adapt to the characteristics of the online algorithm being converted. An attractive property of our technique is that it preserves the efficiency of the original online algorithm, making it appropriate for large-scale learning problems. We provide a statistical analysis of our technique and back our theoretical claims with experimental results. 1
4 0.17957172 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
Author: Adrian Haith, Carl P. Jackson, R. C. Miall, Sethu Vijayakumar
Abstract: Adaptation of visually guided reaching movements in novel visuomotor environments (e.g. wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational models of the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual and proprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptation as well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation will elicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment. 1
5 0.13461322 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
6 0.12639052 24 nips-2008-An improved estimator of Variance Explained in the presence of noise
7 0.11695637 103 nips-2008-Implicit Mixtures of Restricted Boltzmann Machines
8 0.1079771 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization
9 0.10216585 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
10 0.10202099 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
11 0.090313651 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
12 0.088817671 178 nips-2008-Performance analysis for L\ 2 kernel classification
13 0.081401646 189 nips-2008-Rademacher Complexity Bounds for Non-I.I.D. Processes
14 0.076964416 15 nips-2008-Adaptive Martingale Boosting
15 0.07568939 194 nips-2008-Regularized Learning with Networks of Features
16 0.072895333 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
17 0.072634131 228 nips-2008-Support Vector Machines with a Reject Option
18 0.070718996 239 nips-2008-Tighter Bounds for Structured Estimation
19 0.06927482 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
20 0.069018938 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
topicId topicWeight
[(0, -0.223), (1, -0.01), (2, -0.06), (3, 0.001), (4, -0.107), (5, 0.085), (6, -0.003), (7, 0.042), (8, 0.026), (9, 0.049), (10, 0.063), (11, 0.251), (12, -0.01), (13, 0.06), (14, -0.066), (15, 0.009), (16, -0.005), (17, 0.119), (18, -0.189), (19, 0.129), (20, 0.228), (21, 0.051), (22, -0.036), (23, 0.06), (24, 0.005), (25, 0.073), (26, -0.028), (27, 0.001), (28, 0.041), (29, 0.238), (30, 0.047), (31, 0.033), (32, -0.15), (33, 0.059), (34, 0.128), (35, -0.016), (36, 0.014), (37, -0.01), (38, -0.026), (39, -0.026), (40, -0.063), (41, -0.034), (42, 0.082), (43, -0.068), (44, 0.106), (45, -0.129), (46, -0.04), (47, 0.099), (48, 0.08), (49, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.97911459 65 nips-2008-Domain Adaptation with Multiple Sources
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
2 0.76228011 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
Author: Gabriele Schweikert, Gunnar Rätsch, Christian Widmer, Bernhard Schölkopf
Abstract: We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.
3 0.61106926 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
Author: Adrian Haith, Carl P. Jackson, R. C. Miall, Sethu Vijayakumar
Abstract: Adaptation of visually guided reaching movements in novel visuomotor environments (e.g. wearing prism goggles) comprises not only motor adaptation but also substantial sensory adaptation, corresponding to shifts in the perceived spatial location of visual and proprioceptive cues. Previous computational models of the sensory component of visuomotor adaptation have assumed that it is driven purely by the discrepancy introduced between visual and proprioceptive estimates of hand position and is independent of any motor component of adaptation. We instead propose a unified model in which sensory and motor adaptation are jointly driven by optimal Bayesian estimation of the sensory and motor contributions to perceived errors. Our model is able to account for patterns of performance errors during visuomotor adaptation as well as the subsequent perceptual aftereffects. This unified model also makes the surprising prediction that force field adaptation will elicit similar perceptual shifts, even though there is never any discrepancy between visual and proprioceptive observations. We confirm this prediction with an experiment. 1
4 0.60753894 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
5 0.49076501 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
Author: Ofer Dekel
Abstract: We present cutoff averaging, a technique for converting any conservative online learning algorithm into a batch learning algorithm. Most online-to-batch conversion techniques work well with certain types of online learning algorithms and not with others, whereas cutoff averaging explicitly tries to adapt to the characteristics of the online algorithm being converted. An attractive property of our technique is that it preserves the efficiency of the original online algorithm, making it appropriate for large-scale learning problems. We provide a statistical analysis of our technique and back our theoretical claims with experimental results. 1
6 0.47382328 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
7 0.47197413 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
8 0.41128924 124 nips-2008-Load and Attentional Bayes
9 0.40949082 38 nips-2008-Bio-inspired Real Time Sensory Map Realignment in a Robotic Barn Owl
10 0.4027659 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
11 0.39781564 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
12 0.37676299 29 nips-2008-Automatic online tuning for fast Gaussian summation
13 0.37419349 211 nips-2008-Simple Local Models for Complex Dynamical Systems
14 0.36990905 228 nips-2008-Support Vector Machines with a Reject Option
15 0.35288963 35 nips-2008-Bayesian Synchronous Grammar Induction
16 0.3485705 24 nips-2008-An improved estimator of Variance Explained in the presence of noise
17 0.34665805 111 nips-2008-Kernel Change-point Analysis
18 0.3338244 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
19 0.3212074 41 nips-2008-Breaking Audio CAPTCHAs
20 0.31963897 178 nips-2008-Performance analysis for L\ 2 kernel classification
topicId topicWeight
[(6, 0.621), (7, 0.036), (12, 0.017), (28, 0.093), (57, 0.044), (63, 0.017), (77, 0.031), (83, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.95758569 65 nips-2008-Domain Adaptation with Multiple Sources
Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ǫ are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ǫ with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ǫ. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
2 0.92489666 43 nips-2008-Cell Assemblies in Large Sparse Inhibitory Networks of Biologically Realistic Spiking Neurons
Author: Adam Ponzi, Jeff Wickens
Abstract: Cell assemblies exhibiting episodes of recurrent coherent activity have been observed in several brain regions including the striatum[1] and hippocampus CA3[2]. Here we address the question of how coherent dynamically switching assemblies appear in large networks of biologically realistic spiking neurons interacting deterministically. We show by numerical simulations of large asymmetric inhibitory networks with fixed external excitatory drive that if the network has intermediate to sparse connectivity, the individual cells are in the vicinity of a bifurcation between a quiescent and firing state and the network inhibition varies slowly on the spiking timescale, then cells form assemblies whose members show strong positive correlation, while members of different assemblies show strong negative correlation. We show that cells and assemblies switch between firing and quiescent states with time durations consistent with a power-law. Our results are in good qualitative agreement with the experimental studies. The deterministic dynamical behaviour is related to winner-less competition[3], shown in small closed loop inhibitory networks with heteroclinic cycles connecting saddle-points. 1
3 0.90288162 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
Author: Julia Owen, Hagai T. Attias, Kensuke Sekihara, Srikantan S. Nagarajan, David P. Wipf
Abstract: The synchronous brain activity measured via MEG (or EEG) can be interpreted as arising from a collection (possibly large) of current dipoles or sources located throughout the cortex. Estimating the number, location, and orientation of these sources remains a challenging task, one that is significantly compounded by the effects of source correlations and the presence of interference from spontaneous brain activity, sensor noise, and other artifacts. This paper derives an empirical Bayesian method for addressing each of these issues in a principled fashion. The resulting algorithm guarantees descent of a cost function uniquely designed to handle unknown orientations and arbitrary correlations. Robust interference suppression is also easily incorporated. In a restricted setting, the proposed method is shown to have theoretically zero bias estimating both the location and orientation of multi-component dipoles even in the presence of correlations, unlike a variety of existing Bayesian localization methods or common signal processing techniques such as beamforming and sLORETA. Empirical results on both simulated and real data sets verify the efficacy of this approach. 1
4 0.88487685 214 nips-2008-Sparse Online Learning via Truncated Gradient
Author: John Langford, Lihong Li, Tong Zhang
Abstract: We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1 -regularization method in the batch setting. We prove small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find for datasets with large numbers of features, substantial sparsity is discoverable. 1
5 0.81522655 178 nips-2008-Performance analysis for L\ 2 kernel classification
Author: Jooseuk Kim, Clayton Scott
Abstract: We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the L2 or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the L2 kernel classifier does not involve a regularization parameter. We prove a distribution free concentration inequality for a cross-validation based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier on the sense of both ISE and probability of error. Our results also specialize to give performance guarantees for an existing method of L2 kernel density estimation. 1
6 0.71760446 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
7 0.70475549 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
8 0.64646822 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization
9 0.58943707 202 nips-2008-Robust Regression and Lasso
10 0.57961851 75 nips-2008-Estimating vector fields using sparse basis field expansions
11 0.56203872 182 nips-2008-Posterior Consistency of the Silverman g-prior in Bayesian Model Choice
12 0.55903739 62 nips-2008-Differentiable Sparse Coding
13 0.55568999 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging
14 0.55489552 85 nips-2008-Fast Rates for Regularized Objectives
15 0.55369097 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
16 0.54985267 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
17 0.54944265 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization
18 0.54792982 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
19 0.54456174 228 nips-2008-Support Vector Machines with a Reject Option
20 0.53139931 210 nips-2008-Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms