nips nips2012 nips2012-308 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. [sent-7, score-0.043]
2 The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. [sent-8, score-0.063]
3 Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. [sent-9, score-0.019]
4 Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. [sent-10, score-0.04]
5 Domain adaptation methods are concerned with what knowledge can be shared between different tasks, how this knowledge can be transferred, and when the transfer should or should not be performed to avoid additional damage [4]. [sent-21, score-0.014]
6 In this work, we study semi-supervised domain adaptation for regression tasks. [sent-22, score-0.018]
7 In these problems, the object of interest (the mechanism that maps a set of inputs to a set of outputs) can be stated as a conditional density function. [sent-23, score-0.013]
8 First introduced by Sklar [22], copulas have been successfully used in a wide range of applications, including finance, time series, and the modeling of natural phenomena [12]. [sent-30, score-0.027]
9 Recently, a new family of copulas named vines has gained interest in the statistics literature [1]. [sent-31, score-0.041]
10 These are methods that factorize multivariate densities into a product of marginal distributions and bivariate copula functions. [sent-32, score-0.053]
11 First, we propose a non-parametric vine copula model which can be used as a high-dimensional density estimator. [sent-35, score-0.051]
12 Second, by making use of this method, we present a new framework to address semi-supervised domain adaptation problems, whose performance is validated in a series of experiments with real-world data and competing state-of-the-art techniques. [sent-36, score-0.02]
13 The rest of the paper is organized as follows: Section 2 provides a brief introduction to copulas, and describes a non-parametric estimator for the bivariate case. [sent-37, score-0.015]
14 Section 3 introduces a novel non-parametric vine copula model, which is built from the bivariate non-parametric copulas described before it. [sent-38, score-0.056]
15 Section 4 describes a new framework to address semi-supervised domain adaptation problems using the proposed vine method. [sent-39, score-0.029]
16 If the random variables x = (x_1, . . . , x_d) are jointly independent, their density function p(x) can be written as p(x) = \prod_{i=1}^{d} p(x_i). (1) [sent-44, score-0.02]
17 This function is called the copula of p(x) [18] and satisfies p(x) = \prod_{i=1}^{d} p(x_i) \, c(P(x_1), . . . , P(x_d)). (2) [sent-53, score-0.03]
18 The copula c is the joint density of P(x_1), . . . , P(x_d). [sent-57, score-0.071]
19 Therefore, the copula captures any distributional pattern that does not depend on the specific form of the marginals, or, in other words, all the information regarding the dependencies between x_1, . . . , x_d. [sent-69, score-0.03]
20 When the marginal cdfs P(x_1), . . . , P(x_d) are continuous, the copula c is unique [22]. [sent-76, score-0.03]
21 However, infinitely many multivariate models share the same underlying copula function, as illustrated in Figure 1. [sent-77, score-0.033]
22 The main advantage of copulas is that they allow us to model separately the marginal distributions and the dependencies linking them together to produce the multivariate model under study. [sent-78, score-0.034]
23 The transformed data are then used to obtain an estimate \hat{c} for the copula of p(x). [sent-88, score-0.03]
24 The resulting density estimate is \hat{p}(x) = \prod_{i=1}^{d} \hat{p}(x_i) \, \hat{c}(\hat{P}(x_1), . . . , \hat{P}(x_d)). (3) The estimation of marginal pdfs and cdfs can be implemented in a non-parametric manner by using unidimensional kernel density estimates. [sent-93, score-0.021]
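As a concrete illustration of the factorized estimator in (3) and the rank transform behind it, here is a minimal Python sketch, assuming NumPy and SciPy are available; the helper names to_pseudo_obs and marginal_kdes, and the toy data, are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde, rankdata

def to_pseudo_obs(X):
    """Map each column of X into (0, 1) with its empirical cdf (rank transform)."""
    n, d = X.shape
    # Dividing by n + 1 keeps the pseudo-observations strictly inside (0, 1).
    return np.column_stack([rankdata(X[:, j]) / (n + 1) for j in range(d)])

def marginal_kdes(X):
    """One univariate Gaussian KDE per marginal, as the text suggests."""
    return [gaussian_kde(X[:, j]) for j in range(X.shape[1])]

# Toy data: a Gaussian dependence structure with distorted marginals.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

U = to_pseudo_obs(X)     # approximate sample from the copula of p(x)
kdes = marginal_kdes(X)  # non-parametric estimates of the marginal pdfs
```

A copula estimate \hat{c} fitted to U can then be recombined with the marginal estimates exactly as in (3).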
25 By contrast, it is common practice to assume a parametric model for the estimation of the copula function. [sent-94, score-0.032]
26 Some examples of parametric copulas are Gaussian, Gumbel, Frank, Clayton or Student copulas [18]. [sent-95, score-0.052]
27 Nevertheless, real-world data often exhibit complex dependencies which cannot be correctly described by these parametric copula models. [sent-96, score-0.032]
28 This lack of flexibility of parametric copulas is illustrated in Figure 2. [sent-97, score-0.027]
29 Figure 1: Left, sample from a Gaussian copula with correlation ρ = 0.8. [sent-114, score-0.03]
30 Middle and right, two samples drawn from multivariate models with this same copula but different marginal distributions, depicted as rug plots. [sent-116, score-0.036]
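The construction behind Figure 1 is easy to reproduce: draw from the Gaussian copula once, then push the uniform coordinates through different quantile functions. A sketch assuming SciPy; the exponential and beta marginals (and ρ = 0.8 as an illustrative value) are arbitrary choices here.

```python
import numpy as np
from scipy.stats import norm, expon, beta

rng = np.random.default_rng(1)
# Sample from a Gaussian copula: correlated Gaussians mapped through the cdf.
z = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])  # copula sample on the unit square

# The same copula combined with two different pairs of marginals:
x1, y1 = expon.ppf(u), expon.ppf(v)            # exponential marginals
x2, y2 = beta.ppf(u, 2, 5), beta.ppf(v, 5, 2)  # beta marginals
```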
31 00 100 100 75 75 50 50 25 25 0 0 0 25 50 75 100 0 25 50 75 100 Figure 2: Left, sample from the copula linking variables 4 and 11 in the W IRELESS dataset. [sent-127, score-0.031]
32 Middle, density estimate generated by a Gaussian copula model when fitted to the data. [sent-128, score-0.041]
33 Right, copula density estimate generated by the non-parametric method described in Section 2. [sent-130, score-0.041]
34 This motivates the use of techniques able to approximate the copula function in a non-parametric manner. [sent-132, score-0.014]
35 Kernel density estimates can also be used to generate non-parametric approximations of copulas, as described in [8]. [sent-133, score-0.014]
36 2.1 Non-parametric Bivariate Copulas. We now elaborate on how to non-parametrically estimate the copula of a given bivariate density p(x, y). [sent-136, score-0.055]
37 Recall that this density can be factorized as the product of its marginals and its copula: p(x, y) = p(x) \, p(y) \, c(P(x), P(y)). (4) [sent-137, score-0.045]
38 Additionally, given a sample {(x_i, y_i)}_{i=1}^{n} from p(x, y), we can obtain a pseudo-sample from its copula c by mapping each observation to the unit square using estimates of the marginal cdfs, namely {(u_i, v_i)}_{i=1}^{n} := {(\hat{P}(x_i), \hat{P}(y_i))}_{i=1}^{n}. (5) [sent-138, score-0.038]
39 These are approximate observations from the uniformly distributed random variables u = P(x) and v = P(y), whose joint density is the copula function c(u, v). [sent-139, score-0.041]
40 We could try to approximate this density function by placing Gaussian kernels on each observation u_i and v_i; however, the support of c(u, v) is bounded, so the pseudo-sample is first mapped to \mathbb{R}^2 as {(z_i, w_i)}_{i=1}^{n} := {(\Phi^{-1}(u_i), \Phi^{-1}(v_i))}_{i=1}^{n}, where \Phi^{-1} is the standard Gaussian quantile function. (6) [sent-140, score-0.017]
41 The copula of this new density is identical to the copula of (4), since the performed transformations are marginal-wise. [sent-146, score-0.014]
42 Then, \hat{p}(z, w) = \frac{1}{n} \sum_{i=1}^{n} N(z, w \mid z_i, w_i, \Sigma), (7) where N(\cdot, \cdot \mid \nu_1, \nu_2, \Sigma) is a two-dimensional Gaussian density with mean (\nu_1, \nu_2) and covariance matrix \Sigma. [sent-149, score-0.014]
43 Finally, the copula density c(u, v) is approximated by combining (6) with (7): \hat{c}(u, v) = \frac{\hat{p}(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u)) \, \phi(\Phi^{-1}(v))} = \frac{1}{n} \sum_{i=1}^{n} \frac{N(\Phi^{-1}(u), \Phi^{-1}(v) \mid \Phi^{-1}(u_i), \Phi^{-1}(v_i), \Sigma)}{\phi(\Phi^{-1}(u)) \, \phi(\Phi^{-1}(v))}. (8) [sent-151, score-0.041]
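A minimal sketch of the estimator in (8), assuming SciPy. For simplicity the bandwidth matrix is taken diagonal, \Sigma = h^2 I, so the bivariate kernel factorizes into two univariate ones; the paper does not prescribe this particular bandwidth choice.

```python
import numpy as np
from scipy.stats import norm

def copula_density(u, v, ui, vi, h=0.1):
    """Evaluate the estimate c_hat(u, v) from pseudo-observations (ui, vi)."""
    z, w = norm.ppf(u), norm.ppf(v)      # map the query point to R^2
    zi, wi = norm.ppf(ui), norm.ppf(vi)  # transformed sample, as in (6)
    # With a diagonal Sigma, the Gaussian kernel in (7) is a product of
    # two univariate Gaussian densities centred at (zi, wi).
    kern = norm.pdf(z, loc=zi, scale=h) * norm.pdf(w, loc=wi, scale=h)
    p_hat = kern.mean()                  # kernel estimate of p_hat(z, w)
    # Divide by the standard normal densities, as in (8).
    return p_hat / (norm.pdf(z) * norm.pdf(w))
```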
44 3 Regular Vines. The method described above can be generalized to the estimation of copulas of more than two random variables. [sent-152, score-0.025]
45 However, although kernel density estimates can be successful in spaces of one or two dimensions, as the number of variables increases, these methods start to be significantly affected by the curse of dimensionality and tend to overfit the training data. [sent-153, score-0.019]
46 Additionally, for addressing domain adaptation problems, we are interested in factorizing these high-dimensional copulas into simpler building blocks transferable across learning domains. [sent-154, score-0.048]
47 These two drawbacks can be addressed by recent methods in copula modelling called vines [1]. [sent-155, score-0.046]
48 Vines decompose any high-dimensional copula density as a product of bivariate copula densities that can be approximated using the nonparametric model described above. [sent-156, score-0.085]
49 These bivariate copulas (as well as the marginals) correspond to the simple building blocks that we plan to transfer from one learning domain to another. [sent-157, score-0.051]
50 Different types of vines have been proposed in the literature. [sent-158, score-0.016]
51 Some examples are canonical vines, D-vines or regular vines [16, 1]. [sent-159, score-0.021]
52 In this work we focus on regular vines (R-vines) since they are the most general models. [sent-160, score-0.021]
53 In particular, each of the edges in the trees from V specifies a different conditional copula density in (10). [sent-193, score-0.047]
54 Changes in each of these factors can be detected and independently transferred across different learning domains to improve the estimation of the target density function. [sent-195, score-0.025]
55 Later, each edge in bold will correspond to a different bivariate copula function. [sent-200, score-0.047]
56 One major advantage of vines is that they can model high-dimensional data by estimating density functions of only one or two random variables. [sent-202, score-0.027]
57 For this reason, these techniques are significantly less affected by the curse of dimensionality than regular density estimators based on kernels, as we show in Section 5. [sent-203, score-0.02]
58 So far, vines have generally been constructed using parametric models for the estimation of bivariate copulas. [sent-204, score-0.016]
59 3.1 Non-parametric Regular Vines. In this section, we introduce a vine distribution in which all participating bivariate copulas can be estimated in a non-parametric manner. [sent-207, score-0.049]
60 To do so, we model each of the copulas in (10) using the non-parametric method described in Section 2. [sent-208, score-0.025]
61 Let {(u_i, v_i)}_{i=1}^{n} be a sample from the copula density c(u, v). [sent-210, score-0.043]
62 We have a total of d(d − 1)/2 bivariate copulas which should be distributed among the different trees. [sent-224, score-0.039]
63 Ideally, we would like to include in the first trees of the hierarchy the copulas with strongest dependence level. [sent-225, score-0.03]
64 This will allow us to prune the model by assuming independence in the last k < d trees, since the density function for the independent copula is constant and equal to 1. [sent-226, score-0.041]
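The paper states the heuristic (place the strongest dependencies in the first trees) but not a specific algorithm for it. One standard choice in the vine literature, sketched here under that assumption with SciPy, is to build the first tree as a maximum spanning tree over pairwise |Kendall's tau|; the function name first_vine_tree is illustrative.

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.sparse.csgraph import minimum_spanning_tree

def first_vine_tree(X):
    """Edges (j, k) of the first tree, favouring strong pairwise dependence."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            # Small weight <=> strong dependence; keep weights nonzero,
            # since csgraph treats exact zeros as missing edges.
            W[j, k] = max(1.0 - abs(tau), 1e-9)
    # Every spanning tree has d - 1 edges, so minimizing the sum of
    # 1 - |tau| is the same as maximizing the total |tau|.
    mst = minimum_spanning_tree(W).toarray()
    return [(j, k) for j in range(d) for k in range(d) if mst[j, k] > 0]
```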
65 4 Domain Adaptation with Regular Vines. In this section, we describe how regular vines can be used to address domain adaptation problems in the non-linear regression setting with continuous data. [sent-234, score-0.04]
66 In regression problems, we are interested in inferring the mapping mechanism, or conditional distribution with density p(y|x), that maps a feature vector x = (x_1, . . . , x_d) to a target variable y. [sent-236, score-0.014]
67 Rephrased into the copula framework, this conditional density can be expressed as p(y|x) \propto p(y) \prod_{i=1}^{d} \prod_{e(j,k) \in E_i} c_{jk|D(e)}, (13) where E_1, . . . , E_d are the edge sets of the trees in the R-vine. [sent-240, score-0.043]
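For intuition, here is (13) in the simplest case of a single feature x, where the double product reduces to one bivariate copula: p(y|x) ∝ p(y) c(P(y), P(x)). The sketch reuses the copula_density estimator from Section 2.1; p_y, P_y and P_x stand for fitted marginal density and cdf estimates (callables returning floats) and, like the function name, are assumptions of this illustration.

```python
import numpy as np

def conditional_density(y_grid, x, p_y, P_y, P_x, copula_density, ui, vi):
    """Evaluate p(y|x) on a grid of y values, normalized numerically."""
    u_x = float(P_x(x))  # the feature enters only through its cdf value
    vals = np.array([float(p_y(y)) * copula_density(float(P_y(y)), u_x, ui, vi)
                     for y in y_grid])
    # Normalize the unnormalized density with the trapezoid rule.
    return vals / np.trapz(vals, y_grid)
```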
68 In the classic domain adaptation setup we usually have large amounts of data for solving a source task characterized by the density function p_s(x, y). [sent-245, score-0.039]
69 However, only a partial or reduced sample is available for solving a target task with density p_t(x, y). [sent-246, score-0.022]
70 Given the data available for both tasks, our objective is to build a good estimate for the conditional density p_t(y|x). [sent-247, score-0.017]
71 To address this domain adaptation problem, we assume that p_t is a modified version of p_s. [sent-248, score-0.027]
72 First, p_s is expressed using an R-vine representation as in (10); second, some of the factors included in that representation (marginal distributions or pairwise copulas) are modified to derive p_t. [sent-250, score-0.013]
73 All we need to address the adaptation across domains is to reconstruct the R-vine representation of p_s using data from the source task, and then identify which of the factors have been modified to produce p_t. [sent-251, score-0.03]
74 First, some of the marginal distributions may change across domains, i.e., P_s(x_i) ≠ P_t(x_i) for some i = 1, . . . , d, or P_s(y) ≠ P_t(y), and we need to re-generate the estimates of the affected marginals using data from the target task. [sent-258, score-0.014]
75 Additionally, some of the bivariate copulas c_{jk|D(e)} may differ from source to target tasks. [sent-259, score-0.048]
76 In this case, we also re-estimate the affected copulas using data from the target task. [sent-260, score-0.032]
77 Simultaneous changes in both copulas and marginals can occur. [sent-261, score-0.03]
78 Finally, if some of the factors remain constant across domains, we can use the available data from the target task to improve the estimates obtained using only the data from the source task. [sent-263, score-0.016]
79 Specifically, extra unlabeled target task data can be used to refine the factors in the R-vine decomposition of p_t that do not depend on y. [sent-310, score-0.015]
80 This is still valid even in the limiting case of not having access to labeled data from the target task at training time (unsupervised domain adaptation). [sent-311, score-0.013]
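The detection step can be sketched with a standard two-sample test, in line with the two-sample tests mentioned in the conclusions; the Kolmogorov-Smirnov test and the 0.05 level used here are illustrative choices, not the paper's prescription, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def changed_marginals(X_source, X_target, alpha=0.05):
    """Indices of features whose marginal seems to differ across domains."""
    changed = []
    for j in range(X_source.shape[1]):
        _, p_value = ks_2samp(X_source[:, j], X_target[:, j])
        if p_value < alpha:  # reject equality of the two marginals
            changed.append(j)
    return changed

# Only the flagged marginals (and, analogously, flagged pairwise copulas)
# would be re-estimated with target-task data; the rest are transferred.
```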
81 The first series illustrates the accuracy of the density estimates generated by the proposed non-parametric vine method. [sent-313, score-0.026]
82 The second series validates the effectiveness of the proposed framework for domain adaptation problems in the non-linear regression setting. [sent-314, score-0.02]
83 For comparative purposes, we include the results of different state-of-the-art domain adaptation methods whose parameters are selected by a 10-fold cross-validation process on the training data. [sent-316, score-0.017]
84 Approximations: A complete R-Vine requires the use of conditional copula functions, which are challenging to learn. [sent-317, score-0.032]
85 A common approximation is to ignore any dependence between the copula functional form and its set of conditioning variables. [sent-318, score-0.035]
86 Note that the arguments of the copula functions remain conditional cdfs. [sent-319, score-0.032]
87 5.1 Accuracy of Non-parametric Regular Vines for Density Estimation. The density estimates generated by the new non-parametric R-vine method (NPRV) are evaluated on data from six normalized UCI datasets [9]. [sent-323, score-0.014]
88 We compare against a standard density estimator based on Gaussian kernels (KDE), and a parametric vine method based on bivariate Gaussian copulas (GRV). [sent-324, score-0.062]
89 5.2 Comparison with Other Domain Adaptation Methods. NPRV is analyzed in a series of experiments for domain adaptation in the non-linear regression setting with real-world data. [sent-331, score-0.02]
90 Two of them are Gaussian process (GP) methods: the first trained only with data from the source task, and the second trained with the normalized union of data from both source and target problems. [sent-335, score-0.013]
91 The other five methods are considered state-of-the-art domain adaptation techniques. [sent-336, score-0.017]
92 KMM [11] minimizes the distance between the marginal distributions of the source and target domains by matching their means when mapped into a universal RKHS. [sent-458, score-0.017]
93 For training, we randomly sample 1000 data points for both source and target tasks, where all the data in the source task and 5% of the data in the target task are labeled. [sent-461, score-0.022]
94 Finally, the two bottom rows in Table 2 show the average number of marginals and bivariate copulas that are updated in each dataset during the execution of NPRV, respectively. [sent-466, score-0.043]
95 Parametric copulas may be used to reduce the computational demands. [sent-473, score-0.025]
96 6 Conclusions. We have proposed a novel non-parametric domain adaptation strategy based on copulas. [sent-474, score-0.017]
97 The new approach works by decomposing any multivariate density into a product of marginal densities and bivariate copula functions. [sent-475, score-0.061]
98 Changes in these factors across different domains can be detected using two-sample tests, and transferred across domains in order to adapt the target task density model. [sent-476, score-0.027]
99 This technique leads to better density estimates than standard parametric vines or KDE, and is also able to outperform a large number of alternative domain adaptation methods in a collection of regression problems with real-world data. [sent-478, score-0.05]
100 Families of m-variate distributions with given margins and m(m − 1)/2 bivariate dependence parameters. [sent-564, score-0.018]
wordName wordTfidf (topN-words)
[('copula', 0.03), ('copulas', 0.025), ('vines', 0.016), ('nprv', 0.014), ('bivariate', 0.014), ('adaptation', 0.011), ('density', 0.011), ('cjk', 0.01), ('vine', 0.01), ('xd', 0.009), ('domain', 0.006), ('target', 0.005), ('ps', 0.005), ('regular', 0.005), ('pt', 0.004), ('marginals', 0.004), ('cdf', 0.004), ('source', 0.004), ('kde', 0.004), ('ui', 0.004), ('cdfs', 0.004), ('marginal', 0.003), ('conditioning', 0.003), ('ei', 0.003), ('transfer', 0.003), ('accross', 0.003), ('grv', 0.003), ('unprv', 0.003), ('wi', 0.003), ('zi', 0.003), ('daume', 0.003), ('trees', 0.003), ('edge', 0.003), ('multivariate', 0.003), ('estimates', 0.003), ('domains', 0.003), ('mmd', 0.002), ('parametric', 0.002), ('dependence', 0.002), ('vi', 0.002), ('uci', 0.002), ('tree', 0.002), ('pj', 0.002), ('td', 0.002), ('factors', 0.002), ('atgp', 0.002), ('kmm', 0.002), ('task', 0.002), ('corrected', 0.002), ('conditional', 0.002), ('curse', 0.002), ('xi', 0.002), ('frustratingly', 0.002), ('kurowicka', 0.002), ('affected', 0.002), ('language', 0.002), ('distributions', 0.002), ('unlabeled', 0.002), ('blocks', 0.002), ('pdfs', 0.002), ('series', 0.002), ('conditioned', 0.002), ('nmse', 0.002), ('mpi', 0.002), ('borgwardt', 0.002), ('formed', 0.002), ('changes', 0.001), ('linking', 0.001), ('quantile', 0.001), ('transferred', 0.001), ('address', 0.001), ('correcting', 0.001), ('bonilla', 0.001), ('kernel', 0.001), ('regression', 0.001), ('building', 0.001), ('describes', 0.001), ('factorize', 0.001), ('joining', 0.001), ('edges', 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.
2 0.89288235 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
3 0.54677123 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning
Author: Liping Liu, Thomas G. Dietterich
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic StickBreaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSBCMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1
4 0.39111444 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
5 0.023539983 211 nips-2012-Meta-Gaussian Information Bottleneck
Author: Melanie Rey, Volker Roth
Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possibles applications of IB to continuous data and provides a solution more robust to outliers. 1
6 0.021983845 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
7 0.016972804 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
8 0.0090040537 142 nips-2012-Generalization Bounds for Domain Adaptation
9 0.0068706064 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
10 0.0054920288 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video
11 0.0052863602 351 nips-2012-Transelliptical Component Analysis
12 0.004232341 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests
13 0.0038358339 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
14 0.0035419676 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
15 0.0035138559 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
16 0.0034325784 264 nips-2012-Optimal kernel choice for large-scale two-sample tests
17 0.0033528619 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms
18 0.0031045079 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation
19 0.0029888032 187 nips-2012-Learning curves for multi-task Gaussian process regression
20 0.0028343203 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
topicId topicWeight
[(0, 0.026), (1, 0.024), (2, 0.018), (3, -0.012), (4, -0.007), (5, -0.006), (6, 0.901), (7, -0.058), (8, -0.042), (9, 0.028), (10, 0.068), (11, -0.032), (12, -0.039), (13, 0.037), (14, 0.022), (15, 0.028), (16, -0.034), (17, -0.003), (18, 0.019), (19, -0.023), (20, 0.02), (21, 0.007), (22, 0.018), (23, -0.039), (24, -0.003), (25, 0.024), (26, 0.023), (27, 0.005), (28, 0.013), (29, 0.006), (30, 0.004), (31, -0.028), (32, -0.018), (33, 0.028), (34, 0.012), (35, 0.01), (36, 0.004), (37, 0.016), (38, -0.007), (39, -0.001), (40, 0.016), (41, -0.008), (42, -0.007), (43, 0.008), (44, -0.011), (45, -0.001), (46, 0.005), (47, -0.006), (48, 0.019), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.9994747 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.
2 0.97954756 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
3 0.78774041 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning
Author: Liping Liu, Thomas G. Dietterich
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic StickBreaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSBCMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1
4 0.78039545 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
5 0.084852569 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification
Author: Yao-nan Chen, Hsuan-tien Lin
Abstract: Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing ones to LSDR across many real-world datasets. 1
6 0.057279844 211 nips-2012-Meta-Gaussian Information Bottleneck
7 0.055671096 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss
8 0.043135021 256 nips-2012-On the connections between saliency and tracking
9 0.042317174 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing
10 0.039468583 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
11 0.038599595 280 nips-2012-Proper losses for learning from partial labels
12 0.038500961 142 nips-2012-Generalization Bounds for Domain Adaptation
13 0.034942131 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
14 0.033333965 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent
15 0.031986531 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
16 0.030969856 351 nips-2012-Transelliptical Component Analysis
17 0.030957879 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification
18 0.028914476 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis
19 0.028391792 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms
20 0.027073113 194 nips-2012-Learning to Discover Social Circles in Ego Networks
topicId topicWeight
[(15, 0.57), (38, 0.019), (39, 0.015), (55, 0.025), (74, 0.015), (76, 0.046), (80, 0.018), (92, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.95210719 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.
2 0.54192793 31 nips-2012-Action-Model Based Multi-agent Plan Recognition
Author: Hankz H. Zhuo, Qiang Yang, Subbarao Kambhampati
Abstract: Multi-Agent Plan Recognition (MAPR) aims to recognize dynamic team structures and team behaviors from the observed team traces (activity sequences) of a set of intelligent agents. Previous MAPR approaches required a library of team activity sequences (team plans) be given as input. However, collecting a library of team plans to ensure adequate coverage is often difficult and costly. In this paper, we relax this constraint, so that team plans are not required to be provided beforehand. We assume instead that a set of action models are available. Such models are often already created to describe domain physics; i.e., the preconditions and effects of actions. We propose a novel approach for recognizing multi-agent team plans based on such action models rather than libraries of team plans. We encode the resulting MAPR problem as a satisfiability problem and solve the problem using a state-of-the-art weighted MAX-SAT solver. Our approach also allows for incompleteness in the observed plan traces. Our empirical studies demonstrate that our algorithm is both effective and efficient in comparison to state-of-the-art MAPR methods based on plan libraries. 1
3 0.34322435 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines
Author: Geoffrey E. Hinton, Ruslan Salakhutdinov
Abstract: We describe how the pretraining algorithm for Deep Boltzmann Machines (DBMs) is related to the pretraining algorithm for Deep Belief Networks and we show that under certain conditions, the pretraining procedure improves the variational lower bound of a two-hidden-layer DBM. Based on this analysis, we develop a different method of pretraining DBMs that distributes the modelling work more evenly over the hidden layers. Our results on the MNIST and NORB datasets demonstrate that the new pretraining algorithm allows us to learn better generative models. 1
4 0.22530665 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity
Author: Odalric-ambrym Maillard
Abstract: This paper aims to take a step forward in making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiosity-driven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C) is provided together with a finite-time regret analysis. 1
5 0.13893975 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
6 0.12582283 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
7 0.10301466 95 nips-2012-Density-Difference Estimation
8 0.1022541 215 nips-2012-Minimizing Uncertainty in Pipelines
9 0.10128675 211 nips-2012-Meta-Gaussian Information Bottleneck
10 0.10097964 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
11 0.10028389 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks
12 0.099486105 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison
13 0.099141479 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
14 0.098244451 188 nips-2012-Learning from Distributions via Support Measure Machines
15 0.097466178 210 nips-2012-Memorability of Image Regions
16 0.09707471 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection
17 0.09706369 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
18 0.097049743 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition
19 0.096683599 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images
20 0.096400857 352 nips-2012-Transelliptical Graphical Models