nips nips2012 nips2012-308 knowledge-graph by maker-knowledge-mining

308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas


Source: pdf

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. [sent-7, score-0.043]

2 The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. [sent-8, score-0.063]

3 Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. [sent-9, score-0.019]

4 Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. [sent-10, score-0.04]

5 Domain adaptation methods are concerned with what knowledge we can share between different tasks, how we can transfer this knowledge, and when we should or should not do so to avoid additional damage [4]. [sent-21, score-0.014]

6 In this work, we study semi-supervised domain adaptation for regression tasks. [sent-22, score-0.018]

7 In these problems, the object of interest (the mechanism that maps a set of inputs to a set of outputs) can be stated as a conditional density function. [sent-23, score-0.013]

8 First introduced by Sklar [22], copulas have been successfully used in a wide range of applications, including finance, time series, and natural phenomena modeling [12]. [sent-30, score-0.027]

9 Recently, a new family of copulas named vines has gained interest in the statistics literature [1]. [sent-31, score-0.041]

10 These are methods that factorize multivariate densities into a product of marginal distributions and bivariate copula functions. [sent-32, score-0.053]

11 First, we propose a non-parametric vine copula model which can be used as a high-dimensional density estimator. [sent-35, score-0.051]

12 Second, by making use of this method, we present a new framework to address semi-supervised domain adaptation problems, whose performance is validated in a series of experiments with real-world data and competing state-of-the-art techniques. [sent-36, score-0.02]

13 The rest of the paper is organized as follows: Section 2 provides a brief introduction to copulas, and describes a non-parametric estimator for the bivariate case. [sent-37, score-0.015]

14 Section 3 introduces a novel non-parametric vine copula model, which is built from the bivariate non-parametric copulas described above. [sent-38, score-0.056]

15 Section 4 describes a new framework to address semi-supervised domain adaptation problems using the proposed vine method. [sent-39, score-0.029]

16 If the random variables x1 , . . . , xd are jointly independent, their density function p(x) can be written as $p(\mathbf{x}) = \prod_{i=1}^{d} p(x_i)$ (1). [sent-44, score-0.02]

17 This function is called the copula of p(x) [18] and satisfies equation (2) below. [sent-53, score-0.03]

18 $p(\mathbf{x}) = \left[ \prod_{i=1}^{d} p(x_i) \right] c\big(P(x_1), \ldots, P(x_d)\big)$ (2); the copula c is the joint density of P(x_1), . . . , P(x_d). [sent-57, score-0.071]
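As a concrete instance (not stated in this summary, but standard and consistent with the Gaussian copula of Figure 1 below): writing $z_u = \Phi^{-1}(u)$ and $z_v = \Phi^{-1}(v)$ for the standard Gaussian quantiles, the bivariate Gaussian copula with correlation ρ has density

```latex
% Bivariate Gaussian copula density: ratio of the correlated bivariate
% normal density to the product of its standard normal marginals.
c_\rho(u, v)
  = \frac{1}{\sqrt{1-\rho^2}}
    \exp\!\left(
      \frac{2\rho\, z_u z_v - \rho^2\!\left(z_u^2 + z_v^2\right)}
           {2\left(1-\rho^2\right)}
    \right),
\qquad z_u = \Phi^{-1}(u),\; z_v = \Phi^{-1}(v).
```

This is exactly the ratio $\phi_2(z_u, z_v; \rho)\,/\,(\phi(z_u)\,\phi(z_v))$, the same divide-out-the-marginals pattern that equation (8) below applies with a kernel estimate in place of $\phi_2$.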

19 Therefore, the copula captures any distributional pattern that does not depend on the specific form of the marginals, or, in other words, all the information regarding the dependencies between x1 , . . . , xd . [sent-69, score-0.03]

20 If the marginal cdfs P(x1 ), . . . , P(xd ) are continuous, the copula c is unique [22]. [sent-76, score-0.03]

21 However, infinitely many multivariate models share the same underlying copula function, as illustrated in Figure 1. [sent-77, score-0.033]

22 The main advantage of copulas is that they allow us to model separately the marginal distributions and the dependencies linking them together to produce the multivariate model under study. [sent-78, score-0.034]

23 The transformed data are then used to obtain an estimate ĉ of the copula of p(x). [sent-88, score-0.03]

24 This yields the semiparametric estimator $\hat{p}(\mathbf{x}) = \left[ \prod_{i=1}^{d} \hat{p}(x_i) \right] \hat{c}\big(\hat{P}(x_1), \ldots, \hat{P}(x_d)\big)$ (3). The estimation of the marginal pdfs and cdfs can be implemented in a non-parametric manner by using unidimensional kernel density estimates. [sent-93, score-0.021]
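A minimal sketch of how those marginal estimates could be computed in practice (our illustration; the function name and the rank-based cdf estimate are assumptions, not the paper's code):

```python
import numpy as np
from scipy.stats import gaussian_kde

def marginal_fits(x):
    """Non-parametric marginal estimates for one variable: a kernel
    density estimate for the pdf and rank-based pseudo-observations
    standing in for the cdf values P-hat(x_i)."""
    pdf = gaussian_kde(x)                    # unidimensional KDE for the marginal pdf
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1.0  # ranks 1..n of each observation
    u = ranks / (n + 1.0)                    # cdf estimates kept strictly inside (0, 1)
    return pdf, u
```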

25 By contrast, it is common practice to assume a parametric model for the estimation of the copula function. [sent-94, score-0.032]

26 Some examples of parametric copulas are Gaussian, Gumbel, Frank, Clayton or Student copulas [18]. [sent-95, score-0.052]

27 Nevertheless, real-world data often exhibit complex dependencies which cannot be correctly described by these parametric copula models. [sent-96, score-0.032]

28 This lack of flexibility of parametric copulas is illustrated in Figure 2. [sent-97, score-0.027]

29 Figure 1: Left, sample from a Gaussian copula with correlation ρ = 0. [sent-114, score-0.03]

30 Middle and right, two samples drawn from multivariate models with this same copula but different marginal distributions, depicted as rug plots. [sent-116, score-0.036]

31 Figure 2: Left, sample from the copula linking variables 4 and 11 in the WIRELESS dataset. [sent-127, score-0.031]

32 Middle, density estimate generated by a Gaussian copula model when fitted to the data. [sent-128, score-0.041]

33 Right, copula density estimate generated by the non-parametric method described in Section 2.1 [sent-130, score-0.041]

34 to approximate the copula function in a non-parametric manner. [sent-132, score-0.03]

35 Kernel density estimates can also be used to generate non-parametric approximations of copulas, as described in [8]. [sent-133, score-0.014]

36 2.1 Non-parametric Bivariate Copulas. We now elaborate on how to non-parametrically estimate the copula of a given bivariate density p(x, y). [sent-136, score-0.055]

37 Recall that this density can be factorized as the product of its marginals and its copula: $p(x, y) = p(x)\, p(y)\, c\big(P(x), P(y)\big)$ (4). [sent-137, score-0.045]

38 Additionally, given a sample $\{(x_i, y_i)\}_{i=1}^{n}$ from p(x, y), we can obtain a pseudo-sample from its copula c by mapping each observation to the unit square using estimates of the marginal cdfs, namely $\{(u_i, v_i)\}_{i=1}^{n} := \{(\hat{P}(x_i), \hat{P}(y_i))\}_{i=1}^{n}$ (5). [sent-138, score-0.038]

39 These are approximate observations from the uniformly distributed random variables u = P(x) and v = P(y), whose joint density is the copula function c(u, v). [sent-139, score-0.041]

40 We could try to approximate this density function by placing Gaussian kernels on each observation (ui , vi ). [sent-140, score-0.017]

41 Instead, each pseudo-observation is mapped to the Gaussian domain, $(z_i, w_i) := \big(\Phi^{-1}(u_i), \Phi^{-1}(v_i)\big)$ (6), where Φ is the standard Gaussian cdf. The copula of this new density is identical to the copula of (4), since the performed transformations are marginal-wise. [sent-146, score-0.071]

42 Then, $\hat{p}(z, w) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{N}(z, w \mid z_i, w_i, \Sigma)$ (7), where $\mathcal{N}(\cdot, \cdot \mid \nu_1, \nu_2, \Sigma)$ is a two-dimensional Gaussian density with mean $(\nu_1, \nu_2)$ and covariance matrix $\Sigma$. [sent-149, score-0.014]

43 Finally, the copula density c(u, v) is approximated by combining (6) with (7): $\hat{c}(u, v) = \frac{\hat{p}\big(\Phi^{-1}(u), \Phi^{-1}(v)\big)}{\phi\big(\Phi^{-1}(u)\big)\, \phi\big(\Phi^{-1}(v)\big)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathcal{N}\big(\Phi^{-1}(u), \Phi^{-1}(v) \mid \Phi^{-1}(u_i), \Phi^{-1}(v_i), \Sigma\big)}{\phi\big(\Phi^{-1}(u)\big)\, \phi\big(\Phi^{-1}(v)\big)}$ (8), where φ is the standard Gaussian pdf. [sent-151, score-0.041]
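Taken together, (5)–(8) suggest the following sketch (our illustration: scipy's bandwidth rule stands in for the paper's choice of Σ, and the clipping constant is an assumption):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def fit_bivariate_copula(u, v):
    """Non-parametric bivariate copula density following eqs. (5)-(8):
    probit-transform the pseudo-observations, fit a 2-D Gaussian KDE
    there, and divide out the standard normal marginals."""
    z, w = norm.ppf(u), norm.ppf(v)          # eq. (6): map to the Gaussian domain
    kde = gaussian_kde(np.vstack([z, w]))    # eq. (7); bandwidth rule replaces Sigma

    def c_hat(uq, vq):
        eps = 1e-6                           # assumed guard against infinite quantiles
        uq = np.clip(np.asarray(uq, dtype=float), eps, 1 - eps)
        vq = np.clip(np.asarray(vq, dtype=float), eps, 1 - eps)
        zq, wq = norm.ppf(uq), norm.ppf(vq)
        # eq. (8): joint KDE value divided by the product of standard normal pdfs
        return kde(np.vstack([zq, wq])) / (norm.pdf(zq) * norm.pdf(wq))

    return c_hat
```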

44 3 Regular Vines. The method described above can be generalized to the estimation of copulas of more than two random variables. [sent-152, score-0.025]

45 However, although kernel density estimates can be successful in spaces of one or two dimensions, as the number of variables increases, these methods start to be significantly affected by the curse of dimensionality and tend to overfit the training data. [sent-153, score-0.019]

46 Additionally, for addressing domain adaptation problems, we are interested in factorizing these high-dimensional copulas into simpler building blocks transferable across learning domains. [sent-154, score-0.048]

47 These two drawbacks can be addressed by recent methods in copula modelling called vines [1]. [sent-155, score-0.046]

48 Vines decompose any high-dimensional copula density into a product of bivariate copula densities that can be approximated using the non-parametric model described above. [sent-156, score-0.085]

49 These bivariate copulas (as well as the marginals) correspond to the simple building blocks that we plan to transfer from one learning domain to another. [sent-157, score-0.051]

50 Different types of vines have been proposed in the literature. [sent-158, score-0.016]

51 Some examples are canonical vines, D-vines or regular vines [16, 1]. [sent-159, score-0.021]

52 In this work we focus on regular vines (R-vines) since they are the most general models. [sent-160, score-0.021]
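The factorization that the extracted sentences keep citing as (10) never survives the extraction intact; its standard R-vine form (our reconstruction from the vine literature, with $E_1, \ldots, E_{d-1}$ the edge sets of the trees and $D(e)$ the conditioning set of edge $e$) is

```latex
% Standard R-vine density: product of marginals times one bivariate
% (conditional) copula density per edge of each tree.
p(x_1, \ldots, x_d)
  = \prod_{i=1}^{d} p(x_i)
    \prod_{i=1}^{d-1} \prod_{e(j,k) \in E_i}
      c_{jk \mid D(e)}\!\big(P(x_j \mid \mathbf{x}_{D(e)}),\,
                             P(x_k \mid \mathbf{x}_{D(e)})\big).
```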

53 In particular, each of the edges in the trees of V specifies a different conditional copula density in (10). [sent-193, score-0.047]

54 Changes in each of these factors can be detected and independently transferred across different learning domains to improve the estimation of the target density function. [sent-195, score-0.025]

55 Later, each edge in bold will correspond to a different bivariate copula function. [sent-200, score-0.047]

56 One major advantage of vines is that they can model high-dimensional data by estimating density functions of only one or two random variables. [sent-202, score-0.027]

57 For this reason, these techniques are significantly less affected by the curse of dimensionality than regular density estimators based on kernels, as we show in Section 5. [sent-203, score-0.02]

58 So far, vines have generally been constructed using parametric models for the estimation of the bivariate copulas. [sent-204, score-0.016]

59 3.1 Non-parametric Regular Vines. In this section, we introduce a vine distribution in which all the participating bivariate copulas can be estimated in a non-parametric manner. [sent-207, score-0.049]

60 To do so, we model each of the copulas in (10) using the non-parametric method described in Section 2. [sent-208, score-0.025]

61 Let $\{(u_i, v_i)\}_{i=1}^{n}$ be a sample from the copula density c(u, v). [sent-210, score-0.043]

62 We have a total of d(d − 1)/2 bivariate copulas which should be distributed among the different trees. [sent-224, score-0.039]

63 Ideally, we would like to include in the first trees of the hierarchy the copulas with the strongest dependence. [sent-225, score-0.03]

64 This will allow us to prune the model by assuming independence in the last k < d trees, since the density function for the independent copula is constant and equal to 1. [sent-226, score-0.041]
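The summary does not spell out how copulas are assigned to trees; a common heuristic from the vine literature selects each tree as a maximum spanning tree over absolute Kendall's τ, so the strongest dependencies enter first. A sketch for the first tree (our illustration, assuming U is an n × d matrix of pseudo-observations):

```python
import networkx as nx
from scipy.stats import kendalltau

def first_vine_tree(U):
    """Pick the first vine tree as a maximum spanning tree over
    pairwise |Kendall's tau| between all variable pairs."""
    d = U.shape[1]
    G = nx.Graph()
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = kendalltau(U[:, i], U[:, j])
            # negate: a *minimum* spanning tree on -|tau| maximizes total dependence
            G.add_edge(i, j, weight=-abs(tau))
    return nx.minimum_spanning_tree(G)
```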

65 4 Domain Adaptation with Regular Vines. In this section we describe how regular vines can be used to address domain adaptation problems in the non-linear regression setting with continuous data. [sent-234, score-0.04]

66 In regression problems, we are interested in inferring the mapping mechanism, i.e., the conditional distribution with density p(y|x) that maps a feature vector x = (x1 , . . . , xd ) into the target variable y. [sent-236, score-0.014]

67 Rephrased into the copula framework, this conditional density can be expressed as $p(y \mid \mathbf{x}) \propto p(y) \prod_{i=1}^{d} \prod_{e(j,k) \in E_i} c_{jk \mid D(e)}$ (13), where $E_1, \ldots, E_d$ are the edge sets of the trees in the R-vine for p(x, y). [sent-240, score-0.043]
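A sketch of how (13) could be evaluated on a grid (our illustration: `marginal_y` and the `copula_factors` callables are hypothetical stand-ins, with the conditioning on x assumed to be already folded into each factor):

```python
import numpy as np

def conditional_density(y_grid, marginal_y, copula_factors):
    """Evaluate p(y|x) on a grid per eq. (13): the marginal p(y) times
    the copula factors involving y, then normalized by quadrature."""
    vals = marginal_y(y_grid)
    for c in copula_factors:
        vals = vals * c(y_grid)
    return vals / np.trapz(vals, y_grid)  # normalize the proportionality in (13)
```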

68 In the classic domain adaptation setup we usually have large amounts of data for solving a source task characterized by the density function ps (x, y). [sent-245, score-0.039]

69 However, only a partial or reduced sample is available for solving a target task with density pt (x, y). [sent-246, score-0.022]

70 Given the data available for both tasks, our objective is to build a good estimate for the conditional density pt (y|x). [sent-247, score-0.017]

71 To address this domain adaptation problem, we assume that pt is a modified version of ps . [sent-248, score-0.027]

72 First, ps is expressed using an R-vine representation as in (10) and second, some of the factors included in that representation (marginal distributions or pairwise copulas) are modified to derive pt . [sent-250, score-0.013]

73 All we need to address the adaptation across domains is to reconstruct the R-vine representation of ps using data from the source task, and then identify which of the factors have been modified to produce pt . [sent-251, score-0.03]

74 Some of the marginals may change across tasks, i.e., Ps(xi) ≠ Pt(xi) for some i = 1, . . . , d, or Ps(y) ≠ Pt(y), and we need to re-generate the estimates of the affected marginals using data from the target task. [sent-258, score-0.014]

75 Additionally, some of the bivariate copulas cjk|D(e) may differ from source to target tasks. [sent-259, score-0.048]

76 In this case, we also re-estimate the affected copulas using data from the target task. [sent-260, score-0.032]
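Sentence 98 below says these changes are detected with two-sample tests; a minimal stand-in for one marginal (our choice of the Kolmogorov-Smirnov test and of α are assumptions, not necessarily the paper's test):

```python
from scipy.stats import ks_2samp

def factor_changed(sample_source, sample_target, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test: flag a marginal for
    re-estimation on target data when equality of distributions is rejected."""
    _, pvalue = ks_2samp(sample_source, sample_target)
    return pvalue < alpha
```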

77 Simultaneous changes in both copulas and marginals can occur. [sent-261, score-0.03]

78 Finally, if some of the factors remain constant across domains, we can use the available data from the target task to improve the estimates obtained using only the data from the source task. [sent-263, score-0.016]

79 Specifically, extra unlabeled target task data can be used to refine the factors in the R-Vine decomposition of pt which do not depend on y. [sent-310, score-0.015]

80 This is still valid even in the limiting case of not having access to labeled data from the target task at training time (unsupervised domain adaptation). [sent-311, score-0.013]

81 The first series illustrates the accuracy of the density estimates generated by the proposed non-parametric vine method. [sent-313, score-0.026]

82 The second series validates the effectiveness of the proposed framework for domain adaptation problems in the non-linear regression setting. [sent-314, score-0.02]

83 For comparative purposes, we include the results of different state-of-the-art domain adaptation methods whose parameters are selected by a 10-fold cross-validation process on the training data. [sent-316, score-0.017]

84 Approximations: A complete R-Vine requires the use of conditional copula functions, which are challenging to learn. [sent-317, score-0.032]

85 A common approximation is to ignore any dependence between the copula functional form and its set of conditioning variables. [sent-318, score-0.035]

86 Note that the arguments of the copula functions remain conditional cdfs. [sent-319, score-0.032]

87 5.1 Accuracy of Non-parametric Regular Vines for Density Estimation. The density estimates generated by the new non-parametric R-vine method (NPRV) are evaluated on data from six normalized UCI datasets [9]. [sent-323, score-0.014]

88 We compare against a standard density estimator based on Gaussian kernels (KDE), and a parametric vine method based on bivariate Gaussian copulas (GRV). [sent-324, score-0.062]

89 5.2 Comparison with Other Domain Adaptation Methods. NPRV is analyzed in a series of experiments for domain adaptation in the non-linear regression setting with real-world data. [sent-331, score-0.02]

90 Two of the baselines are Gaussian process (GP) methods, the first trained only with data from the source task, and the second trained on the normalized union of data from both the source and target problems. [sent-335, score-0.013]

91 The other five methods are considered state-of-the-art domain adaptation techniques. [sent-336, score-0.017]

92 KMM [11] minimizes the distance between the marginal distributions of the source and target domains by matching their means when mapped into a universal RKHS. [sent-458, score-0.017]
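A compact sketch of the KMM idea (reweight source points so their kernel mean matches the target's; the RBF kernel, the bound B, and the equality constraint here are illustrative choices, not the reference implementation):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics.pairwise import rbf_kernel

def kmm_weights(Xs, Xt, gamma=1.0, B=10.0):
    """Kernel mean matching: choose source weights beta minimizing
    ||(1/ns) sum_i beta_i phi(xs_i) - (1/nt) sum_j phi(xt_j)||^2 in the RKHS."""
    ns, nt = Xs.shape[0], Xt.shape[0]
    K = rbf_kernel(Xs, Xs, gamma=gamma)
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma=gamma).sum(axis=1)
    objective = lambda b: 0.5 * b @ K @ b - kappa @ b
    cons = [{"type": "eq", "fun": lambda b: b.sum() - ns}]  # weights average to one
    res = minimize(objective, np.ones(ns), bounds=[(0.0, B)] * ns, constraints=cons)
    return res.x
```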

93 For training, we randomly sample 1000 data points for both source and target tasks, where all the data in the source task and 5% of the data in the target task are labeled. [sent-461, score-0.022]

94 Finally, the two bottom rows in Table 2 show the average number of marginals and bivariate copulas which are updated in each dataset during the execution of NPRV, respectively. [sent-466, score-0.043]

95 Parametric copulas may be used to reduce the computational demands. [sent-473, score-0.025]

96 6 Conclusions. We have proposed a novel non-parametric domain adaptation strategy based on copulas. [sent-474, score-0.017]

97 The new approach works by decomposing any multivariate density into a product of marginal densities and bivariate copula functions. [sent-475, score-0.061]

98 Changes in these factors across different domains can be detected using two-sample tests, and transferred across domains in order to adapt the target task density model. [sent-476, score-0.027]

99 This technique leads to better density estimates than standard parametric vines or KDE, and is also able to outperform a large number of alternative domain adaptation methods in a collection of regression problems with real-world data. [sent-478, score-0.05]

100 Families of m-variate distributions with given margins and m(m − 1)/2 bivariate dependence parameters. [sent-564, score-0.018]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('copula', 0.03), ('copulas', 0.025), ('vines', 0.016), ('nprv', 0.014), ('bivariate', 0.014), ('adaptation', 0.011), ('density', 0.011), ('cjk', 0.01), ('vine', 0.01), ('xd', 0.009), ('domain', 0.006), ('target', 0.005), ('ps', 0.005), ('regular', 0.005), ('pt', 0.004), ('marginals', 0.004), ('cdf', 0.004), ('source', 0.004), ('kde', 0.004), ('ui', 0.004), ('cdfs', 0.004), ('marginal', 0.003), ('conditioning', 0.003), ('ei', 0.003), ('transfer', 0.003), ('accross', 0.003), ('grv', 0.003), ('unprv', 0.003), ('wi', 0.003), ('zi', 0.003), ('daume', 0.003), ('trees', 0.003), ('edge', 0.003), ('multivariate', 0.003), ('estimates', 0.003), ('domains', 0.003), ('mmd', 0.002), ('parametric', 0.002), ('dependence', 0.002), ('vi', 0.002), ('uci', 0.002), ('tree', 0.002), ('pj', 0.002), ('td', 0.002), ('factors', 0.002), ('atgp', 0.002), ('kmm', 0.002), ('task', 0.002), ('corrected', 0.002), ('conditional', 0.002), ('curse', 0.002), ('xi', 0.002), ('frustratingly', 0.002), ('kurowicka', 0.002), ('affected', 0.002), ('language', 0.002), ('distributions', 0.002), ('unlabeled', 0.002), ('blocks', 0.002), ('pdfs', 0.002), ('series', 0.002), ('conditioned', 0.002), ('nmse', 0.002), ('mpi', 0.002), ('borgwardt', 0.002), ('formed', 0.002), ('changes', 0.001), ('linking', 0.001), ('quantile', 0.001), ('transferred', 0.001), ('address', 0.001), ('correcting', 0.001), ('bonilla', 0.001), ('kernel', 0.001), ('regression', 0.001), ('building', 0.001), ('describes', 0.001), ('factorize', 0.001), ('joining', 0.001), ('edges', 0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

2 0.89288235 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1

3 0.54677123 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic StickBreaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSBCMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

4 0.39111444 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricit´ de France (EDF). e Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

5 0.023539983 211 nips-2012-Meta-Gaussian Information Bottleneck

Author: Melanie Rey, Volker Roth

Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possibles applications of IB to continuous data and provides a solution more robust to outliers. 1

6 0.021983845 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

7 0.016972804 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

8 0.0090040537 142 nips-2012-Generalization Bounds for Domain Adaptation

9 0.0068706064 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

10 0.0054920288 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

11 0.0052863602 351 nips-2012-Transelliptical Component Analysis

12 0.004232341 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

13 0.0038358339 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

14 0.0035419676 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

15 0.0035138559 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

16 0.0034325784 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

17 0.0033528619 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

18 0.0031045079 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

19 0.0029888032 187 nips-2012-Learning curves for multi-task Gaussian process regression

20 0.0028343203 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.026), (1, 0.024), (2, 0.018), (3, -0.012), (4, -0.007), (5, -0.006), (6, 0.901), (7, -0.058), (8, -0.042), (9, 0.028), (10, 0.068), (11, -0.032), (12, -0.039), (13, 0.037), (14, 0.022), (15, 0.028), (16, -0.034), (17, -0.003), (18, 0.019), (19, -0.023), (20, 0.02), (21, 0.007), (22, 0.018), (23, -0.039), (24, -0.003), (25, 0.024), (26, 0.023), (27, 0.005), (28, 0.013), (29, 0.006), (30, 0.004), (31, -0.028), (32, -0.018), (33, 0.028), (34, 0.012), (35, 0.01), (36, 0.004), (37, 0.016), (38, -0.007), (39, -0.001), (40, 0.016), (41, -0.008), (42, -0.007), (43, 0.008), (44, -0.011), (45, -0.001), (46, 0.005), (47, -0.006), (48, 0.019), (49, -0.008)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9994747 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semisupervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model accross different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1

2 0.97954756 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1

3 0.78774041 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic StickBreaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSBCMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

4 0.78039545 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricit´ de France (EDF). e Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

5 0.084852569 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

Author: Yao-nan Chen, Hsuan-tien Lin

Abstract: Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing ones to LSDR across many real-world datasets. 1

6 0.057279844 211 nips-2012-Meta-Gaussian Information Bottleneck

7 0.055671096 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

8 0.043135021 256 nips-2012-On the connections between saliency and tracking

9 0.042317174 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

10 0.039468583 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

11 0.038599595 280 nips-2012-Proper losses for learning from partial labels

12 0.038500961 142 nips-2012-Generalization Bounds for Domain Adaptation

13 0.034942131 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

14 0.033333965 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

15 0.031986531 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

16 0.030969856 351 nips-2012-Transelliptical Component Analysis

17 0.030957879 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

18 0.028914476 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

19 0.028391792 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

20 0.027073113 194 nips-2012-Learning to Discover Social Circles in Ego Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.57), (38, 0.019), (39, 0.015), (55, 0.025), (74, 0.015), (76, 0.046), (80, 0.018), (92, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95210719 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

2 0.54192793 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

Author: Hankz H. Zhuo, Qiang Yang, Subbarao Kambhampati

Abstract: Multi-Agent Plan Recognition (MAPR) aims to recognize dynamic team structures and team behaviors from the observed team traces (activity sequences) of a set of intelligent agents. Previous MAPR approaches required a library of team activity sequences (team plans) be given as input. However, collecting a library of team plans to ensure adequate coverage is often difficult and costly. In this paper, we relax this constraint, so that team plans are not required to be provided beforehand. We assume instead that a set of action models are available. Such models are often already created to describe domain physics; i.e., the preconditions and effects of actions. We propose a novel approach for recognizing multi-agent team plans based on such action models rather than libraries of team plans. We encode the resulting MAPR problem as a satisfiability problem and solve the problem using a state-of-the-art weighted MAX-SAT solver. Our approach also allows for incompleteness in the observed plan traces. Our empirical studies demonstrate that our algorithm is both effective and efficient in comparison to state-of-the-art MAPR methods based on plan libraries. 1

3 0.34322435 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We describe how the pretraining algorithm for Deep Boltzmann Machines (DBMs) is related to the pretraining algorithm for Deep Belief Networks and we show that under certain conditions, the pretraining procedure improves the variational lower bound of a two-hidden-layer DBM. Based on this analysis, we develop a different method of pretraining DBMs that distributes the modelling work more evenly over the hidden layers. Our results on the MNIST and NORB datasets demonstrate that the new pretraining algorithm allows us to learn better generative models. 1

4 0.22530665 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

Author: Odalric-ambrym Maillard

Abstract: This paper aims to take a step forwards making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiositydriven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C) is provided together with a finite-time regret analysis. 1

5 0.13893975 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1

6 0.12582283 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

7 0.10301466 95 nips-2012-Density-Difference Estimation

8 0.1022541 215 nips-2012-Minimizing Uncertainty in Pipelines

9 0.10128675 211 nips-2012-Meta-Gaussian Information Bottleneck

10 0.10097964 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

11 0.10028389 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

12 0.099486105 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

13 0.099141479 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

14 0.098244451 188 nips-2012-Learning from Distributions via Support Measure Machines

15 0.097466178 210 nips-2012-Memorability of Image Regions

16 0.09707471 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

17 0.09706369 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

18 0.097049743 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

19 0.096683599 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

20 0.096400857 352 nips-2012-Transelliptical Graphical Models