nips nips2012 nips2012-35 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Électricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
Reference: text
sentIndex sentText sentNum sentScore
1 This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. [sent-9, score-0.103]
2 In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. [sent-11, score-0.415]
3 The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. [sent-12, score-0.63]
4 The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Électricité de France (EDF). [sent-14, score-0.741]
5 This considerable attention comes from the ability of Additive Models to represent non-linear associations between covariates and response variables in an intuitive way, and the availability of efficient training methods. [sent-17, score-0.086]
6 The fundamental assumption of Additive Models is that the effect of covariates on the dependent variable follows an additive form. [sent-18, score-0.207]
7 The separate effects are modeled by smoothing splines, which can be learned using penalized least squares. [sent-19, score-0.103]
8 A particularly fruitful field for the application of Additive Models is the modeling and forecasting of short-term electricity load. [sent-20, score-0.557]
9 Additive Models were applied, with good results, to the nation-wide load in France [11] and to regional loads in Australia [12]. [sent-22, score-0.41]
10 Besides electricity load, Additive Models have also been applied to natural gas demand [13]. [sent-23, score-0.453]
11 Several methods have been proposed to track time-varying behaviour of the smoothing splines in Additive Models. [sent-24, score-0.242]
12 A componentwise smoothing spline is suggested by Chiang et al. [sent-29, score-0.236]
13 Fan and Zhang [17] propose a two-stage algorithm which first computes raw estimates of the smoothing functions at different time points and then smoothes the estimates. [sent-31, score-0.103]
14 In [19], an algorithm based on iterative QR decompositions is proposed, which yields promising results for the French electricity load but also highlights the need for a forgetting factor to be more reactive. [sent-35, score-1.156]
15 Harvey and Koopman [20] propose an adaptive learning method which is restricted to changing periodic patterns. [sent-38, score-0.083]
16 The contributions of our paper are threefold: First, we introduce a new algorithm which combines Additive Models with a Recursive Least Squares (RLS) filter to track time-varying behaviour of the smoothing splines. [sent-40, score-0.146]
17 Second, in order to enhance the tracking ability, we consider filters that include a forgetting factor which can be either fixed, or updated using a gradient descent approach [23]. [sent-41, score-0.469]
18 The basic idea is to decrease the forgetting factor (and hence increase the reactivity) in transient phases, and to increase the forgetting factor (thus decreasing the variability) during stationary regimes. [sent-42, score-0.83]
19 Third, we evaluate the proposed methodology on 5 years of electricity load data provided by the French utility company Électricité de France (EDF). [sent-44, score-0.741]
20 The results show that the adaptive learning algorithm outperforms state-of-the-art methods in terms of model tracking and prediction accuracy. [sent-45, score-0.137]
21 Moreover, the experiments demonstrate that using an adaptive forgetting factor stabilizes the algorithm and yields results comparable to those obtained by using the (a priori unknown) optimal value for a fixed forgetting factor. [sent-46, score-0.901]
22 The reason is that we are specifically interested in adaptive versions of Additive Models, which have been shown to be particularly well-suited for modeling and forecasting electricity load. [sent-48, score-0.64]
23 Section 2 reviews the definition of Additive Models and provides some background on the spline representation of smoothing functions. [sent-50, score-0.236]
24 In Section 3 we present our adaptive learning algorithms which combine Additive Models with a Recursive Least Squares (RLS) filter. [sent-51, score-0.083]
25 We discuss different approaches for including forgetting factors and analyze the learning rate for the gradient descent method in the adaptive forgetting factor approach. [sent-52, score-0.866]
26 A case study with real electricity load data from EDF is presented in Section 4. [sent-53, score-0.741]
27 2 Additive Models. In this section we review the Additive Models and provide background information on the spline representation of smoothing functions. [sent-55, score-0.236]
28 Additive Models have the following form: y_k = Σ_{i=1}^{I} f_i(x_k) + ε_k. [sent-56, score-0.073]
29 In this formulation, x_k is a vector of covariates which can be either categorical or continuous, and y_k is the dependent variable, which is assumed to be continuous. [sent-57, score-0.302]
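To make the additive form concrete, the following is a minimal sketch in Python; the transfer functions and covariate layout are illustrative placeholders, not the ones fitted in the paper.

```python
import numpy as np

# Illustrative transfer functions (placeholders, not the paper's fitted ones):
def f_intercept(x):
    return 1.2                  # constant transfer function (the intercept)

def f_linear(x):
    return 0.5 * x[0]           # linear trend in the first covariate

def f_smooth(x):
    return float(np.sin(x[1]))  # stand-in for a smoothing spline on x[1]

transfer_functions = [f_intercept, f_linear, f_smooth]

def predict(x_k):
    """Noiseless part of the additive model: sum_i f_i(x_k)."""
    return sum(f(x_k) for f in transfer_functions)

print(predict(np.array([2.0, 0.3])))  # 1.2 + 1.0 + sin(0.3) ~ 2.4955
```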
30 The functions f_i are the transfer functions of the model, which can be of the following types: constant (exactly one transfer function, representing the intercept of the model), categorical (evaluating to 0 or 1 depending on whether the covariates satisfy certain conditions), or continuous. [sent-59, score-0.159]
31 The continuous transfer functions can be either linear functions of covariates (representing simple linear trends), or smoothing splines. [sent-60, score-0.189]
32 Typically, smoothing splines depend on only 1-2 of the continuous covariates. [sent-61, score-0.199]
33 An interesting possibility is to combine smoothing splines with categorical conditions; in the context of electricity load modeling this allows, e.g., [sent-62, score-0.971]
34 for having different effects of the time of the day depending on the day of the week. [sent-64, score-0.116]
35 Note that the basis functions are defined by a (fixed) sequence of knot points, while the coefficients are used to fit the spline to the data (see [1] for details). [sent-66, score-0.133]
36 The quantity J_i in equation (1) is the number of spline coefficients associated with the transfer function f_i. [sent-67, score-0.133]
37 Now, let β denote the stacked vector containing the spline coefficients, and b(x_k) the stacked vector containing the spline basis functions of all the transfer functions. [sent-68, score-0.266]
38 The stacked design matrix B_K = (b(x_1), …, b(x_K))^T contains the evaluated spline basis functions. [sent-83, score-0.133]
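A hedged sketch of how the stacked basis vector b(x_k) and the design matrix B_K can be assembled; the truncated-power cubic basis and the knot sequence below are assumptions standing in for the paper's (unspecified) spline basis.

```python
import numpy as np

knots = np.array([0.25, 0.5, 0.75])      # assumed fixed knot sequence

def spline_basis(x):
    """b(x): 1, x, x^2, x^3, plus one truncated cubic term per knot."""
    powers = [1.0, x, x ** 2, x ** 3]
    truncated = [max(x - t, 0.0) ** 3 for t in knots]
    return np.array(powers + truncated)

x_samples = np.linspace(0.0, 1.0, 200)   # covariate values x_1, ..., x_K
B_K = np.vstack([spline_basis(x) for x in x_samples])
print(B_K.shape)                          # (200, 7): one row b(x_k)^T per sample
```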
39 In this paper, we consider two scenarios: ΩK is the identity matrix (putting equal weight on the K regressors), or a diagonal matrix which puts exponentially decreasing weights on the samples, according to the order of their arrival (thus giving rise to the notion of forgetting factors). [sent-85, score-0.368]
40 The matrix S_K in (3) introduces a penalizing term in order to avoid overfitting of the smoothing splines. [sent-87, score-0.103]
41 Note that this penalizer shrinks the smoothing splines towards zero functions, and the strength of this effect is tuned by γ. [sent-92, score-0.34]
42 The resulting batch estimate is β̂_K = (B_K^T Ω_K B_K + S_K)^{−1} B_K^T Ω_K y_K (5). 3 Adaptive learning of smoothing functions. Equation (5) gives rise to an efficient batch learning algorithm for Additive Models. [sent-94, score-0.103]
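A hedged sketch of the batch estimate behind equation (5): weighted penalized least squares with either uniform or exponentially decaying weights Ω. Taking the penalizer as S = I (plain ridge shrinkage towards zero) is an assumption made purely for illustration.

```python
import numpy as np

def batch_fit(B, y, gamma=1.0, omega=None):
    """beta = (B^T Omega B + gamma * S)^{-1} B^T Omega y, with S = I assumed."""
    K, J = B.shape
    # Omega: uniform weights, or diag(omega^{K-1}, ..., omega^2, omega, 1).
    w = np.ones(K) if omega is None else omega ** np.arange(K - 1, -1, -1)
    A = B.T @ (w[:, None] * B) + gamma * np.eye(J)
    return np.linalg.solve(A, B.T @ (w * y))

rng = np.random.default_rng(0)
B = rng.normal(size=(500, 7))
beta_true = rng.normal(size=7)
y = B @ beta_true + 0.1 * rng.normal(size=500)
print(np.round(batch_fit(B, y, gamma=0.1, omega=0.999), 2))  # close to beta_true
```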
43 Next, we propose an adaptive method which allows us to track changes in the smoothing functions in an online fashion. [sent-95, score-0.186]
44 To improve the tracking behaviour, we introduce a forgetting factor which puts more weight on recent samples. [sent-97, score-0.469]
45 The initial precision matrix P_0 is set equal to the inverse of the penalizer S in (4). [sent-102, score-0.109]
46 Let us discuss the role of the forgetting factor ω in the adaptive learning algorithm. [sent-104, score-0.498]
47 The weighting matrix is Ω_K = diag(ω^{K−1}, …, ω^2, ω, 1), and the penalizer S is as defined in (4). [sent-108, score-0.109]
48 In general, a smaller forgetting factor improves the tracking of temporal changes in the model coefficients β. [sent-111, score-0.469]
49 Therefore, finding the right balance between the forgetting factor ω and the strength γ of the penalizer in (4) is crucial for a good performance of the forecasting algorithm. [sent-113, score-0.727]
50 Algorithm 1 Adaptive learning (fixed forgetting factor). 1: Input: initial estimate β_0, forgetting factor ω ∈ (0, 1], penalizer strength γ > 0. [sent-114, score-0.924]
51 3: for k = 1, 2, … do; 4: Obtain new covariates x_k and dependent variable y_k. [sent-122, score-0.271]
52 5: Compute the spline basis functions b_k = b(x_k). [sent-123, score-0.163]
53 6: Compute the a priori error and the Kalman gain: ε_k = y_k − b_k^T β_{k−1}, g_k = P_{k−1} b_k / (ω + b_k^T P_{k−1} b_k). 7: Update β_k = β_{k−1} + g_k ε_k and P_k = (P_{k−1} − g_k b_k^T P_{k−1}) / ω. [sent-124, score-0.138]
54 8: end for. Algorithm 2 Adaptive learning (adaptive forgetting factor). 1: Input: initial estimate β_0, initial forgetting factor ω_0 ∈ (0, 1], lower bound for the forgetting factor ω_min ∈ (0, 1], learning rate η > 0, penalizer strength γ > 0. [sent-126, score-1.339]
55 6: Update the forgetting factor: ω_k = ω_{k−1} + η ε_k b_k^T ψ_{k−1}. [sent-133, score-0.368]
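A hedged sketch of the fixed-forgetting-factor loop of Algorithm 1 in Python (the shared core that Algorithm 2 augments with an ω update). Initializing P_0 as the inverse of S = γI follows the text above, but the identity penalizer itself is an assumption; the paper's S is spline-specific.

```python
import numpy as np

def rls_fixed(basis_fn, stream, beta0, omega=0.999, gamma=1.0):
    """Algorithm 1 sketch: track beta online, yielding each updated estimate."""
    beta = np.asarray(beta0, dtype=float).copy()
    P = np.eye(len(beta)) / gamma                 # P_0 = S^{-1} with S = gamma * I
    for x_k, y_k in stream:
        b_k = basis_fn(x_k)
        err = y_k - b_k @ beta                    # a priori error
        g_k = P @ b_k / (omega + b_k @ P @ b_k)   # Kalman gain
        beta = beta + g_k * err
        P = (P - np.outer(g_k, b_k) @ P) / omega  # precision-matrix update
        yield beta

basis = lambda x: np.array([1.0, x, x * x])
stream = [(x, 2.0 + 0.5 * x) for x in np.linspace(0.0, 1.0, 200)]
for beta in rls_fixed(basis, stream, np.zeros(3)):
    pass
print(np.round(beta, 2))   # should end up near [2.0, 0.5, 0.0]
```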
56 3.1 Adaptive forgetting factors. In this section we present a modification of Algorithm 1 which uses adaptive forgetting factors in order to improve the stability and the tracking behaviour. [sent-139, score-0.927]
57 The basic idea is to choose a large forgetting factor during stationary regimes (when the a priori errors are small), and small forgetting factors during transient phases (when the a priori error is large). [sent-140, score-0.853]
58 In this paper we adopt the gradient descent approach in [23] and update the forgetting factor according to the following formula: ω_k = ω_{k−1} − η ∂E[ε_k^2]/∂ω. [sent-141, score-0.415]
59 The learning rate η > 0 determines the reactivity of the algorithm: if it is high, then the errors lead to large decreases of the forgetting factor, and vice versa. [sent-143, score-0.395]
60 The details of the adaptive forgetting factor approach are given in Algorithm 2. [sent-144, score-0.498]
61 Recall the definition of the a priori error, ε_k = y_k − b_k^T β_{k−1}. [sent-149, score-0.108]
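A hedged sketch of the adaptive step: move ω along the negative gradient of the squared a priori error and clip it at ω_min. Instead of the paper's analytic recursion for ψ = ∂β/∂ω, this sketch assumes a crude finite-difference gradient obtained from two shadow filters run with ω ± δ, purely for illustration.

```python
def adapt_omega(omega, err_minus, err_plus, delta=1e-3, eta=1e-4, omega_min=0.9):
    """One gradient step on the squared a priori error w.r.t. omega.

    err_minus / err_plus are the a priori errors of two shadow filters run
    with omega - delta and omega + delta (a finite-difference assumption).
    The result is clipped to [omega_min, 1].
    """
    grad = (err_plus ** 2 - err_minus ** 2) / (2.0 * delta)
    return min(max(omega - eta * grad, omega_min), 1.0)

print(adapt_omega(0.999, err_minus=0.08, err_plus=0.12))  # omega decreases
```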
62 4 Case study: Forecasting of electricity load. In this section, we apply our adaptive learning algorithms to real electricity load data provided by the French utility company Électricité de France (EDF). [sent-162, score-1.565]
63 Modeling and forecasting electricity load is a challenging task due to the non-linear effects. [sent-163, score-0.912]
64 Moreover, the electricity load exhibits many non-stationary patterns, e.g., [sent-166, score-0.741]
65 due to changing macroeconomic conditions (leading to an increase/decrease in electricity demand), or varying customer portfolios resulting from the liberalization of European electricity markets. [sent-168, score-0.799]
66 The performance on these highly complex, non-linear and non-stationary learning tasks is a challenging benchmark for our adaptive algorithms. [sent-169, score-0.083]
67 1 Experimental data The dependent variables yk in the data provided by EDF represent half-hourly electricity load measurements between February 2, 2006 and April 6, 2011. [sent-171, score-0.814]
68 The covariates x_k include the following information: x_k = (x_k^DayType, x_k^TimeOfDay, x_k^TimeOfYear, x_k^Temperature, x_k^CloudCover, x_k^LoadDecrease). [sent-172, score-0.31]
69 Let us explain these components in more detail: • x_k^DayType is a categorical variable representing the day type: 1 for Sunday, 2 for Monday, 3 for Tuesday-Wednesday-Thursday, 4 for Friday, 5 for Saturday, and 6 for bank holidays. [sent-173, score-0.089]
70 • x_k^Temperature and x_k^CloudCover represent the temperature and the cloud cover (ranging from 0 for a blue sky to 8 for overcast). [sent-179, score-0.088]
71 These meteorological covariates have been provided by Météo France; the raw data include temperature and cloud cover data recorded every 3 hours from 26 weather stations all over France. [sent-180, score-0.201]
72 A weighted average – the weights reflecting the importance of a region in terms of the national electricity load – is computed to obtain the national temperature and cloud cover covariates (a short numerical sketch follows after this list). [sent-182, score-0.829]
73 • x_k^LoadDecrease contains information about the activation of contracts between EDF and some big customers to reduce the electricity load during peak days. [sent-183, score-0.741]
74 • f^LagLoad(y_{k−48}) takes into account the electricity load of the previous day. [sent-190, score-0.741]
75 • f^CloudCover(x_k) and f^Temperature/TimeOfDay(x_k) represent respectively the effect of the cloud cover and the bivariate effect of the temperature and the time of the day. [sent-192, score-0.088]
76 • f^TimeOfYear(x_k) represents yearly cycles, and x_k^LoadDecrease · f^LoadDecrease(x_k) models the effect of contracts to reduce peak loads depending on the time of the day. [sent-194, score-0.096]
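Picking up the weighted-average construction referenced in the temperature/cloud-cover item above, a hedged numerical sketch; the station readings and load-share weights are made up (the real data use 26 stations).

```python
import numpy as np

station_temps = np.array([8.4, 11.2, 6.9, 9.5])   # deg C; 4 of the 26 stations
load_shares = np.array([0.35, 0.30, 0.20, 0.15])  # regional load weights, sum 1

national_temp = float(station_temps @ load_shares)  # weighted national average
print(national_temp)                                # about 9.1
```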
77 For more information about the design of models for electricity data we refer to [19, 11]. [sent-196, score-0.386]
78 Figure 1 shows the estimated joint effect of the temperature and the time of the day, and the estimated yearly cycle. [sent-197, score-0.092]
79–80 Low (resp. high) temperatures lead to an increase of the electricity load due to electrical heating (resp. cooling), whereas temperatures between 10° and 20° Celsius have almost no effect on the electricity load. [sent-199, score-0.809] [sent-200, score-0.413]
81 Due to the widespread usage of electrical heating and relatively low usage of air conditioning in France, the effect of heating is approximately four times higher than the effect of cooling. [sent-201, score-0.082]
82 The yearly cycle reveals a strong decrease of the electricity load during the summer and Christmas holidays (around 0. [sent-202, score-0.782]
83 The fitted model consists of 268 spline basis coefficients, which indicates the complexity of modeling electricity load data. [sent-209, score-0.874]
86 Values of the forgetting factor close to 1 result in reduced tracking behaviour and less improvement over the offline approach. [sent-214, score-0.512]
87 Choosing too small values for the forgetting factor can lead to loss of information and instabilities of the algorithm. [sent-215, score-0.415]
88 Increasing the penalizer reduces the variability of the smoothing splines; however, it also introduces a bias, as the splines are shrunk towards zero. [sent-216, score-0.308]
89 Table 1: Performance of the five different forecasting methods. [sent-228, score-0.171]
90 We have introduced methods to improve the tracking behaviour based on forgetting factors and analyzed theoretical properties using results from Lyapunov stability theory. [sent-230, score-0.519]
91 The significance of the proposed algorithms was demonstrated in the context of forecasting electricity load data. [sent-231, score-0.912]
92 Experiments on 5 years of data from Électricité de France have shown the superior performance of algorithms using an adaptive forgetting factor. [sent-233, score-0.451]
93 As it turned out, a crucial point is to find the right combination of forgetting factors and the strength of the penalizer. [sent-234, score-0.4]
94 While forgetting factors tend to reduce the bias of models evolving over time, they typically increase the variance, an effect which can be compensated by choosing a stronger penalizer. [sent-235, score-0.368]
95 E.g., by integrating beliefs for the initial values of the adaptive algorithms. [sent-239, score-0.083]
96 Modeling electricity loads in California: ARMA models with hyperbolic noise. [sent-244, score-0.441]
97 Short-term load forecasting based on an adaptive hybrid method. [sent-273, score-0.609]
98 A statistical model for natural gas standardized load profiles. [sent-286, score-0.382]
99 Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. [sent-292, score-0.103]
100 Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. [sent-306, score-0.133]
wordName wordTfidf (topN-words)
[('qq', 0.407), ('electricity', 0.386), ('forgetting', 0.368), ('load', 0.355), ('forecasting', 0.171), ('forge', 0.157), ('fac', 0.145), ('spline', 0.133), ('additive', 0.121), ('xk', 0.112), ('penalizer', 0.109), ('smoothing', 0.103), ('splines', 0.096), ('edf', 0.096), ('hod', 0.096), ('ng', 0.094), ('bt', 0.092), ('covariates', 0.086), ('lyapunov', 0.086), ('adaptive', 0.083), ('adap', 0.082), ('mape', 0.082), ('yk', 0.073), ('gor', 0.072), ('deno', 0.068), ('erms', 0.068), ('fff', 0.068), ('reng', 0.068), ('zer', 0.068), ('qqq', 0.063), ('rls', 0.063), ('transaction', 0.063), ('pena', 0.06), ('day', 0.058), ('aff', 0.055), ('bes', 0.055), ('electricit', 0.055), ('hms', 0.055), ('loads', 0.055), ('xloaddecrease', 0.055), ('tracking', 0.054), ('stability', 0.054), ('temperature', 0.051), ('france', 0.049), ('se', 0.049), ('va', 0.049), ('ve', 0.048), ('wh', 0.048), ('factor', 0.047), ('ch', 0.043), ('behaviour', 0.043), ('intercept', 0.042), ('comb', 0.042), ('fan', 0.042), ('affg', 0.041), ('heating', 0.041), ('parame', 0.041), ('resu', 0.041), ('upda', 0.041), ('xdaytype', 0.041), ('yearly', 0.041), ('demand', 0.04), ('rmse', 0.037), ('cloud', 0.037), ('eva', 0.036), ('mulhuddart', 0.036), ('squares', 0.035), ('priori', 0.035), ('na', 0.034), ('da', 0.034), ('dublin', 0.033), ('strength', 0.032), ('ireland', 0.031), ('categorical', 0.031), ('french', 0.031), ('pos', 0.03), ('bk', 0.03), ('coef', 0.029), ('bra', 0.027), ('cloudcover', 0.027), ('colin', 0.027), ('daytype', 0.027), ('eubank', 0.027), ('fltimeofday', 0.027), ('forecas', 0.027), ('gas', 0.027), ('goude', 0.027), ('jianqing', 0.027), ('lagload', 0.027), ('lagtemperature', 0.027), ('loaddecrease', 0.027), ('meteorological', 0.027), ('mgcv', 0.027), ('mprovemen', 0.027), ('penalizers', 0.027), ('portfolios', 0.027), ('reactivity', 0.027), ('temperatures', 0.027), ('timeofyear', 0.027), ('xcloudcover', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Électricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
2 0.39949682 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
3 0.39111444 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1
4 0.25105974 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning
Author: Liping Liu, Thomas G. Dietterich
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1
5 0.097904302 300 nips-2012-Scalable nonconvex inexact proximal splitting
Author: Suvrit Sra
Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1
6 0.071590587 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization
7 0.064862065 27 nips-2012-A quasi-Newton proximal splitting method
8 0.062396668 282 nips-2012-Proximal Newton-type methods for convex optimization
9 0.056027994 41 nips-2012-Ancestor Sampling for Particle Gibbs
10 0.050104514 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
12 0.046638012 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection
13 0.045444719 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
14 0.044176247 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
15 0.041650459 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
16 0.040985107 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
17 0.036590133 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
18 0.035340153 187 nips-2012-Learning curves for multi-task Gaussian process regression
19 0.035317138 34 nips-2012-Active Learning of Multi-Index Function Models
20 0.035176646 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
topicId topicWeight
[(0, 0.099), (1, 0.022), (2, 0.044), (3, -0.012), (4, 0.011), (5, 0.027), (6, 0.557), (7, -0.077), (8, 0.047), (9, 0.04), (10, -0.001), (11, 0.001), (12, 0.002), (13, -0.008), (14, -0.02), (15, -0.008), (16, -0.045), (17, 0.016), (18, -0.001), (19, -0.034), (20, 0.003), (21, 0.019), (22, -0.037), (23, -0.056), (24, -0.0), (25, 0.027), (26, 0.008), (27, -0.009), (28, 0.045), (29, -0.028), (30, 0.003), (31, -0.001), (32, -0.001), (33, 0.046), (34, 0.046), (35, 0.042), (36, -0.018), (37, 0.002), (38, 0.005), (39, -0.001), (40, -0.02), (41, -0.03), (42, 0.001), (43, -0.036), (44, 0.034), (45, 0.001), (46, 0.013), (47, -0.011), (48, -0.051), (49, 0.037)]
simIndex simValue paperId paperTitle
1 0.93676978 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas
Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf
Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1
same-paper 2 0.92582542 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Électricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
3 0.92422903 310 nips-2012-Semiparametric Principal Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1
4 0.75277853 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning
Author: Liping Liu, Thomas G. Dietterich
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1
5 0.25520298 300 nips-2012-Scalable nonconvex inexact proximal splitting
Author: Suvrit Sra
Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1
7 0.22652346 282 nips-2012-Proximal Newton-type methods for convex optimization
8 0.21601129 27 nips-2012-A quasi-Newton proximal splitting method
9 0.20006014 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition
10 0.19539116 41 nips-2012-Ancestor Sampling for Particle Gibbs
11 0.19121838 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
13 0.17232262 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
14 0.16990305 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
15 0.16635789 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection
16 0.16310537 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
17 0.16218388 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process
18 0.15396158 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
19 0.15221372 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
20 0.15209258 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
topicId topicWeight
[(0, 0.026), (21, 0.033), (36, 0.485), (38, 0.06), (42, 0.011), (54, 0.032), (55, 0.018), (74, 0.018), (76, 0.11), (80, 0.054), (92, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.74785626 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting
Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey
Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Électricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1
2 0.61034107 295 nips-2012-Risk-Aversion in Multi-armed Bandits
Author: Amir Sani, Alessandro Lazaric, Rémi Munos
Abstract: Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be more difficult than the standard multi-arm bandit setting due in part to an exploration risk which introduces a regret associated to the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results. 1
3 0.58826351 288 nips-2012-Rational inference of relative preferences
Author: Nisheeth Srivastava, Paul R. Schrater
Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1
4 0.43696812 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric
Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1
5 0.41605806 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
6 0.41411358 69 nips-2012-Clustering Sparse Graphs
7 0.34487239 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
8 0.31536043 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
9 0.31456625 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button
10 0.30834466 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity
11 0.30566037 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
12 0.30432209 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation
13 0.30348781 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling
14 0.30152097 11 nips-2012-A Marginalized Particle Gaussian Process Regression
15 0.299503 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions
16 0.29331762 358 nips-2012-Value Pursuit Iteration
17 0.29305348 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
18 0.29140607 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
19 0.2912091 41 nips-2012-Ancestor Sampling for Particle Gibbs