jmlr jmlr2008 jmlr2008-37 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jia Li, Andrew W. Moore
Abstract: Web sites must forecast Web page views in order to plan computer resource allocation and estimate upcoming revenue and advertising growth. In this paper, we focus on extracting trends and seasonal patterns from page view series, two dominant factors in the variation of such series. We investigate the Holt-Winters procedure and a state space model for making relatively short-term prediction. It is found that Web page views exhibit strong impulsive changes occasionally. The impulses cause large prediction errors long after their occurrences. A method is developed to identify impulses and to alleviate their damage on prediction. We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. It is shown that for longterm prediction, ESSF improves accuracy significantly over other methods that ignore the yearly seasonality. Keywords: web page views, forecast, Holt-Winters, Kalman filtering, elastic smooth season fitting
Reference: text
sentIndex sentText sentNum sentScore
1 We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. [sent-12, score-0.951]
2 ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. [sent-13, score-0.866]
3 Keywords: web page views, forecast, Holt-Winters, Kalman filtering, elastic smooth season fitting. [sent-15, score-0.704]
4 There are multiple approaches to trend and season removal (Brockwell and Davis, 2002). [sent-31, score-0.611]
5 In some cases, however, trend and season may be the dominant factors in prediction and require methods devoted to their extraction. [sent-40, score-0.658]
6 For daily page view series, there is usually a weekly season and sometimes a long-range yearly season. [sent-53, score-1.042]
7 Both HW and SSM can effectively extract the weekly season, but not the yearly season for several reasons elaborated in Section 4. [sent-54, score-0.921]
8 It is observed that instead of being a periodic sequence, the yearly seasonality often emerges as a yearly pattern that may scale differently across the years. [sent-56, score-0.698]
9 Experiments show that the prediction accuracy can be improved remarkably based on the yearly season computed by ESSF, especially for forecasting the distant future. [sent-58, score-0.972]
10 For long-term prediction several months ahead, we develop the ESSF algorithm to extract global trends and scalable yearly seasonal effects after separating the weekly season using HW. [sent-81, score-1.041]
11 Application Scope: The prediction methods in this paper focus on extracting trend and season at several scales, and are not suitable for modeling stationary stochastic processes. [sent-83, score-0.658]
12 The yearly season extraction by ESSF is found to improve long-term prediction. [sent-92, score-0.839]
13 The basic assumption of ESSF is that the time series exhibits a yearly pattern, possibly scaled differently across the years. [sent-93, score-0.417]
14 Therefore, the series are insufficient for exploiting yearly seasonality in prediction. [sent-100, score-0.469]
15 The Holt-Winters (HW) procedure (Chatfield, 2004) decomposes the series into level Lt , season It , and noise. [sent-120, score-0.628]
16 To better see this, let us assume the season and linear growth terms are absent. [sent-125, score-0.578]
17 When the season is added, xt minus the estimated season at t becomes the part of Lt indicated by the current information. [sent-134, score-1.18]
18 At this point of recursion, the most up-to-date estimation for the season at t is It−d under period d. [sent-135, score-0.552]
19 A similar rationale applies to the update. [Figure: original vs. predicted series and error.] [sent-139, score-0.412]
20 The linear function of h, Lt + hTt, with slope given by the most up-to-date linear growth Tt, can be regarded as an estimate of Lt+h; while It−d+h mod d, the most up-to-date season at the same cyclic position as t + h, which is already available at t, is the estimate of It+h. [sent-186, score-0.578]
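The recursion and h-step-ahead forecast described above can be illustrated with a minimal sketch of additive Holt-Winters. This is not the authors' implementation: ζ (level) and δ (season) follow the parameter names used elsewhere in this dump, while κ for the linear growth, the default values, and the initialization are assumptions.

```python
import numpy as np

def holt_winters_additive(x, d, zeta=0.3, kappa=0.1, delta=0.3):
    """Additive Holt-Winters: level L, linear growth T, season I with period d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    L, T, I = np.zeros(n), np.zeros(n), np.zeros(n)
    # Crude initialization from the first period (one of several common choices).
    L[:d] = x[:d].mean()
    I[:d] = x[:d] - L[:d]
    for t in range(d, n):
        L[t] = zeta * (x[t] - I[t - d]) + (1 - zeta) * (L[t - 1] + T[t - 1])
        T[t] = kappa * (L[t] - L[t - 1]) + (1 - kappa) * T[t - 1]
        I[t] = delta * (x[t] - L[t]) + (1 - delta) * I[t - d]
    return L, T, I

def hw_forecast(L, T, I, t, h, d):
    """h-step-ahead forecast at time t: L_t + h*T_t + I_{t-d+(h mod d)}."""
    return L[t] + h * T[t] + I[t - d + h % d]
```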
21 At time t, let the prediction for xt based on the past series up to t − 1 be x̂t, and the prediction error be et = xt − x̂t. [sent-220, score-0.752]
22 [Flowchart boxes] Apply HW to obtain the level Lτ and weekly season Iτ at τ = 1, 2, . [sent-240, score-0.607]
23 , 2D; detect impulses and adjust the series up to t, using the HW procedure; update Lt, Tt, It; "Does impulse exist?" [sent-243, score-0.425]
24 Apply ESSF to Lτ in the immediate past two years to obtain the yearly season; (No) estimate parameters H and Q based on the series up to time t, using Kalman filtering and the EM algorithm; initialize the Kalman filter at τ = 1; (Yes) adjust It, It−1, . [sent-244, score-1.019]
25 Forecast the future at t; 2D+1 → t; apply the HW recursion to obtain Lt and It; compute the global linear trend using the level with the yearly season subtracted; forecast at time t; (Yes) t+1 → t; forecast the future at t based on the Kalman filtering result; "Does t enter a new year?" [sent-248, score-0.993]
26 This is equivalent to discarding the season computed during the impulse segment and using the most recent season right before the impulse. [sent-255, score-1.211]
27 Let Lt = (1/2)Lt−s1 + (1/2)(xt − It), where Lt−s1 is the level before the impulse and It is the already revised season at t. [sent-257, score-0.686]
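The level reset in the formula above amounts to a one-line adjustment. A hedged sketch follows: s1 (the offset back to the pre-impulse level) and the already-revised season value are taken as given, since this dump does not spell out how they are obtained.

```python
def reset_level_after_impulse(x, L, I_revised_t, t, s1):
    """Average the pre-impulse level with the de-seasonalized current observation:
    L_t = (1/2) * L_{t-s1} + (1/2) * (x_t - I_t), with I_t already revised."""
    return 0.5 * L[t - s1] + 0.5 * (x[t] - I_revised_t)
```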
28 The Level with Season Model: Denote the level at t by µt and the season with period d by it. [sent-293, score-0.552]
29 In its simplest form, with both the linear growth and season term removed, HW reduces to exponential smoothing with recursion Lt = ζxt + (1 − ζ)Lt−1 . [sent-300, score-0.661]
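The reduced recursion in the preceding sentence is plain exponential smoothing; a two-line sketch follows (the value of ζ is illustrative only).

```python
def exponential_smoothing(x, zeta=0.3):
    """L_t = zeta * x_t + (1 - zeta) * L_{t-1}, with the level initialized to x_1."""
    L = [float(x[0])]
    for x_t in x[1:]:
        L.append(zeta * float(x_t) + (1 - zeta) * L[-1])
    return L
```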
30 When the season is added, there is no complete match between the recursion of HW and that of the Kalman filter. [sent-306, score-0.575]
31 In the LS model, it is assumed that ∑τ=1,…,d it+τ = 0 up to white noise, but HW does not enforce a zero sum over one period of the season terms. [sent-307, score-0.552]
32 The decomposition of xt into level µt and season it by LS is however similar to that assumed by HW. [sent-308, score-0.655]
33 Each time series exhibits an apparent global trend and yearly seasonality. [sent-408, score-0.503]
34 Assume the period of the long-range season is a year. [sent-412, score-0.552]
35 Because the Internet is highly dynamic, it is necessary to derive the yearly season using past data over recent periods and usually only a few (e. [sent-413, score-0.874]
36 A mechanism to control the smoothness of the long-range season is needed. [sent-417, score-0.552]
37 By enforcing smoothness, the extracted season tends to be more robust, a valuable feature especially when given limited past data. [sent-418, score-0.56]
38 The magnitude of the yearly season may vary across the years. [sent-420, score-0.839]
39 The yearly season thus should be allowed to scale. [sent-422, score-0.839]
40 Furthermore, HW requires a relatively large number of periods to settle to the intended season; and importantly, HW assumes a fixed season over the years. [sent-427, score-0.525]
41 Although HW is capable of adjusting to a slowly changing season when given enough periods of data, it does not directly treat the scaling of the season, and hence is vulnerable to the scaling phenomenon. [sent-428, score-0.525]
42 We inject elasticity into the yearly season and allow it to scale from a certain yearly pattern. [sent-430, score-1.153]
43 Elastic Smooth Season Fitting: Before extracting the long-range trend and season, we apply HW with impulse detection to obtain the weekly season and the smoothed level series Lt, t = 1, . [sent-434, score-0.975]
44 We want to exploit the global trend and yearly season existing in the level series to better predict Lt+h based on {L1 , L2 , . [sent-439, score-1.028]
45 We decompose the level series Lt, t = 1, . . . , n, into a yearly season yt, a global linear trend ut, and a volatility part nt: Lt = ut + yt + nt, t = 1, 2, . [sent-446, score-0.556]
46 Thus the original series xt is decomposed into: xt = ut + yt + nt + It + Nt , (9) where It and Nt are the season and noise terms from HW. [sent-450, score-0.986]
47 Here zt is the yearly season combined with noise, obtained after removing the current estimate of the global trend. [sent-485, score-0.886]
48 We now present the ESSF algorithm for computing the yearly season based on the trend-removed series z. [sent-497, score-0.925]
49 Denote the residue zt = Lt − ut by zk,j, the yearly season yt by yk,j (we abuse the notation here and assume the meaning is clear from the context), and the noise term nt by nk,j. [sent-499, score-0.928]
50 We call the yearly season pattern y = {y1 , y2 , . [sent-502, score-0.839]
51 Since we allow the yearly season yk, j to scale over time, it relates to the season template by yk, j = αk, j y j , k = 1, 2, . [sent-506, score-1.364]
52 We optimize over both the season template y j , j = 1, . [sent-531, score-0.525]
53 Since y is one period of the yearly season, when j is out of the range [1, D], y j is understood as y j mod D . [sent-549, score-0.341]
54 Experiments show that allowing a scalable yearly season improves prediction accuracy, as does the smoothness regularization of the yearly season. [sent-556, score-1.253]
55 A more ad hoc approach to enforcing smoothness is to apply a moving average to the yearly season extracted without smoothness regularization. [sent-559, score-0.911]
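The preceding sentences describe ESSF as a quadratic problem in the season template under smoothness regularization, with scaling across years. The sketch below alternates between the template and the scales. The exact ESSF objective is not given in this dump, so the cyclic first-difference penalty and the single scaling factor per year are assumptions rather than the authors' formulation.

```python
import numpy as np

def essf_fit(z, D, lam=1000.0, n_iter=20):
    """Alternating least squares for an ESSF-style fit.

    z   : detrended level series covering K whole years, length K * D.
    Returns a smooth season template y (length D) and per-year scales c,
    minimizing sum_k sum_j (z[k, j] - c[k] * y[j])**2
               + lam * sum_j (y[j + 1] - y[j])**2   (cyclic differences),
    which is quadratic in y for fixed c and in c for fixed y.
    """
    K = len(z) // D
    zkj = np.asarray(z[:K * D], dtype=float).reshape(K, D)
    # Cyclic first-difference operator for the smoothness penalty.
    Dmat = np.eye(D, k=1) - np.eye(D)
    Dmat[-1, 0] = 1.0
    P = lam * (Dmat.T @ Dmat)
    c = np.ones(K)
    y = zkj.mean(axis=0)
    for _ in range(n_iter):
        # y-step: solve the normal equations of the regularized least squares.
        A = np.sum(c ** 2) * np.eye(D) + P
        b = (c[:, None] * zkj).sum(axis=0)
        y = np.linalg.solve(A, b)
        # c-step: independent one-dimensional least squares per year.
        denom = float(y @ y) + 1e-12
        c = (zkj @ y) / denom
    return y, c
```

Because c and y can trade a common factor, one may normalize y (for example, to unit norm) after fitting; this does not change the fitted products c_k · y_j.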
56 To predict xt , the weekly season extracted by HW should be added to the level Lt . [sent-575, score-0.737]
57 We assume that prediction starts in the third year, since the first two years have to serve as past data for computing the yearly season. [sent-577, score-0.466]
58 Apply HW to obtain the weekly season It , and the level Lt , t = 1, 2, . [sent-579, score-0.607]
59 For each year k, take the series of Lt's in the past two years (years k − 2 and k − 1) and apply ESSF to this series to solve for the yearly season template y and the scaling factors c1 and c2 for years k − 2 and k − 1 respectively. [sent-587, score-1.15]
60 Predict the yearly season for future years k′ ≥ k by c2 y. [sent-588, score-0.863]
61 Denote the predicted yearly season at time t in any year k′ ≥ k by Yt,k, where the second subscript clarifies that only the series before year k is used by ESSF. [sent-589, score-1.071]
62 Let the yearly-season-removed level be L̃τ = Lτ − Yτ,ν(t) for τ = t − 2D + 1, . [sent-592, score-0.839]
63 [Figure legend: weekly season, yearly season, long-range growth.] [sent-602, score-1.103]
64 Comparing Eqs. (14) and (9), we see that Lt + hTt is essentially the prediction for the global linear trend term ut+h, Yt+h,ν(t) the prediction for the yearly season yt+h, and It+h−r(t+h)·d the prediction for the weekly season It+h. [sent-617, score-1.673]
65 If day t +h is in the same year as t, Yt+h,ν(t) = Yt+h,ν(t+h) is the freshest possible prediction for the yearly season at t + h. [sent-619, score-0.992]
66 If instead ν(t) < ν(t + h), the yearly season at t + h is predicted based on data more than one year ago. [sent-620, score-0.922]
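The three-part prediction discussed above combines a global linear trend, the scaled yearly season, and the weekly season taken from the same cyclic position. A hedged sketch follows; indexing the yearly template by (t + h) mod D assumes day-of-year alignment, and c is taken to be the scaling factor of the most recent year.

```python
def combine_forecast(L, T, I, y, c, t, h, d, D):
    """x_hat[t+h] = (L_t + h*T_t) + c*y[(t+h) mod D] + I[t-d+(h mod d)]."""
    trend = L[t] + h * T[t]          # prediction of the global trend u_{t+h}
    yearly = c * y[(t + h) % D]      # prediction of the yearly season y_{t+h}
    weekly = I[t - d + h % d]        # weekly season at the same cyclic position
    return trend + yearly + weekly
```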
67 One might have noticed that we use only two years of data to extract the yearly season regardless of the available amount of past data. [sent-621, score-0.898]
68 Figure 3(a) shows that during these two months, the predicted yearly season is much more prominent than the weekly season and the slight linear growth. [sent-626, score-1.483]
69 The series predicted by HW is weekly periodic with a flat level, while that by ESSF incorporates the yearly seasonal variation and is much closer to the original series, as one might have expected. [sent-628, score-0.603]
70 We can consider x̄t−1, the mean up to time t − 1, as the simplest prediction of xt using past data. [sent-674, score-0.342]
71 For the series Auton, Wang, and citeseer, t0 = 4d, where d is the season period. [sent-721, score-0.412]
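The two sentences above use the running mean as the simplest baseline and evaluate errors from t0 + h onward. The normalized error below is one plausible reading of that setup, namely the prediction error of a method relative to that of the running-mean baseline; the paper's exact definition of Re may differ.

```python
import numpy as np

def relative_error(x, x_hat, t0, h):
    """Sketch of a normalized error: squared error of a method's predictions
    divided by that of the running-mean baseline, evaluated for t >= t0 + h."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    # Running mean of x[0..t-1] serves as the baseline prediction for x[t].
    baseline = np.cumsum(x)[:-1] / np.arange(1, len(x))
    idx = np.arange(t0 + h, len(x))
    err_method = np.sum((x[idx] - x_hat[idx]) ** 2)
    err_base = np.sum((x[idx] - baseline[idx - 1]) ** 2)
    return np.sqrt(err_method / err_base)
```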
72 Because the series are not long enough for extracting long-range trend and season by the ESSF algorithm, we only test the HW procedure with or without impulse detection and the GLS approach. [sent-731, score-0.893]
73 The smoothing parameters for the level and the season terms in Eq. [sent-742, score-0.558]
74 To assess the importance of weekly (or daily) seasonality for forecasting, we compare HW and its reduced form without the season term. [sent-747, score-0.659]
75 As with the linear growth term, the season term can be deleted by initializing it to zero and setting its corresponding smoothing parameter δ in Eq. [sent-748, score-0.611]
76 The reduced HW procedure without the local linear growth and season terms is essentially Exponential Smoothing (ES) (Chatfield, 2004). [sent-750, score-0.578]
77 Figure 4: Compare the prediction performance in terms of Re and the adjusted Re on the three series Auton, Wang, and citeseer using different methods: HW with or without impulse detection, ES without season, MA, and MP. [sent-772, score-0.406]
78 The series predicted by HW with impulse detection returns close to the original series shortly after the impulse, while the series predicted without impulse detection ripples with large errors over several periods afterward. [sent-785, score-0.422]
79 The fluctuation of the predicted series obtained . [Figure 5 panels: number of daily page views by day and number of weekly page views by starting day; curves for the original series, HW, and HW with impulse detection.] [sent-798, score-0.841]
80 Figure 5: Compare predicted series for Auton: (a) results obtained by HW with and without impulse detection. [sent-802, score-0.361]
81 Figures 7(c) and (d) show the yearly season templates extracted by ESSF from years 2004 and 2005 with smoothing parameter λ = 0 and 1000, respectively. [sent-945, score-0.918]
82 As expected, at λ = 1000, the yearly seasons are much smoother than those obtained at λ = 0, especially for the series Renoir and greenhouse effect. [sent-946, score-0.516]
83 [Figure panels: scaling factor for the yearly season over Jan 2004–Jan 2007 for the series amazon, Renoir, and greenhouse effect.] [sent-957, score-0.993]
84 (d) Error rates obtained for the three series by ESSF, with λ = 1000, assuming a scalable yearly season versus fixed season. [sent-986, score-0.994]
85 This demonstrates the advantage of imposing smoothness on the yearly season. [sent-993, score-0.341]
86 First, recall that the fitting of the yearly season and the long-range trend is repeated multiple times, as described in Section 4. [sent-996, score-0.925]
87 To study the effect of the number of iterations, we plot in Figure 9(a) the ratio of change in the yearly season after each iteration, as given by Eq. [sent-998, score-0.839]
88 In ESSF, the yearly season is not assumed to be simply a periodic series. [sent-1010, score-0.857]
89 Instead, it can scale differently over the years based on the season template. [sent-1011, score-0.549]
90 To evaluate the gain, we compare ESSF with scalable yearly seasons versus fixed seasons. [sent-1012, score-0.362]
91 Here, the fixed season can be thought of as a special case of the scalable season with all the scaling parameters set to 1, or equivalently, the yearly season is the plain repeat of the season template. [sent-1013, score-2.44]
92 Better performance is achieved by allowing scalable yearly seasons for all the three series. [sent-1024, score-0.362]
93 In addition to the variation rate, the yearly correlation of the time series also indicates the potential for accurate prediction. [sent-1073, score-0.44]
94 A series with high yearly correlation tends to benefit more in prediction from the yearly season extraction of ESSF. [sent-1078, score-1.303]
95 In contrast, for NBA (ID 16) and democracy (ID 9), which have high yearly correlation, ESSF achieves substantially better prediction accuracy than HW and MA at all the values of h. [sent-1081, score-0.361]
96 To compare the computational load of the prediction algorithms, we measure the average user running time over the twenty g-trends series for one-day-ahead (h = 1) prediction at all the days in the last two years, 2006 and 2007. [sent-1082, score-0.382]
97 Discussion and Conclusions: We have so far focused on extracting the trend and season parts of a time series using either HW or GLS, and have not considered predicting the noise part, as given in Eq. [sent-1091, score-0.734]
98 Specifically, we compute the level Lt and the season It by HW and let the noise Nt = xt − Lt − It . [sent-1095, score-0.675]
99 We developed the ESSF algorithm to extract the global trend and a scalable long-range season with smoothness regularization. [sent-1129, score-0.664]
100 Let the one-step forecast error of xt given Xt−1 be vt and the variance of vt be Ft: vt = xt − E(xt | Xt−1) = xt − Zt at, Ft = Var(vt) = Zt Pt Zt′ + Ht. [sent-1198, score-0.52]
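The innovation vt and its variance Ft above are the standard Kalman filter quantities of the state space model (SSM). A sketch of one filtering and prediction step for a scalar observation follows; the transition matrix T_mat and state noise covariance Q_t are the SSM ingredients, and the variable names are illustrative rather than the authors' notation.

```python
import numpy as np

def kalman_innovation_step(x_t, a_t, P_t, Z_t, H_t, T_mat, Q_t):
    """One Kalman step: innovation v_t = x_t - Z_t a_t with variance
    F_t = Z_t P_t Z_t' + H_t, then the usual update and one-step-ahead prediction."""
    Z_t = np.atleast_2d(Z_t)                     # 1 x m observation row
    v_t = x_t - float(Z_t @ a_t)                 # innovation (one-step forecast error)
    F_t = float(Z_t @ P_t @ Z_t.T) + H_t         # innovation variance
    K_t = (P_t @ Z_t.T) / F_t                    # Kalman gain, m x 1
    a_filt = a_t + (K_t * v_t).ravel()           # filtered state mean
    P_filt = P_t - K_t @ Z_t @ P_t               # filtered state covariance
    a_next = T_mat @ a_filt                      # predicted state for t + 1
    P_next = T_mat @ P_filt @ T_mat.T + Q_t      # predicted covariance for t + 1
    return v_t, F_t, a_next, P_next
```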
wordName wordTfidf (topN-words)
[('season', 0.525), ('hw', 0.394), ('essf', 0.362), ('yearly', 0.314), ('lt', 0.222), ('impulse', 0.161), ('gls', 0.146), ('xt', 0.13), ('auton', 0.112), ('ssm', 0.108), ('series', 0.103), ('page', 0.096), ('trend', 0.086), ('weekly', 0.082), ('amazon', 0.077), ('greenhouse', 0.077), ('citeseer', 0.077), ('oore', 0.073), ('renoir', 0.073), ('tt', 0.072), ('iews', 0.069), ('orecasting', 0.069), ('jan', 0.069), ('traf', 0.066), ('kalman', 0.065), ('web', 0.065), ('impulses', 0.065), ('forecasting', 0.062), ('views', 0.06), ('day', 0.06), ('ahead', 0.059), ('nt', 0.056), ('growth', 0.053), ('seasonality', 0.052), ('recursion', 0.05), ('prediction', 0.047), ('zt', 0.047), ('year', 0.046), ('eb', 0.045), ('forecast', 0.043), ('pt', 0.04), ('wang', 0.038), ('days', 0.038), ('predicted', 0.037), ('qt', 0.036), ('past', 0.035), ('xn', 0.034), ('smoothing', 0.033), ('id', 0.033), ('ht', 0.033), ('internet', 0.033), ('var', 0.032), ('zk', 0.031), ('window', 0.031), ('ft', 0.03), ('dec', 0.029), ('nov', 0.029), ('rss', 0.029), ('vt', 0.029), ('twenty', 0.028), ('smoothness', 0.027), ('kt', 0.027), ('period', 0.027), ('units', 0.027), ('rates', 0.026), ('brockwell', 0.026), ('impulsive', 0.026), ('seasonal', 0.026), ('sss', 0.026), ('scalable', 0.026), ('daily', 0.025), ('distant', 0.024), ('ay', 0.024), ('years', 0.024), ('ck', 0.024), ('variation', 0.023), ('apr', 0.022), ('jun', 0.022), ('aug', 0.022), ('feb', 0.022), ('ite', 0.022), ('jul', 0.022), ('oct', 0.022), ('yt', 0.022), ('chat', 0.022), ('seasons', 0.022), ('trends', 0.021), ('lter', 0.021), ('noise', 0.02), ('ls', 0.02), ('re', 0.019), ('ma', 0.019), ('ad', 0.018), ('sites', 0.018), ('elastic', 0.018), ('periodic', 0.018), ('detection', 0.018), ('filtering', 0.018), ('moving', 0.018), ('mp', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 37 jmlr-2008-Forecasting Web Page Views: Methods and Observations
Author: Jia Li, Andrew W. Moore
Abstract: Web sites must forecast Web page views in order to plan computer resource allocation and estimate upcoming revenue and advertising growth. In this paper, we focus on extracting trends and seasonal patterns from page view series, two dominant factors in the variation of such series. We investigate the Holt-Winters procedure and a state space model for making relatively short-term prediction. It is found that Web page views exhibit strong impulsive changes occasionally. The impulses cause large prediction errors long after their occurrences. A method is developed to identify impulses and to alleviate their damage on prediction. We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. It is shown that for longterm prediction, ESSF improves accuracy significantly over other methods that ignore the yearly seasonality. Keywords: web page views, forecast, Holt-Winters, Kalman filtering, elastic smooth season fitting
2 0.063749455 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
3 0.046913434 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
Author: Balázs Csanád Csáji, László Monostori
Abstract: The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases when the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) Lipschitz continuously depends on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which is a generalization of MDPs and ε-MDPs. In this model the environment may change over time, more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. We also demonstrate the results through three classical RL methods: asynchronous value iteration, Q-learning and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented. Keywords: Markov decision processes, reinforcement learning, changing environments, (ε, δ)MDPs, value function bounds, stochastic iterative algorithms
4 0.039932631 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
Author: Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen
Abstract: Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies—attempting to do data record detection and attribute labeling in two separate phases. In this paper, we propose an integrated web data extraction paradigm with hierarchical models. The proposed model is called Dynamic Hierarchical Markov Random Fields (DHMRFs). DHMRFs take structural uncertainty into consideration and define a joint distribution of both model structure and class labels. The joint distribution is an exponential family distribution. As a conditional model, DHMRFs relax the independence assumption as made in directed models. Since exact inference is intractable, a variational method is developed to learn the model’s parameters and to find the MAP model structure and label assignments. We apply DHMRFs to a real-world web data extraction task. Experimental results show that: (1) integrated web data extraction models can achieve significant improvements on both record detection and attribute labeling compared to decoupled models; (2) in diverse web data extraction DHMRFs can potentially address the blocky artifact issue which is suffered by fixed-structured hierarchical models. Keywords: conditional random fields, dynamic hierarchical Markov random fields, integrated web data extraction, statistical hierarchical modeling, blocky artifact issue
5 0.038608506 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
Author: Vojtěch Franc, Bogdan Savchynskyy
Abstract: The max-sum classifier predicts n-tuple of labels from n-tuple of observable variables by maximizing a sum of quality functions defined over neighbouring pairs of labels and observable variables. Predicting labels as MAP assignments of a Random Markov Field is a particular example of the max-sum classifier. Learning parameters of the max-sum classifier is a challenging problem because even computing the response of such classifier is NP-complete in general. Estimating parameters using the Maximum Likelihood approach is feasible only for a subclass of max-sum classifiers with an acyclic structure of neighbouring pairs. Recently, the discriminative methods represented by the perceptron and the Support Vector Machines, originally designed for binary linear classifiers, have been extended for learning some subclasses of the max-sum classifier. Besides the max-sum classifiers with the acyclic neighbouring structure, it has been shown that the discriminative learning is possible even with arbitrary neighbouring structure provided the quality functions fulfill some additional constraints. In this article, we extend the discriminative approach to other three classes of max-sum classifiers with an arbitrary neighbourhood structure. We derive learning algorithms for two subclasses of max-sum classifiers whose response can be computed in polynomial time: (i) the max-sum classifiers with supermodular quality functions and (ii) the max-sum classifiers whose response can be computed exactly by a linear programming relaxation. Moreover, we show that the learning problem can be approximately solved even for a general max-sum classifier. Keywords: max-xum classifier, hidden Markov networks, support vector machines
6 0.033500873 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
8 0.030747173 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis
9 0.02938268 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
10 0.024295978 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
11 0.022442974 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
12 0.020313954 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
13 0.019331908 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
14 0.018034508 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
15 0.018009448 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
16 0.017466363 7 jmlr-2008-A Tutorial on Conformal Prediction
17 0.016962551 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
18 0.016770128 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
19 0.015960954 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
20 0.015687544 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
topicId topicWeight
[(0, 0.092), (1, -0.012), (2, -0.009), (3, -0.001), (4, -0.077), (5, -0.038), (6, -0.014), (7, 0.011), (8, -0.101), (9, -0.103), (10, -0.073), (11, -0.03), (12, -0.038), (13, -0.151), (14, -0.015), (15, -0.023), (16, -0.061), (17, -0.026), (18, 0.07), (19, -0.027), (20, -0.065), (21, -0.041), (22, 0.013), (23, -0.033), (24, 0.11), (25, 0.126), (26, 0.18), (27, -0.073), (28, -0.199), (29, 0.144), (30, 0.075), (31, 0.127), (32, 0.224), (33, 0.025), (34, -0.228), (35, 0.215), (36, -0.043), (37, 0.288), (38, 0.304), (39, 0.216), (40, -0.392), (41, -0.218), (42, -0.057), (43, 0.053), (44, 0.068), (45, -0.005), (46, 0.028), (47, -0.052), (48, 0.189), (49, -0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.97737575 37 jmlr-2008-Forecasting Web Page Views: Methods and Observations
Author: Jia Li, Andrew W. Moore
Abstract: Web sites must forecast Web page views in order to plan computer resource allocation and estimate upcoming revenue and advertising growth. In this paper, we focus on extracting trends and seasonal patterns from page view series, two dominant factors in the variation of such series. We investigate the Holt-Winters procedure and a state space model for making relatively short-term prediction. It is found that Web page views exhibit strong impulsive changes occasionally. The impulses cause large prediction errors long after their occurrences. A method is developed to identify impulses and to alleviate their damage on prediction. We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. It is shown that for longterm prediction, ESSF improves accuracy significantly over other methods that ignore the yearly seasonality. Keywords: web page views, forecast, Holt-Winters, Kalman filtering, elastic smooth season fitting
2 0.33785769 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
3 0.21748385 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
Author: Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen
Abstract: Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies—attempting to do data record detection and attribute labeling in two separate phases. In this paper, we propose an integrated web data extraction paradigm with hierarchical models. The proposed model is called Dynamic Hierarchical Markov Random Fields (DHMRFs). DHMRFs take structural uncertainty into consideration and define a joint distribution of both model structure and class labels. The joint distribution is an exponential family distribution. As a conditional model, DHMRFs relax the independence assumption as made in directed models. Since exact inference is intractable, a variational method is developed to learn the model’s parameters and to find the MAP model structure and label assignments. We apply DHMRFs to a real-world web data extraction task. Experimental results show that: (1) integrated web data extraction models can achieve significant improvements on both record detection and attribute labeling compared to decoupled models; (2) in diverse web data extraction DHMRFs can potentially address the blocky artifact issue which is suffered by fixed-structured hierarchical models. Keywords: conditional random fields, dynamic hierarchical Markov random fields, integrated web data extraction, statistical hierarchical modeling, blocky artifact issue
4 0.14765705 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments
Author: Balázs Csanád Csáji, László Monostori
Abstract: The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases when the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) Lipschitz continuously depends on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which is a generalization of MDPs and ε-MDPs. In this model the environment may change over time, more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. We also demonstrate the results through three classical RL methods: asynchronous value iteration, Q-learning and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented. Keywords: Markov decision processes, reinforcement learning, changing environments, (ε, δ)MDPs, value function bounds, stochastic iterative algorithms
5 0.11918345 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
Author: Eric Bax, Augusto Callejas
Abstract: This paper introduces a new PAC transductive error bound for classification. The method uses information from the training examples and inputs of working examples to develop a set of likely assignments to outputs of the working examples. A likely assignment with maximum error determines the bound. The method is very effective for small data sets. Keywords: error bound, transduction, nearest neighbor, dynamic programming
6 0.11494651 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
7 0.11401736 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
8 0.11024244 2 jmlr-2008-A Library for Locally Weighted Projection Regression (Machine Learning Open Source Software Paper)
9 0.10912891 7 jmlr-2008-A Tutorial on Conformal Prediction
10 0.10615689 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
11 0.099292941 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis
12 0.090484165 30 jmlr-2008-Discriminative Learning of Max-Sum Classifiers
13 0.088084288 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
14 0.070900925 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
15 0.067047782 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
16 0.065500759 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
17 0.063393563 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
18 0.062122341 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
19 0.061454605 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
20 0.057610877 92 jmlr-2008-Universal Multi-Task Kernels
topicId topicWeight
[(0, 0.022), (5, 0.019), (9, 0.523), (31, 0.023), (40, 0.033), (54, 0.045), (58, 0.037), (66, 0.033), (76, 0.014), (88, 0.036), (92, 0.033), (94, 0.042), (99, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.89196348 37 jmlr-2008-Forecasting Web Page Views: Methods and Observations
Author: Jia Li, Andrew W. Moore
Abstract: Web sites must forecast Web page views in order to plan computer resource allocation and estimate upcoming revenue and advertising growth. In this paper, we focus on extracting trends and seasonal patterns from page view series, two dominant factors in the variation of such series. We investigate the Holt-Winters procedure and a state space model for making relatively short-term prediction. It is found that Web page views exhibit strong impulsive changes occasionally. The impulses cause large prediction errors long after their occurrences. A method is developed to identify impulses and to alleviate their damage on prediction. We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. It is shown that for longterm prediction, ESSF improves accuracy significantly over other methods that ignore the yearly seasonality. Keywords: web page views, forecast, Holt-Winters, Kalman filtering, elastic smooth season fitting
2 0.73002261 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes
Author: David C. Hoyle
Abstract: Bayesian inference from high-dimensional data involves the integration over a large number of model parameters. Accurate evaluation of such high-dimensional integrals raises a unique set of issues. These issues are illustrated using the exemplar of model selection for principal component analysis (PCA). A Bayesian model selection criterion, based on a Laplace approximation to the model evidence for determining the number of signal principal components present in a data set, has previously been show to perform well on various test data sets. Using simulated data we show that for d-dimensional data and small sample sizes, N, the accuracy of this model selection method is strongly affected by increasing values of d. By taking proper account of the contribution to the evidence from the large number of model parameters we show that model selection accuracy is substantially improved. The accuracy of the improved model evidence is studied in the asymptotic limit d → ∞ at fixed ratio α = N/d, with α < 1. In this limit, model selection based upon the improved model evidence agrees with a frequentist hypothesis testing approach. Keywords: PCA, Bayesian model selection, random matrix theory, high dimensional inference
3 0.23160392 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
4 0.22417101 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
Author: Hannes Nickisch, Carl Edward Rasmussen
Abstract: We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches. Keywords: Gaussian process priors, probabilistic classification, Laplaces’s approximation, expectation propagation, variational bounding, mean field methods, marginal likelihood evidence, MCMC
5 0.20101099 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming
Author: Ashwin Srinivasan, Ross D. King
Abstract: The use of computational models is increasingly expected to play an important role in predicting the behaviour of biological systems. Models are being sought at different scales of biological organisation namely: sub-cellular, cellular, tissue, organ, organism and ecosystem; with a view of identifying how different components are connected together, how they are controlled and how they behave when functioning as a system. Except for very simple biological processes, system identification from first principles can be extremely difficult. This has brought into focus automated techniques for constructing models using data of system behaviour. Such techniques face three principal issues: (1) The model representation language must be rich enough to capture system behaviour; (2) The system identification technique must be powerful enough to identify substantially complex models; and (3) There may not be sufficient data to obtain both the model’s structure and precise estimates of all of its parameters. In this paper, we address these issues in the following ways: (1) Models are represented in an expressive subset of first-order logic. Specifically, they are expressed as logic programs; (2) System identification is done using techniques developed in Inductive Logic Programming (ILP). This allows the identification of first-order logic models from data. Specifically, we employ an incremental approach in which increasingly complex models are constructed from simpler ones using snapshots of system behaviour; and (3) We restrict ourselves to “qualitative” models. These are non-parametric: thus, usually less data are required than for identifying parametric quantitative models. A further advantage is that the data need not be precise numerical observations (instead, they are abstractions like positive, negative, zero, increasing, decreasing and so on). We describe incremental construction of qualitative models using a simple physical system and demonstrate its application to identificatio
6 0.19503745 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
7 0.1936301 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
8 0.18615422 60 jmlr-2008-Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
9 0.1852476 90 jmlr-2008-Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective
10 0.18408287 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
11 0.18183646 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
12 0.18031143 86 jmlr-2008-SimpleMKL
13 0.17934482 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
14 0.17899604 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
15 0.17725968 56 jmlr-2008-Magic Moments for Structured Output Prediction
16 0.17546763 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
17 0.17512277 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
18 0.17469731 58 jmlr-2008-Max-margin Classification of Data with Absent Features
19 0.17448875 83 jmlr-2008-Robust Submodular Observation Selection
20 0.17227301 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning