nips nips2013 nips2013-178 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniele Durante, Bruno Scarpa, David Dunson
Abstract: In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. [sent-7, score-0.346]
2 In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. [sent-8, score-0.21]
3 If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. [sent-9, score-0.26]
4 We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. [sent-11, score-0.468]
5 This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. [sent-12, score-0.183]
6 It is typical in many domains to cycle irregularly between periods of rapid and slow change; most statistical models are insufficiently flexible to capture such locally varying smoothness, since they assume a single bandwidth parameter. [sent-19, score-0.323]
7 Inappropriately restricting the smoothness to be constant can have a major impact on the quality of inferences and predictions, with over-smoothing occurring during times of rapid change. [sent-20, score-0.164]
8 There is a rich literature on modeling a p × 1 time-varying mean vector µt , covering multivariate generalizations of autoregressive models (VAR, e. [sent-22, score-0.161]
9 [1]), Kalman filtering [2], nonparametric mean regression via Gaussian processes (GP) [3], polynomial splines [4], smoothing splines [5] and kernel smoothing methods [6]. [sent-24, score-0.451]
10 Such approaches perform well for slowly-changing trajectories with constant bandwidth parameters implicitly or explicitly regulating global smoothness; however, our interest is in allowing smoothness to vary locally in continuous time. [sent-25, score-0.268]
11 Other flexible approaches include wavelet shrinkage [10], local polynomial fitting via variable bandwidth [11] and linear combination of kernels with variable bandwidths [12]. [sent-27, score-0.093]
12 Once µt has been estimated, the focus shifts to the p × p time-varying covariance matrix Σt . [sent-28, score-0.074]
13 Multivariate generalizations of GARCH models (DVEC [13], BEKK [14], DCC-GARCH [15]), exponential smoothing (EWMA, e. [sent-30, score-0.07]
14 [1]) and approaches based on dimensionality reduction through a latent factor formulation (PC-GARCH [16] and O-GARCH [17]-[18]) represent common approaches in multivariate stochastic volatility modeling. [sent-32, score-0.197]
15 Bayesian dynamic factor models for multivariate stochastic volatility [19] lead to apparently improved performance in portfolio allocation by allowing the dependence in the covariance matrices Σt and Σt+∆ to vary as a function of both t and ∆. [sent-36, score-0.265]
16 Nonparametric mean regression for µ(t) is also considered via GP priors; however, the trajectories of means and covariances inherit the smooth behavior of the underlying Gaussian processes, limiting the flexibility of the approach in times exhibiting sharp changes. [sent-41, score-0.272]
17 More specifically, given p × 1 observation vectors yi ∼ Np(µ(ti), Σ(ti)), where i = 1, . . . , T indexes time, they define cov(yi | ti = t) = Σ(t) = Θ ξ(t) ξ(t)^T Θ^T + Σ0, t ∈ T ⊂ ℝ+, (1) where Θ is a p × L matrix of coefficients, ξ(t) is a time-varying L × K matrix with unknown continuous dictionary function entries ξ_lk : T → ℝ, and finally Σ0 is a positive definite diagonal matrix. [sent-43, score-0.086] [sent-46, score-0.099]
19 Model (1) can be induced by marginalizing out the latent factors ηi in yi = Θ ξ(ti) ηi + εi, (2) with ηi ∼ NK(0, IK) and εi ∼ Np(0, Σ0). [sent-47, score-0.126]
20 A generalization includes nonparametric mean regression by assuming ηi = ψ(ti) + νi, where νi ∼ NK(0, IK) and ψ(t) is a K × 1 vector with unknown continuous entries ψ_k : T → ℝ that can be modeled in a manner related to the dictionary elements in ξ(t). [sent-48, score-0.238]
21 The induced mean of yi, conditional on ti = t and marginalizing out νi, is then E(yi | ti = t) = µ(t) = Θ ξ(t) ψ(t). (3) [sent-49, score-0.68]
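To make equations (1)-(3) concrete, the following sketch (our own toy illustration, with hypothetical choices of Θ, ξ(t), ψ(t) and Σ0, not the authors' code) simulates from model (2) and checks by Monte Carlo that the induced moments match (1) and (3):

```python
import numpy as np

rng = np.random.default_rng(0)
p, L, K = 5, 2, 2                              # observation dim, dictionary rows, latent factors
Theta = rng.normal(size=(p, L))                # p x L coefficient matrix
Sigma0 = np.diag(rng.uniform(0.1, 0.5, p))     # positive definite diagonal noise covariance

def xi(t):    # toy L x K dictionary of continuous time-varying functions
    return np.array([[np.sin(t), np.cos(t)],
                     [0.5 * t,   np.exp(-t)]])

def psi(t):   # toy K x 1 mean dictionary
    return np.array([np.sin(2 * t), 0.3 * t])

def moments(t):
    """Induced mean (3) and covariance (1) at time t."""
    A = Theta @ xi(t)
    return A @ psi(t), A @ A.T + Sigma0

def sample_y(t):
    """Draw y_i from model (2): y = Theta xi(t) eta + eps, with eta = psi(t) + nu."""
    eta = psi(t) + rng.standard_normal(K)
    eps = rng.multivariate_normal(np.zeros(p), Sigma0)
    return Theta @ xi(t) @ eta + eps

t = 0.7
ys = np.stack([sample_y(t) for _ in range(50_000)])
mu, Sigma = moments(t)
print(np.allclose(ys.mean(axis=0), mu, atol=0.05))   # empirical mean matches (3)
print(np.allclose(np.cov(ys.T), Sigma, atol=0.1))    # empirical covariance matches (1)
```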
2 Our modeling contribution: We follow the lead of [22] in using a nonparametric latent factor model as in (2), but we induce fundamentally different behavior by carefully modifying the priors Πξ and Πψ for the dictionary elements ξT = {ξ(t), t ∈ T } and ψT = {ψ(t), t ∈ T }, respectively. [sent-51, score-0.225]
23 Fox and Dunson [22] consider the dictionary functions ξlk and ψk , for each l = 1, . [sent-53, score-0.099]
24 Moreover, the well-known computational problems of standard GP regression are inherited, leading to difficulties in scaling to long series and to mixing issues in MCMC algorithms for posterior computation. [sent-61, score-0.201]
25 This formulation allows continuous time and an irregular grid of observations over t by relating the latent states at i + 1 to those at i through the distance δi between ti+1 and ti, with ti ∈ T the observation time of the ith observation. [sent-65, score-1.11]
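The exact nGP system matrices are not reproduced in this summary; as a hedged stand-in, the sketch below propagates a generic local-linear-trend state (a dictionary function and its derivative) over an irregular grid, with the transition matrix and process noise built from each gap δi, to illustrate how continuous time and uneven spacing are accommodated:

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0, 10, 50))   # irregular observation grid over t

sig_a = 0.5                               # innovation scale: larger values allow sharper changes
state = np.zeros(2)                       # (xi_lk(t_i), d/dt xi_lk(t_i))
path = []
for i in range(len(times) - 1):
    delta = times[i + 1] - times[i]       # distance delta_i between t_{i+1} and t_i
    T = np.array([[1.0, delta],
                  [0.0, 1.0]])            # transition depends on delta_i only
    Q = sig_a**2 * np.array([[delta**3 / 3, delta**2 / 2],
                             [delta**2 / 2, delta]])   # integrated-noise covariance
    state = T @ state + rng.multivariate_normal(np.zeros(2), Q)
    path.append(state[0])
print(np.round(path[:5], 3))
```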
26 Moreover, compared to [23] our approach extends the analysis to the multivariate case and accommodates locally adaptive smoothing not only on the mean but also on the time-varying variance and covariance functions. [sent-66, score-0.377]
27 Finally, the state space formulation allows the implementation of an online updating algorithm and facilitates a simple Gibbs sampler that reduces the GP computational burden of matrix inversions from O(T^3) to O(T), with T denoting the length of the time series. [sent-67, score-0.113]
28 Specifically, considering the observations (yi , ti ) for i = 1, . [sent-70, score-0.535]
29 If the mean process does not need to be estimated then, recalling the prior ηi ∼ N_{K∗}(0, I_{K∗}) and model (2), the standard conjugate posterior distribution from which to sample the vector of latent factors for each i given Θ, {σ_j^{-2}}_{j=1}^p, {yi}_{i=1}^T and {ξ(ti)}_{i=1}^T is Gaussian. [sent-79, score-0.212]
30 Otherwise, if we want to incorporate the mean regression, we implement a block sampling of {ψ(ti)}_{i=1}^T and {νi}_{i=1}^T, following an approach similar to that used for drawing samples from the dictionary element processes. [sent-80, score-0.158]
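A minimal sketch of the conjugate Gaussian draw for the latent factors described above, under model (2) with diagonal Σ0 (the values of Θ and ξ(ti) are toy placeholders; in the full sampler this step is cycled with the others):

```python
import numpy as np

def draw_eta(y, Theta, xi_t, sigma2, rng):
    """One Gibbs draw of eta_i given Theta, xi(t_i), Sigma0 and y_i.

    Prior eta_i ~ N_K(0, I_K) with y_i = Theta xi(t_i) eta_i + eps_i gives a
    Gaussian full conditional with precision I_K + A' Sigma0^{-1} A, A = Theta xi(t_i).
    """
    A = Theta @ xi_t                               # p x K design at time t_i
    prec = np.eye(A.shape[1]) + A.T @ (A / sigma2[:, None])
    cov = np.linalg.inv(prec)
    mean = cov @ A.T @ (y / sigma2)
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(2)
p, L, K = 5, 2, 2
Theta, xi_t = rng.normal(size=(p, L)), rng.normal(size=(L, K))
sigma2 = rng.uniform(0.1, 0.5, p)                  # diagonal of Sigma0
print(draw_eta(rng.normal(size=p), Theta, xi_t, sigma2, rng))
```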
31 For both approaches the dotted lines represent the 95% highest posterior density intervals. [sent-95, score-0.149]
32 Finally, conditioned on {yi}_{i=1}^T, {ηi}_{i=1}^T, {σ_j^{-2}}_{j=1}^p and {ξ(ti)}_{i=1}^T, and recalling the shrinkage prior for the elements of Θ defined in [22], we update Θ, each local shrinkage hyperparameter φ_jl and the global shrinkage hyperparameters τ_l via standard conjugate analysis. [sent-97, score-0.12]
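Assuming the multiplicative gamma process shrinkage prior of [22], with θ_jl ∼ N(0, φ_jl^{-1} τ_l^{-1}) and φ_jl ∼ Ga(ν/2, ν/2), the local hyperparameters have conjugate gamma full conditionals; the sketch below uses hypothetical hyperparameter values, not necessarily the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(3)
p, L, nu = 5, 3, 3.0
Theta = rng.normal(size=(p, L))                  # current loadings
tau = np.cumprod(rng.gamma(2.0, 1.0, L))         # global shrinkage tau_l = prod_{h<=l} delta_h

# phi_jl | rest ~ Ga((nu + 1)/2, rate = (nu + tau_l * theta_jl^2)/2);
# numpy parameterizes the gamma by shape and scale = 1/rate.
phi = rng.gamma((nu + 1) / 2, 2.0 / (nu + tau[None, :] * Theta**2))
print(phi.shape)                                 # (p, L): one local scale per loading
```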
33 The problem of online updating represents a key point in multivariate time series with high-frequency data. [sent-98, score-0.265]
34 Referring to our formulation, we are interested in updating an approximate posterior distribution for Σ(t_{T+h}) and µ(t_{T+h}), h = 1, . . . , H, once new observations {yi}_{i=T+1}^{T+H} become available, instead of rerunning posterior computation for the whole time series. [sent-99, score-0.179] [sent-102, score-0.113]
36 To initialize the algorithm at T + 1, we propose to run the online updating for {yi}_{i=T−k}^{T+H}, with k small, choosing a diffuse but proper prior for the initial states at T − k. [sent-104, score-0.113]
37 Such an approach is suggested to reduce the problem related to the larger conditional variances (see, e. [sent-105, score-0.082]
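A hedged sketch of such an online pass, using a univariate local-linear-trend Kalman filter as a stand-in for the full multivariate model: the state at T − k receives a diffuse but proper prior, and each new observation is absorbed with a predict-update step whose matrices depend on the gap δi:

```python
import numpy as np

def online_update(y_new, deltas, m, P, sig_a=0.5, sig_e=0.1):
    """Filter newly arrived observations without rerunning the whole chain."""
    H = np.array([[1.0, 0.0]])                   # we observe the level only
    out = []
    for y, delta in zip(y_new, deltas):
        T = np.array([[1.0, delta], [0.0, 1.0]])
        Q = sig_a**2 * np.array([[delta**3 / 3, delta**2 / 2],
                                 [delta**2 / 2, delta]])
        m, P = T @ m, T @ P @ T.T + Q            # time update over gap delta
        S = H @ P @ H.T + sig_e**2               # innovation variance
        K = P @ H.T / S                          # Kalman gain
        m = m + (K * (y - H @ m)).ravel()        # measurement update with new y
        P = P - K @ H @ P
        out.append(m[0])
    return np.array(out)

m0, P0 = np.zeros(2), 1e4 * np.eye(2)            # diffuse but proper prior at T - k
print(online_update([0.1, -0.2, 0.05], [1.0, 1.0, 2.0], m0, P0).round(3))
```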
38 The first dataset consists of 5-dimensional observations yi for each ti ∈ To = {1, 2, . [sent-113, score-0.621]
39 To allow sharp changes of the covariances and means in the generating mechanism, we consider a 2 × 2 (i. [sent-117, score-0.15]
40 L = K = 2) matrix {ξ(ti)}_{i=1}^{100} of time-varying functions adapted from Donoho and Johnstone [26] with locally varying smoothness (more specifically, we choose 'bumps' functions, also to mimic possible behavior in practical settings). [sent-119, score-0.111]
41 The second set of simulated data is the same dataset of 10-dimensional observations yi investigated in Fox and Dunson [22], with smooth GP dictionary functions for each element of the 5 × 4 (i.e. L = 5, K = 4) matrix ξ(t). [sent-120, score-0.127] [sent-197, score-0.099]
[Table 1: Summaries of the standardized squared errors (q0.95 and max) of the mean µ(ti) and the covariance Σ(ti), under the locally adaptive and the constant smoothness scenarios, for EWMA, PC-GARCH, GO-GARCH, DCC-GARCH, BCR and LBCR.]
45 05 for both approaches and 95th quantiles equal to 1. [sent-210, score-0.063]
46 As regards the other approaches, EWMA has been implemented by choosing the smoothing parameter λ that minimizes the mean squared error (MSE) between the estimated covariances and the true values. [sent-213, score-0.194]
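For reference, a minimal sketch of the EWMA covariance recursion with λ chosen by grid search against the known true covariances, which, as noted above, is only possible in simulation (the toy data-generating process and grid are our own choices):

```python
import numpy as np

def ewma_cov(Y, lam):
    """EWMA recursion S_t = lam * S_{t-1} + (1 - lam) * y_t y_t' (zero-mean returns)."""
    S = np.cov(Y[:10].T)                         # warm start from the early sample
    out = np.empty((Y.shape[0],) + S.shape)
    for t in range(Y.shape[0]):
        S = lam * S + (1 - lam) * np.outer(Y[t], Y[t])
        out[t] = S
    return out

rng = np.random.default_rng(4)
T, p = 100, 5
true_cov = np.array([np.eye(p) * (1.5 + np.sin(t / 10.0)) for t in range(T)])
Y = np.stack([rng.multivariate_normal(np.zeros(p), true_cov[t]) for t in range(T)])

grid = np.linspace(0.80, 0.99, 20)               # candidate smoothing parameters
mses = [np.mean((ewma_cov(Y, lam) - true_cov) ** 2) for lam in grid]
print("lambda minimizing MSE:", grid[int(np.argmin(mses))])
```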
47 The PC-GARCH algorithm follows the steps provided by Burns [16], with GARCH(1,1) assumed for the conditional volatilities of each single time series and of the principal components. [sent-214, score-0.144]
48 GO-GARCH and DCC-GARCH follow the formulations provided by van der Weide [18] and Engle [15] respectively, assuming a GARCH(1,1) for the conditional variances of the processes analyzed, which proves to be an adequate choice in many financial applications and also in our setting. [sent-215, score-0.118]
49 Therefore, in these cases we first model the conditional mean via smoothing splines and, in a second step, we estimate the models for the innovations. [sent-217, score-0.197]
50 The smoothing parameter for spline estimation has been set to 0. [sent-218, score-0.138]
51 Figure 1 compares, in both simulated samples, the true values and the posterior means of µ(t) and Σ(t) over the predictor space To, together with the point-wise 95% highest posterior density (hpd) intervals for LBCR and BCR. [sent-220, score-0.329]
52 From the upper plots we can clearly note that our approach is able to capture conditional heteroscedasticity as well as mean patterns, even at sharp changes in the time-varying true functions. [sent-221, score-0.216]
53 However, even in the most problematic cases, the true values are within the bands of the 95% hpd intervals. [sent-223, score-0.105]
54 Much more problematic is the behavior of the posterior distributions for BCR, which badly over-smooth both the covariance and the mean functions, also leading to many 95% hpd intervals that do not contain the true values. [sent-233, score-0.244]
Figure 2: For 2 NSI (USA NASDAQ shown), posterior mean (black) and 95% hpd intervals (dotted red) for the variances {Σ_jj(ti)}_{i=1}^{415}.
57 Bottom plots in Figure 1 show that the performance of our approach is very close to that of BCR, when data are simulated from a model where the covariances and means evolve smoothly across time and local adaptivity is not required. [sent-234, score-0.12]
58 Table 1 shows that, when local adaptivity is required, LBCR provides superior performance, with standardized residuals lower than those of the other approaches. [sent-237, score-0.096]
59 EWMA seems to provide quite accurate estimates; however, it is important to underline that we chose the optimal smoothing parameter λ to minimize the MSE between the estimated and the true parameters, the latter clearly not being known in practical applications. [sent-238, score-0.07]
60 The closeness of LBCR and BCR on the constant smoothness dataset confirms the flexibility of LBCR and highlights the better performance of the two approaches relative to the other competitors even when smooth processes are investigated. [sent-240, score-0.147]
61 In this application we focus our attention on the multivariate weekly time series of the main 33 (i. [sent-242, score-0.185]
62 We consider the heteroscedastic model for the log returns yi ∼ N33 (µ(ti ), Σ(ti )) for i = 1, . [sent-247, score-0.125]
63 Posterior computation is performed using the same settings as in the first simulation study, fixing K∗ = 4 and L∗ = 5 (which we found to be sufficiently large, since the posterior samples of the last few columns of Θ took values close to 0). [sent-254, score-0.17]
64 Missing values in our dataset do not represent a limitation since the Bayesian approach allows us to update our posterior considering solely the observed data. [sent-255, score-0.113]
65 Similar conclusions hold for the posterior distributions of the trajectories of the means, with rapid changes detected around the world financial crisis in 2008. [sent-261, score-0.439]
66 Figure 3: Black line: for USA NASDAQ, median of correlations with the other 32 NSI based on the posterior mean of {Σ(ti)}_{i=1}^{415}. [sent-271, score-0.228]
67 Red lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of correlations between USA NASDAQ and European countries (excluding Greece and Russia). [sent-272, score-0.174]
68 Green lines: 25%, 75% (dotted lines) and 50% (solid line) quantiles of correlations between USA NASDAQ and the countries of Southeast Asia (Asian Tigers and India). [sent-273, score-0.174]
69 The flexibility of the proposed approach and the possibility of accommodating varying smoothness in the trajectories over time allow us to obtain a good characterization of the dynamic dependence structure, in accordance with the major theories on financial crises. [sent-277, score-0.177]
70 The left plot in Figure 3 shows that the regime change in correlations occurs exactly at the burst of the U. [sent-278, score-0.091]
71 Moreover, we can immediately notice that the correlations among financial markets increase significantly during the crises, showing a clear international financial contagion effect, in agreement with other theories on financial crises. [sent-281, score-0.095]
72 As expected, the persistence of high levels of correlation is evident during the global financial crisis between late-2008 and end-2009 (C), at the beginning of which our approach also captures a dramatic change in the correlations between the U. [sent-282, score-0.254]
73 Further rapid changes are identified in correspondence with the Greek crisis (D), the worsening of the European sovereign-debt crisis and the rejection of the U. [sent-285, score-0.46]
74 budget (F) and the recent crisis of credit institutions in Spain, together with the growing financial instability in the Eurozone (G). [sent-287, score-0.166]
75 financial reform launched by Barack Obama and the EU efforts to save Greece (E), we can notice two peaks representing, respectively, the Irish and the Portuguese debt crises. [sent-290, score-0.258]
76 BCR, as expected, tends to over-smooth the dynamic dependence structure during the financial crisis, proving unable to model the sharp change in the correlations between USA NASDAQ and the Economic Tigers during late-2008, and the two peaks in (E) at the beginning of 2011. [sent-291, score-0.135]
77 The ability to quickly update the estimates and the predictions as soon as new data arrive represents a crucial aspect for obtaining quantitative information about the future scenarios of the crisis in financial markets. [sent-292, score-0.209]
78 To this end, we apply the proposed online updating algorithm to the new set of weekly observations {yi}_{i=416}^{422} from 02/07/2012 to 13/08/2012, conditioning on the posterior estimates of the Gibbs sampler based on the observations {yi}_{i=1}^{415} available up to 25/06/2012. [sent-293, score-0.184]
79 We initialized the simulation smoother algorithm with the last 8 observations of the previous sample. [sent-294, score-0.107]
80 Plots at the top of Figure 4 show, for 3 selected National Stock Indices, the new observed log returns {y_ji}_{i=416}^{422} together with the mean and the 2.5% and 97.5% quantiles. [sent-295, score-0.098]
81 We use standard formulas of the multivariate normal distribution based on the posterior means of the updated {Σ(ti)}_{i=416}^{422} and {µ(ti)}_{i=416}^{422} after 5,000 Gibbs iterations with a burn-in of 500. [sent-298, score-0.274]
82 We can clearly notice the good performance of our proposed online updating algorithm in characterizing the distribution of the new observations. [sent-299, score-0.113]
83 Also note that the multivariate approach, together with a flexible model for the mean and covariance, allows for significant improvements when the conditional distribution of an index given the others is analyzed. [sent-300, score-0.198]
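A sketch of this conditioning step, using the standard multivariate normal formulas mentioned above (toy µ and Σ; in the application these would be the posterior means of the updated µ(ti) and Σ(ti)):

```python
import numpy as np

def conditional_index(mu, Sigma, j, y_rest):
    """Conditional mean and variance of index j given the remaining indices."""
    idx = [k for k in range(len(mu)) if k != j]
    S_jr = Sigma[j, idx]                           # cross-covariances with the rest
    S_rr_inv = np.linalg.inv(Sigma[np.ix_(idx, idx)])
    m = mu[j] + S_jr @ S_rr_inv @ (y_rest - mu[idx])
    v = Sigma[j, j] - S_jr @ S_rr_inv @ S_jr
    return m, v

rng = np.random.default_rng(5)
p = 4
A = rng.normal(size=(p, p))
mu, Sigma = rng.normal(size=p), A @ A.T + np.eye(p)
y = rng.multivariate_normal(mu, Sigma)
print(conditional_index(mu, Sigma, 0, y[1:]))      # index 0 given indices 1..p-1
```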
84 To obtain further information about the predictive performance of our LBCR, we can easily use our online updating algorithm to obtain h-step-ahead predictions for Σ(t_{T+h|T}) and µ(t_{T+h|T}) with h = 1, . [sent-301, score-0.189]
85 In particular, referring to Durbin and Koopman [25], we can generate posterior draws for h = 1, . . . , H merely by treating {yi}_{i=T+1}^{T+H} as missing values in the proposed online updating algorithm. [sent-305, score-0.113] [sent-341, score-0.113]
Figure 4: Top: for 3 selected NSI (including FRANCE CAC40), plot of the observed log returns (black) together with the mean and the 2.5% and 97.5% quantiles. [sent-334, score-0.098]
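A sketch of this missing-data device in the same toy state-space setting used above: with {yi}_{i=T+1}^{T+H} treated as missing, the filter performs only time updates, and the propagated state gives the h-step-ahead means and variances.

```python
import numpy as np

def predict_ahead(m, P, deltas, sig_a=0.5):
    """h-step-ahead prediction: future y's are missing, so skip measurement updates."""
    preds = []
    for delta in deltas:                         # gaps to t_{T+1}, ..., t_{T+H}
        T = np.array([[1.0, delta], [0.0, 1.0]])
        Q = sig_a**2 * np.array([[delta**3 / 3, delta**2 / 2],
                                 [delta**2 / 2, delta]])
        m, P = T @ m, T @ P @ T.T + Q            # time update only
        preds.append((m[0], P[0, 0]))            # predicted mean and variance
    return preds

m, P = np.array([0.02, 0.0]), 0.01 * np.eye(2)   # filtered state at time T (toy values)
for h, (mean, var) in enumerate(predict_ahead(m, P, [1.0] * 4), start=1):
    print(f"h={h}: mean={mean:.3f}, var={var:.4f}")
```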
88 In (a) we forecast the future log returns with the unconditional mean {ỹ_{i+1}}_{i=415}^{421} = 0, which is what is often done in practice under the general assumption of zero-mean, stationary log returns. [sent-350, score-0.098]
89 In (b) we consider ỹ_{i+1|i} = µ̂(t_{i+1|i}), the posterior mean of the one-step-ahead predictive distribution of µ(t_{i+1|i}), obtained from the previously proposed approach after 5,000 Gibbs iterations with a burn-in of 500. [sent-351, score-0.261]
90 Moreover, the combination of our approach with the use of conditional distributions of one return given the others (c) further improves the forecasts, also reducing the variability of the predictive distribution. [sent-356, score-0.07]
91 We additionally obtain well-calibrated predictive intervals, unlike competing methods. [sent-357, score-0.077]
92 4 Discussion: In this paper, we have presented a generalization of Bayesian nonparametric covariance regression to obtain a better characterization of mean and covariance temporal dynamics. [sent-358, score-0.287]
93 Maintaining simple conjugate posterior updates and tractable computations in moderately large p settings, our model increases the flexibility of previous approaches as shown in the simulation studies. [sent-359, score-0.208]
94 Besides these key advantages, the state space formulation enables the development of a fast online updating algorithm useful for high-frequency data. [sent-360, score-0.113]
95 The application to the problem of capturing temporal and geoeconomic structure between financial markets shows the utility of our approach in the analysis of multivariate financial time series. [sent-361, score-0.141]
96 Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data. [sent-400, score-0.07]
97 Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. [sent-429, score-0.106]
98 Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. [sent-458, score-0.176]
99 Dynamic factor volatility modeling: A Bayesian latent threshold approach. [sent-478, score-0.095]
100 A simple and efficient simulation smoother for state space time series analysis. [sent-508, score-0.157]
simIndex simValue paperId paperTitle
same-paper 1 1.0 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
2 0.18027644 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.11768745 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha
Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1
4 0.11505794 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
Author: Tzu-Kuo Huang, Jeff Schneider
Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such Alzheimer’s, or certain biological processes. Existing methods for learning dynamic model from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover firstorder Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings. 1
5 0.10492283 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
Author: Roger Frigola, Fredrik Lindsten, Thomas B. Schon, Carl Rasmussen
Abstract: State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference and learning (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and, instead, infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity. 1
6 0.1005301 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
7 0.096633129 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
8 0.083021536 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
9 0.077285334 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
10 0.076898962 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
11 0.076838389 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
12 0.071281165 251 nips-2013-Predicting Parameters in Deep Learning
13 0.070617549 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
14 0.069847487 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
15 0.069819905 224 nips-2013-On the Sample Complexity of Subspace Learning
16 0.067991182 205 nips-2013-Multisensory Encoding, Decoding, and Identification
17 0.067959294 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
18 0.06770473 126 nips-2013-Gaussian Process Conditional Copulas with Applications to Financial Time Series
19 0.066680834 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
20 0.063781656 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
simIndex simValue paperId paperTitle
same-paper 1 0.95047396 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
2 0.66482103 66 nips-2013-Computing the Stationary Distribution Locally
3 0.60287941 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
4 0.57203591 299 nips-2013-Solving inverse problem of Markov chain with partial observations
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on single link on a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
5 0.51985741 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
6 0.50283444 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
7 0.49191323 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
8 0.49169454 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models
9 0.48116103 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
10 0.481011 332 nips-2013-Tracking Time-varying Graphical Structure
11 0.47767371 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
12 0.46598974 126 nips-2013-Gaussian Process Conditional Copulas with Applications to Financial Time Series
13 0.4599463 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
14 0.45328456 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
15 0.43757612 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
16 0.4364728 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
17 0.43227714 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
18 0.43002197 284 nips-2013-Robust Spatial Filtering with Beta Divergence
19 0.42840695 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression
20 0.41924706 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
topicId topicWeight
[(16, 0.063), (19, 0.011), (33, 0.122), (34, 0.121), (41, 0.029), (49, 0.027), (56, 0.138), (70, 0.019), (84, 0.264), (85, 0.045), (89, 0.036), (93, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.80801833 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
2 0.74035215 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
Author: Jamie Shotton, Toby Sharp, Pushmeet Kohli, Sebastian Nowozin, John Winn, Antonio Criminisi
Abstract: Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization. 1
3 0.68933719 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover a n ⇥ n matrix of rank r from merely ⌦(nr3/2 log(r)) matrix entries. We also show that one can recover an order T tensor using ⌦(nrT 1/2 T 2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using ⌦(nr3/2 polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
4 0.65718204 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
5 0.65586221 249 nips-2013-Polar Operators for Structured Sparse Estimation
Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans
Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1
6 0.65535116 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
7 0.65316767 355 nips-2013-Which Space Partitioning Tree to Use for Search?
8 0.65242893 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
9 0.65165764 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
10 0.65100247 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
11 0.65082979 50 nips-2013-Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search
12 0.65067452 55 nips-2013-Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
13 0.6505565 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
14 0.64996624 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
15 0.64881104 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
16 0.64870769 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
17 0.64859706 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models
18 0.64851099 252 nips-2013-Predictive PAC Learning and Process Decompositions
19 0.64754981 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
20 0.64752787 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models