nips nips2013 nips2013-299 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on a single link of a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
Reference: text
sentIndex sentText sentNum sentScore
1 Solving inverse problem of Markov chain with partial observations Tetsuro Morimura IBM Research - Tokyo tetsuro@jp.ibm.com [sent-1, score-0.386]
2 Abstract The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. [sent-9, score-0.349]
3 A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. [sent-10, score-0.312]
4 In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. [sent-11, score-0.289]
5 This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. [sent-12, score-0.246]
6 The observations include the frequency of visiting a state and the rate of reaching a state from another. [sent-13, score-0.234]
7 Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on a single link of a road network from a limited number of observation points. [sent-14, score-0.415]
8 1 Introduction The Markov chain is a standard model for analyzing the dynamics of stochastic systems, including economic systems [29], traffic systems [11], social systems [12], and ecosystems [6]. [sent-17, score-0.328]
9 There is a large body of literature on the problem of analyzing the properties of a Markov chain given its initial distribution and a matrix of transition probabilities [21, 26]. [sent-18, score-0.438]
10 For example, there exist established methods for analyzing the stationary distribution and the mixing time of a Markov chain [23, 16]. [sent-19, score-0.331]
11 In practice, however, one must first obtain the input parameters (i.e., the initial distribution and the transition-probability matrix) of the Markov chain that models a particular system under consideration. [sent-23, score-0.282]
12 For example, one can analyze a traffic system [27, 24], including how the vehicles are distributed across a city, by modeling the dynamics of vehicles as a Markov chain [11]. [sent-24, score-0.486]
13 It is, however, difficult to directly measure the fraction of the vehicles that turn right or left at every intersection. [sent-25, score-0.106]
14 The inverse problem of a Markov chain that we address in this paper is an inverse version of the traditional problem of analyzing a Markov chain with given input parameters. [sent-26, score-0.701]
15 Namely, our goal is to estimate the parameters of a Markov chain from partial observations of the corresponding system. [sent-27, score-0.28]
16 In the context of the traffic system, for example, we seek to find the parameters of a Markov chain, given the traffic volumes at stationary observation points and/or the rate of vehicles moving between those points. [sent-28, score-0.652]
17 Figure 1: An inverse Markov chain problem. The traffic volume on every road is inferred from traffic volumes at a limited number of observation points and/or the rates of vehicles transitioning between these points. [sent-29, score-0.434]
18 Such statistics can be reliably estimated from observations with web-cameras [27], automatic number plate recognition devices [10], or radio-frequency identification (RFID) [25], whose availability is however limited to a small number of observation points in general (see Figure 1). [sent-31, score-0.149]
19 By estimating the parameters of a Markov chain and analyzing its stationary probability, one can infer the traffic volumes at unobserved points. [sent-32, score-0.435]
20 The primary contribution of this paper is the first methodology for solving the inverse problem of a Markov chain when only observations at a limited number of stationary observation points are given. [sent-33, score-0.527]
21 Specifically, we assume that the frequency of visiting a state and/or the rate of reaching a state from another are given for a small number of states. [sent-34, score-0.207]
22 We formulate the inverse problem of a Markov chain as a regularized optimization problem. [sent-35, score-0.359]
23 Then we can efficiently find a solution to the inverse problem of a Markov chain based on the notion of natural gradient [3]. [sent-36, score-0.375]
24 The inverse problem of a Markov chain has been addressed in the literature [9, 28, 31], but the existing methods assume that sample paths of the Markov chain are available. [sent-37, score-0.586]
25 Related work on inverse reinforcement learning [20, 1, 32] also assumes that sample paths are available. [sent-38, score-0.152]
26 Even when such sample-path data is available, it is often limited to vehicles of a particular type, such as taxis, or to a particular region. [sent-43, score-0.139]
27 On the other hand, stationary observation data is often less expensive and more obtainable. [sent-44, score-0.115]
28 In Section 3, we formulate an inverse problem of a Markov chain as a regularized optimization problem. [sent-48, score-0.359]
29 A method for efficiently solving the inverse problem of a Markov chain is proposed in Section 4. [sent-49, score-0.337]
30 2 Preliminaries A discrete-time Markov chain [26, 21] is a stochastic process X = (X0, X1, . . . ). [sent-52, score-0.231]
31 A Markov chain is defined by the triplet {X, pI, pT}, where X = {1, . . . , |X|} is a finite set of states. [sent-56, score-0.231]
32 Here pI : X → [0, 1] is the initial-state probability function, pI(x) ≜ Pr(X0 = x), and pT : X × X → [0, 1] specifies the state-transition probability from x to x′, i.e., pT(x′ | x) ≜ Pr(Xt+1 = x′ | Xt = x). [sent-62, score-0.137]
33 Note that the state transition is conditionally independent of the past states given the current state, which is called the Markov property. [sent-65, score-0.144]
34 Any Markov chain can be converted into another Markov chain, called a Markov chain with restart, by modifying the transition probability. [sent-66, score-0.533]
35 There, the initial-state probability stays unchanged, but the state-transition probability is modified into p such that
$$ p(x' \mid x) \;\triangleq\; \beta\, p_T(x' \mid x) + (1 - \beta)\, p_I(x'), \qquad (1) $$
where β ∈ [0, 1) is a continuation rate of the Markov chain.1 [sent-67, score-0.178]
36 In the limit of β → 1, this Markov chain with restart is equivalent to the original Markov chain. [sent-68, score-0.268]
37 In the following, we refer to p as the (total) transition probability, and to pT as a partial-transition (or p-transition) probability. [sent-69, score-0.164]
38 So, restarting a chain means that an agent’s origin of a trip is decided by the initial distribution, and the trip ends at each time-step with probability 1 − β. [sent-73, score-0.349]
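To make Eq. (1) concrete, the following is a minimal NumPy sketch of the total transition matrix of a Markov chain with restart. The row-stochastic convention P_T[x, x'] = pT(x' | x), and all names, are illustrative assumptions, not part of the paper.

```python
import numpy as np

def restart_transition(P_T, p_I, beta):
    """Eq. (1): p(x' | x) = beta * p_T(x' | x) + (1 - beta) * p_I(x').
    P_T is row-stochastic with P_T[x, x'] = p_T(x' | x); p_I sums to one."""
    n = len(p_I)
    # Every row mixes the x-th row of P_T with the restart distribution p_I.
    return (beta * np.asarray(P_T, dtype=float)
            + (1.0 - beta) * np.tile(p_I, (n, 1)))
```

Each row of the result sums to β + (1 − β) = 1, so the output is again a valid transition matrix.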
39 We denote the parametrized initial and partial-transition probabilities as pIν and pTω, respectively, and the total transition probability as pθ, where θ is the total model parameter,
$$ \theta \;\triangleq\; [\nu^\top, \omega^\top, \tilde\beta]^\top \in \mathbb{R}^d, \qquad (2) $$
where d = d1 + d2 + 1 and $\tilde\beta \triangleq \varsigma^{-1}(\beta)$ with the inverse of the sigmoid function $\varsigma^{-1}$. [sent-75, score-0.2]
40 The Markov chain with restart can be represented as M(θ) ≜ {X, pIν, pTω, β}. [sent-78, score-0.268]
41 Assumption 1 The Markov chain M(θ) for any θ ∈ Rd is ergodic (irreducible and aperiodic). [sent-80, score-0.231]
42 Assumption 2 indicates that the transition probability pθ is also differentiable for any state pair (x, x′ ) ∈ X × X with respect to any θ ∈ Rd . [sent-83, score-0.137]
43 Finally, we define hitting probabilities for a Markov chain with an indefinite horizon. [sent-84, score-0.4]
44 This Markov chain is represented as M̃(θ) = {X, pTω, β}; it evolves according to the p-transition probability pTω, not to pθ, and terminates with probability 1 − β at every step. [sent-85, score-0.277]
45 The hitting probability of a state x′ given x is defined as
$$ h_\theta(x' \mid x) \;\triangleq\; \Pr\big(x' \in \tilde{X} \mid \tilde{X}_0 = x,\, \tilde{M}(\theta)\big), \qquad (4) $$
where $\tilde{X} = (\tilde{X}_0, \tilde{X}_1, \dots)$ is a state sequence of M̃(θ). [sent-86, score-0.177]
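As a sanity check on Eq. (4), hθ(x′ | x) can be estimated by simulating trips of M̃(θ). The sketch below is a hypothetical Monte Carlo estimator; treating a trip that starts at the target as an immediate hit is our assumption.

```python
import numpy as np

def hitting_prob_mc(P_T, beta, start, target, n_runs=20000, seed=0):
    """Monte Carlo estimate of h_theta(target | start), Eq. (4): the chain
    moves by the p-transition P_T and ends with probability 1 - beta per step."""
    rng = np.random.default_rng(seed)
    n = P_T.shape[0]
    hits = 0
    for _ in range(n_runs):
        x = start
        while True:
            if x == target:          # reached the target state
                hits += 1
                break
            if rng.random() > beta:  # trip terminates with probability 1 - beta
                break
            x = rng.choice(n, p=P_T[x])
    return hits / n_runs
```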
46 3 Inverse Markov Chain Problem Here we formulate an inverse problem of the Markov chain M(θ). [sent-90, score-0.359]
47 In the inverse problem, the model family M ∈ {M(θ) | θ ∈ Rd }, which may be subject to a transition structure as in the road network, is known or given a priori, but the model parameter θ is unknown. [sent-91, score-0.325]
48 Objective functions for the inverse problem are discussed in Section 3. [sent-94, score-0.106]
49 3.1 Problem setting The input and output of our inverse problem of the Markov chain are as follows. [sent-97, score-0.337]
50 In the context of traffic monitoring, f (x) denotes the number of vehicles that went through an observation point, x; g(x, x′ ) denotes the number of vehicles that went through x and x′ in this order divided by f (x). [sent-101, score-0.314]
51 • Output is the estimated parameter θ of the Markov chain M(θ), which specifies the total-transition probability function pθ in Eq. [sent-102, score-0.254]
52 Specifically, we assume that the observed f is proportional to the true stationary probability of the Markov chain:
$$ \pi^*(x) = c\, f(x), \quad x \in X_o, \qquad (5) $$
where c is an unknown constant that satisfies the normalization condition. [sent-106, score-0.096]
53 We further assume that the observed reaching rate is equal to the true hitting probability of the Markov chain:
$$ h^*(x' \mid x) = g(x, x'), \quad (x, x') \in X_o \times X_o. \qquad (6) $$ [sent-107, score-0.184]
54 For Ld(θ), one natural choice might be a Kullback–Leibler (KL) divergence,
$$ L_d^{\mathrm{KL}}(\theta) \;\triangleq\; \sum_{x \in X_o} \pi^*(x) \log \frac{\pi^*(x)}{\pi_\theta(x)} \;=\; -c \sum_{x \in X_o} f(x) \log \pi_\theta(x) + o, $$
where o is a term independent of θ. [sent-121, score-0.112]
55 Eq. (8) follows from maximizing the likelihood of θ under the assumption that the observation log f(i) − log f(j) has Gaussian white noise N(0, ϵ²). [sent-130, score-0.116]
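Eq. (8) itself did not survive the extraction; judging from the noise model above and the gradient of Ld given in Section 4, it is presumably the pairwise squared log-ratio loss. The sketch below encodes that reconstruction, which is an assumption; note that the unknown scale c of Eq. (5) cancels inside the ratios.

```python
import numpy as np

def cost_Ld(f_obs, pi_theta, obs_idx):
    """Assumed form of Eq. (8):
    L_d = 1/2 * sum_{i,j in Xo} (log(f(i)/f(j)) - log(pi(i)/pi(j)))**2."""
    lf = np.log(np.asarray(f_obs, dtype=float)[obs_idx])
    lp = np.log(np.asarray(pi_theta, dtype=float)[obs_idx])
    # Pairwise differences of log-frequencies and log-stationary probabilities.
    d = (lf[:, None] - lf[None, :]) - (lp[:, None] - lp[None, :])
    return 0.5 * np.sum(d ** 2)
```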
56 3.2 Cost function for the hitting probability function Unlike Ld(θ), there are several options for Lh(θ). [sent-134, score-0.134]
57 One choice, based on Eq. (6), is
$$ L_h(\theta) \;\triangleq\; \frac{1}{2} \sum_{i \in X_o} \sum_{j \in X_o} \big( \log g(i,j) - \log h_\theta(j \mid i) \big)^2. \qquad (9) $$ [sent-137, score-0.112]
58 Eq. (9) follows from maximizing the likelihood of θ under the assumption that the observation log g(i, j) has Gaussian white noise, as in the case of Ld(θ). [sent-139, score-0.116]
59 The gradients of Eq. (10) are given as
$$ \nabla_\theta L_d(\theta) = \sum_{i \in X_o} \sum_{j \in X_o} \Big( \log \frac{f(i)}{f(j)} - \log \frac{\pi_\theta(i)}{\pi_\theta(j)} \Big) \big( \nabla_\theta \log \pi_\theta(j) - \nabla_\theta \log \pi_\theta(i) \big), $$
$$ \nabla_\theta L_h(\theta) = \sum_{i \in X_o} \sum_{j \in X_o} \big( \log g(i,j) - \log h_\theta(j \mid i) \big)\, \nabla_\theta \log h_\theta(j \mid i). $$ [sent-153, score-0.392]
60 To evaluate the gradients of Eq. (10), we need to compute the gradient of the logarithmic stationary probability ∇θ log πθ, the hitting probability hθ, and its gradient ∇θhθ. [sent-155, score-0.38]
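A direct transcription of the cost Lh(θ) of Eq. (9) might look as follows; the array conventions g_obs[i, j] = g(i, j) and h_theta[i, j] = hθ(j | i), and the restriction to distinct observed pairs, are assumptions of this sketch.

```python
import numpy as np

def cost_Lh(g_obs, h_theta, obs_idx):
    """Eq. (9): L_h = 1/2 * sum_{i,j in Xo} (log g(i,j) - log h_theta(j|i))**2."""
    total = 0.0
    for i in obs_idx:
        for j in obs_idx:
            if i != j:  # assumed: only distinct observed pairs contribute
                total += (np.log(g_obs[i, j]) - np.log(h_theta[i, j])) ** 2
    return 0.5 * total
```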
61 Accordingly, the ordinary gradient is generally different from the steepest direction on the manifold, and the optimization process with the ordinary gradient often becomes unstable or falls into a learning plateau [5]. [sent-162, score-0.158]
62 An appropriate Riemannian metric on a statistical model Y with parameter θ is known to be its Fisher information matrix (FIM):4
$$ \sum_{y} \Pr(Y = y \mid \theta)\, \nabla_\theta \log \Pr(Y = y \mid \theta)\, \nabla_\theta \log \Pr(Y = y \mid \theta)^\top. $$ [sent-165, score-0.159]
63 For the optimization of Eq. (10), we use
$$ G_\theta = F_\theta + \sigma I_d, \qquad (11) $$
where Fθ is the FIM of pθ(x′ | x) πθ(x),
$$ F_\theta \;\triangleq\; \sum_{x} \pi_\theta(x) \Big( \nabla_\theta \log \pi_\theta(x)\, \nabla_\theta \log \pi_\theta(x)^\top + \sum_{x'} p_\theta(x' \mid x)\, \nabla_\theta \log p_\theta(x' \mid x)\, \nabla_\theta \log p_\theta(x' \mid x)^\top \Big). $$ [sent-168, score-0.224]
64 3 A parameter space is a Riemannian space if the parameter θ ∈ Rd is on a Riemannian manifold defined by a positive definite matrix called a Riemannian metric matrix Rθ ∈ Rd×d . [sent-171, score-0.097]
4 The FIM is the unique metric matrix of the second-order Taylor expansion of the KL divergence, that is,
$$ \sum_{y} \Pr(Y = y \mid \theta) \log \frac{\Pr(Y = y \mid \theta)}{\Pr(Y = y \mid \theta + \Delta\theta)} \;\simeq\; \tfrac{1}{2} \|\Delta\theta\|^2_{F_\theta}. $$ [sent-173, score-0.103]
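As an illustration, the sketch below assembles the FIM Fθ defined above and takes one descent step with the damped metric Gθ = Fθ + σId of Eq. (11). The inputs ∇θ log πθ and ∇θ log pθ are assumed precomputed (e.g., as derived in this section); all names and the step rule are illustrative.

```python
import numpy as np

def fisher_information(P, pi, dlog_pi, dlog_P):
    """FIM of p_theta(x'|x) * pi_theta(x).  dlog_pi: (n, d) rows are
    grad log pi(x); dlog_P: (n, n, d) entries are grad log p(x'|x)
    (entries with P[x, x'] == 0 should be zero)."""
    n, d = dlog_pi.shape
    F = np.zeros((d, d))
    for x in range(n):
        F += pi[x] * np.outer(dlog_pi[x], dlog_pi[x])
        for xp in range(n):
            F += pi[x] * P[x, xp] * np.outer(dlog_P[x, xp], dlog_P[x, xp])
    return F

def natural_gradient_step(theta, grad_L, F, sigma=1e-3, step=0.1):
    """One natural-gradient step with the damped metric G = F + sigma*I, Eq. (11)."""
    G = F + sigma * np.eye(len(theta))
    return theta - step * np.linalg.solve(G, grad_L)
```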
66 Proposition 1 ([7]) If $A \in \mathbb{R}^{d \times d}$ satisfies $\lim_{K \to \infty} A^K = 0$, then the inverse of $(I - A)$ exists, and $(I - A)^{-1} = \lim_{K \to \infty} \sum_{k=0}^{K} A^k$. [sent-182, score-0.106]
67 By differentiating the stationary equation of Eq. (3) with respect to θi,
$$ \mathrm{Diag}(\pi_\theta)\, \nabla_{\theta_i} \log \pi_\theta = (\nabla_{\theta_i} P_\theta^\top)\, \pi_\theta + P_\theta^\top\, \mathrm{Diag}(\pi_\theta)\, \nabla_{\theta_i} \log \pi_\theta $$
is obtained. [sent-186, score-0.112]
68 Though we get the following linear simultaneous equation for ∇θi log πθ,
$$ (I_d - P_\theta^\top)\, \mathrm{Diag}(\pi_\theta)\, \nabla_{\theta_i} \log \pi_\theta = (\nabla_{\theta_i} P_\theta^\top)\, \pi_\theta, \qquad (13) $$
the inverse of $(I_d - P_\theta^\top)\, \mathrm{Diag}(\pi_\theta)$ does not exist. [sent-187, score-0.218]
69 The inverse of $(I_d - P_\theta^\top + \pi_\theta \mathbf{1}_d^\top)$ exists, because of Proposition 1 and the fact that $\lim_{k \to \infty} (P_\theta^\top - \pi_\theta \mathbf{1}_d^\top)^k = \lim_{k \to \infty} P_\theta^{\top k} - \pi_\theta \mathbf{1}_d^\top = 0$. [sent-191, score-0.106]
70 The inverse of Diag(πθ) also exists, because πθ(x) is positive for any x ∈ X under Assumption 1. [sent-192, score-0.106]
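Putting the last few steps together, ∇θi log πθ follows from a single linear solve. Below is a sketch under the convention that P is the row-stochastic total transition matrix (so π⊤P = π⊤); the helper name is illustrative.

```python
import numpy as np

def grad_log_stationary(P, dP, pi):
    """Solve the corrected Eq. (13): (I - P^T + pi 1^T) grad_pi = (dP^T) pi,
    where dP is the element-wise derivative of P w.r.t. one parameter theta_i.
    Adding pi 1^T keeps the equation valid because 1^T grad_pi = 0."""
    n = len(pi)
    A = np.eye(n) - P.T + np.outer(pi, np.ones(n))
    grad_pi = np.linalg.solve(A, dP.T @ pi)  # = Diag(pi) * grad log pi
    return grad_pi / pi                      # grad log pi, element-wise
```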
71 We write $h_\theta(x) \triangleq [h_\theta(x \mid 1), \dots, h_\theta(x \mid |\mathcal{X}|)]^\top$ for the hitting probabilities in Eq. (4). [sent-198, score-0.169]
72 The matrix $P_{T\theta\backslash x}$ is defined as $(I_{|\mathcal{X}|} - e_x e_x^\top)\, P_{T\theta}$, i.e., $P_{T\theta}$ with its x-th row zeroed out. [sent-202, score-0.098]
73 Because the inverse of $(I_{|\mathcal{X}|} - \beta P_{T\theta\backslash x})$ exists by Proposition 1 and $\lim_{k \to \infty} (\beta P_{T\theta\backslash x})^k = 0$, we get Eq. [sent-208, score-0.106]
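The closed-form equation referenced here was lost to extraction; reconstructing it from the derivation above (our assumption), all hitting probabilities hθ(x | ·) for a fixed target x come from one linear solve, which can be cross-checked against the Monte Carlo estimator sketched earlier.

```python
import numpy as np

def hitting_probs(P_T, beta, target):
    """Assumed closed form: with P0 = (I - e_t e_t^T) P_T (target row zeroed),
    solve (I - beta * P0) h = e_t; the target row then forces h[target] = 1."""
    n = P_T.shape[0]
    P0 = np.asarray(P_T, dtype=float).copy()
    P0[target, :] = 0.0                      # absorb at the target state
    e = np.zeros(n)
    e[target] = 1.0
    return np.linalg.solve(np.eye(n) - beta * P0, e)
```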
74 The initial probability is modeled as
$$ p_{I\nu}(x) \;\triangleq\; \frac{\exp(s_I(x; \nu))}{\sum_{y \in \mathcal{X}} \exp(s_I(y; \nu))}, \qquad (16) $$
where $s_I(x; \nu)$ is a state score function with its parameter $\nu \triangleq [\nu^{\mathrm{loc}\top}, \nu^{\mathrm{glo}\top}]^\top \in \mathbb{R}^{d_1}$, consisting of a local parameter $\nu^{\mathrm{loc}} \in \mathbb{R}^{|\mathcal{X}|}$ and a global parameter $\nu^{\mathrm{glo}} \in \mathbb{R}^{d_1 - |\mathcal{X}|}$. [sent-215, score-0.389]
75 It is defined as
$$ s_I(x; \nu) \;\triangleq\; \nu^{\mathrm{loc}}_x + \phi_I(x)^\top \nu^{\mathrm{glo}}, \qquad (17) $$
where $\phi_I(x) \in \mathbb{R}^{d_1 - |\mathcal{X}|}$ is a feature vector of a state x. [sent-216, score-0.335]
76 In the case of the road network, a state corresponds to a road segment. [sent-217, score-0.339]
77 Then ϕI (x) may, for example [18], be defined with the indicators of whether there are particular types of buildings near the road segment, x. [sent-218, score-0.148]
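A direct implementation of Eqs. (16)–(17) follows; the max-shift for numerical stability is a standard addition of ours, not part of the paper.

```python
import numpy as np

def initial_prob(nu_loc, nu_glo, Phi_I):
    """Eqs. (16)-(17): s_I(x) = nu_loc[x] + Phi_I[x] @ nu_glo, p_I = softmax(s_I).
    Phi_I is the (|X|, d1 - |X|) feature matrix, e.g., indicators of building
    types near each road segment."""
    s = nu_loc + Phi_I @ nu_glo
    e = np.exp(s - s.max())  # shift scores for numerical stability
    return e / e.sum()
```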
78 It is defined as
$$ s_T(x, x'; \omega) \;\triangleq\; \omega^{\mathrm{loc}}_{(x,x')} + \phi_T(x')^\top \omega_1^{\mathrm{glo}} + \psi(x, x')^\top \omega_2^{\mathrm{glo}}, \quad (x, x') \in \mathcal{X} \times \mathcal{X}_x, $$
where $\omega^{\mathrm{loc}}_{(x,x')}$ is the element of $\omega^{\mathrm{loc}}\ (\in \mathbb{R}^{\sum_{x \in \mathcal{X}} |\mathcal{X}_x|})$ corresponding to the transition from x to x′, and $\phi_T(x)$ and $\psi(x, x')$ are feature vectors. [sent-223, score-0.783]
79 For the road network, $\phi_T(x)$ may be defined based on the type of the road segment x, and $\psi(x, x')$ may be defined based on the angle between x and x′. [sent-224, score-0.121]
80 Those linear combinations with the global parameters $\omega_1^{\mathrm{glo}}$ and $\omega_2^{\mathrm{glo}}$ can represent drivers' preferences, such as how much drivers prefer major roads or straight routes to others. [sent-225, score-0.121]
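The transition probability can be implemented analogously; that pTω is the softmax of sT over the successor set Xx (mirroring Eq. (16)) is our reading, since the normalization did not survive extraction. The data-structure choices below (a dict for ω^loc, adjacency lists) are illustrative.

```python
import numpy as np

def partial_transition_prob(omega_loc, omega1, omega2, Phi_T, Psi, succ):
    """Softmax of s_T(x, x') = omega_loc[(x, x')] + Phi_T[x'] @ omega1
    + Psi[x, x'] @ omega2 over the successors succ[x] of each state x."""
    n = len(succ)
    P_T = np.zeros((n, n))
    for x in range(n):
        ys = succ[x]
        s = np.array([omega_loc[(x, y)] + Phi_T[y] @ omega1 + Psi[x, y] @ omega2
                      for y in ys])
        e = np.exp(s - s.max())
        P_T[x, ys] = e / e.sum()  # normalize over reachable states only
    return P_T
```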
81 First, the basic initial probabilities, pIν , and the basic transition probabilities, pTω , were determined based on Eqs. [sent-235, score-0.102]
82 Then we sampled the visiting frequencies f (x) and the hitting rates g(x, x′ ) for every x, x′ ∈ Xo from this synthesized Markov chain. [sent-241, score-0.186]
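A sketch of this sampling step follows; the trip semantics (start from pIν, move by pTω, stop with probability 1 − β) follow the restart model, while counting each trip at most once per state is an assumption of ours.

```python
import numpy as np

def sample_observations(p_I, P_T, beta, n_trips=100000, seed=0):
    """Simulate trips and collect the observations of Section 3.1:
    f[x] counts trips passing through x; g[i, j] is the fraction of trips
    through i that later reach j."""
    rng = np.random.default_rng(seed)
    n = len(p_I)
    f = np.zeros(n)
    reach = np.zeros((n, n))
    for _ in range(n_trips):
        x = rng.choice(n, p=p_I)
        seen = []
        while True:
            if x not in seen:
                f[x] += 1
                for i in seen:       # x is reached after every earlier state
                    reach[i, x] += 1
                seen.append(x)
            if rng.random() > beta:  # trip ends
                break
            x = rng.choice(n, p=P_T[x])
    g = reach / np.maximum(f[:, None], 1.0)
    return f, g
```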
83 This is mainly due to the fact that the NWKR assumes that all propagations of the observation from a link to another connected link are equally weighted. [sent-254, score-0.142]
84 The goal is to estimate the traffic volume along an arbitrary road segment (or link of a network), given observed traffic volumes on a limited number of links, where a link corresponds to the state x of M(θ), and the traffic volume along x corresponds to f(x) of Eq. (5). [sent-259, score-0.485]
85 The traffic volumes along the observable links were reliably estimated from real-world web-camera images captured in Nairobi, Kenya [2, 15].
86 Figure 2 (B): Traffic volumes for a city-center map in Nairobi, Kenya. I: web-camera observations (colored); II: traffic volumes estimated by our method.
87 We did not use the hitting rates g(x, x′) here because of their unavailability. [sent-307, score-0.129]
88 However, unlike network tomography, we need to infer all of the link traffic volumes instead of source-destination demands. [sent-309, score-0.104]
89 Unlike link-cost prediction, our inputs are stationary observations instead of trajectories. [sent-310, score-0.1]
90 The road network and the web-camera observations are shown in Fig. 2 (B)-I. [sent-312, score-0.201]
91 While the total number of links was 1,497, the number of links with observations was only 52 (about 3.5%). [sent-314, score-0.109]
92 We used the parametric models in Section 5, where ϕT (x) ∈ [−1, 1] was set based on the road category of x such that primary roads have a higher value than secondary roads [22], and ψ(x, x′ ) ∈ [−1, 1] was the cosine of the angle between x and x′ . [sent-316, score-0.326]
93 Figure 2 (B)-II shows an example of our results, where the red and yellow roads are most congested while the traffic on the blue roads is flowing smoothly. [sent-319, score-0.214]
94 The congested roads from our analysis are consistent with those from a local traffic survey report [13]. [sent-320, score-0.125]
95 This is rather surprising, because the rate of observation links is very limited, to only about 3.5%. [sent-325, score-0.134]
96 7 Conclusion We have defined a novel inverse problem of a Markov chain, where we infer the initial-state and transition probabilities using a limited amount of information that we can obtain by observing the Markov chain at a small number of states. [sent-327, score-0.517]
97 Using real-world data, we have demonstrated that our approach is useful for a traffic monitoring system that monitors the traffic volume at a limited number of locations. [sent-329, score-0.141]
98 From this observation the Markov chain model is inferred, which in turn can be used to deduce the traffic volume at any location. [sent-330, score-0.302]
99 A Google-like model of road network dynamics and its application to regulation and control. [sent-407, score-0.197]
100 An evaluation of blood-inventory policies: A Markov chain application. [sent-485, score-0.231]
wordName wordTfidf (topN-words)
[('xo', 0.638), ('traf', 0.342), ('chain', 0.231), ('pt', 0.228), ('markov', 0.174), ('glo', 0.164), ('road', 0.148), ('loc', 0.128), ('ld', 0.122), ('hitting', 0.111), ('pi', 0.109), ('nwkr', 0.109), ('rmae', 0.109), ('vehicles', 0.106), ('inverse', 0.106), ('lh', 0.097), ('roads', 0.089), ('id', 0.088), ('lkl', 0.08), ('limk', 0.08), ('volumes', 0.076), ('riemannian', 0.074), ('stationary', 0.073), ('transition', 0.071), ('morimura', 0.064), ('kenya', 0.064), ('diag', 0.062), ('monitoring', 0.059), ('probabilities', 0.058), ('pr', 0.057), ('log', 0.056), ('fim', 0.055), ('nairobi', 0.055), ('link', 0.05), ('visiting', 0.049), ('xx', 0.047), ('rd', 0.045), ('state', 0.043), ('ibm', 0.042), ('observation', 0.042), ('links', 0.041), ('ex', 0.039), ('gradient', 0.038), ('chains', 0.037), ('restart', 0.037), ('congested', 0.036), ('osogami', 0.036), ('rfid', 0.036), ('tetsuro', 0.036), ('aaai', 0.033), ('limited', 0.033), ('sensitivities', 0.032), ('trip', 0.032), ('drivers', 0.032), ('reaching', 0.032), ('initial', 0.031), ('ordinary', 0.03), ('went', 0.03), ('states', 0.03), ('manifold', 0.03), ('volume', 0.029), ('city', 0.028), ('infer', 0.028), ('plate', 0.028), ('reinforcement', 0.028), ('analyzing', 0.027), ('segment', 0.027), ('observations', 0.027), ('metric', 0.027), ('synthesized', 0.026), ('network', 0.026), ('tomography', 0.025), ('observable', 0.025), ('st', 0.025), ('social', 0.024), ('watson', 0.024), ('tokyo', 0.024), ('dynamics', 0.023), ('economic', 0.023), ('probability', 0.023), ('formulate', 0.022), ('xt', 0.022), ('steepest', 0.022), ('amari', 0.022), ('frequency', 0.022), ('partial', 0.022), ('si', 0.022), ('transportation', 0.022), ('system', 0.02), ('squared', 0.02), ('matrix', 0.02), ('dir', 0.02), ('reliably', 0.019), ('assumption', 0.018), ('rate', 0.018), ('paths', 0.018), ('kl', 0.018), ('logarithmic', 0.018), ('preference', 0.017), ('gradients', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 299 nips-2013-Solving inverse problem of Markov chain with partial observations
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on a single link of a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
2 0.12824649 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.08930929 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
Author: Tzu-Kuo Huang, Jeff Schneider
Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings. 1
4 0.088960886 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
Author: Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
Abstract: Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued. This has led to various proposals for sampling from this implicitly learned density function, using Langevin and Metropolis-Hastings MCMC. However, it remained unclear how to connect the training procedure of regularized auto-encoders to the implicit estimation of the underlying data-generating distribution when the data are discrete, or using other forms of corruption process and reconstruction errors. Another issue is the mathematical justification which is only valid in the limit of small corruption noise. We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty). 1
5 0.085528791 24 nips-2013-Actor-Critic Algorithms for Risk-Sensitive MDPs
Author: Prashanth L.A., Mohammad Ghavamzadeh
Abstract: In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application. 1
6 0.082112461 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
7 0.080485448 26 nips-2013-Adaptive Market Making via Online Learning
8 0.073199004 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors
9 0.069071904 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
10 0.066547938 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
11 0.064943343 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
12 0.063613407 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models
13 0.062551789 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
14 0.054439195 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
15 0.052023269 325 nips-2013-The Pareto Regret Frontier
16 0.050729696 253 nips-2013-Prior-free and prior-dependent regret bounds for Thompson Sampling
17 0.050188605 256 nips-2013-Probabilistic Principal Geodesic Analysis
18 0.049738977 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
19 0.049603403 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
20 0.047800928 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
topicId topicWeight
[(0, 0.141), (1, -0.018), (2, 0.013), (3, -0.002), (4, -0.009), (5, 0.035), (6, 0.061), (7, 0.014), (8, 0.032), (9, -0.019), (10, 0.031), (11, -0.076), (12, 0.039), (13, 0.01), (14, 0.033), (15, 0.038), (16, 0.002), (17, -0.005), (18, -0.007), (19, 0.032), (20, -0.022), (21, -0.028), (22, -0.067), (23, -0.058), (24, 0.003), (25, 0.053), (26, 0.018), (27, -0.002), (28, -0.056), (29, -0.027), (30, -0.02), (31, 0.075), (32, -0.1), (33, 0.009), (34, -0.055), (35, -0.026), (36, -0.004), (37, 0.043), (38, -0.0), (39, -0.02), (40, 0.048), (41, 0.13), (42, -0.121), (43, -0.008), (44, -0.137), (45, -0.048), (46, -0.003), (47, 0.018), (48, 0.005), (49, -0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.92619359 299 nips-2013-Solving inverse problem of Markov chain with partial observations
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on a single link of a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
2 0.75963503 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.59053987 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models
Author: Anirban Roychowdhury, Ke Jiang, Brian Kulis
Abstract: Small-variance asymptotics provide an emerging technique for obtaining scalable combinatorial algorithms from rich probabilistic models. We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension. Starting with the standard HMM, we first derive a “hard” inference algorithm analogous to k-means that arises when particular variances in the model tend to zero. This analysis is then extended to the Bayesian nonparametric case, yielding a simple, scalable, and flexible algorithm for discrete-state sequence data with a non-fixed number of states. We also derive the corresponding combinatorial objective functions arising from our analysis, which involve a k-means-like term along with penalties based on state transitions and the number of states. A key property of such algorithms is that—particularly in the nonparametric setting—standard probabilistic inference algorithms lack scalability and are heavily dependent on good initialization. A number of results on synthetic and real data sets demonstrate the advantages of the proposed framework. 1
4 0.56173402 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
Author: Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
Abstract: Recent work has shown how denoising and contractive autoencoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data is continuous-valued. This has led to various proposals for sampling from this implicitly learned density function, using Langevin and Metropolis-Hastings MCMC. However, it remained unclear how to connect the training procedure of regularized auto-encoders to the implicit estimation of the underlying data-generating distribution when the data are discrete, or using other forms of corruption process and reconstruction errors. Another issue is the mathematical justification which is only valid in the limit of small corruption noise. We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty). 1
5 0.54636705 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
Author: Daniele Durante, Bruno Scarpa, David Dunson
Abstract: In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application. 1 1.1
6 0.52159411 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
7 0.49519497 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
8 0.49156365 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
9 0.4913342 308 nips-2013-Spike train entropy-rate estimation using hierarchical Dirichlet process priors
10 0.48707846 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
11 0.45861325 36 nips-2013-Annealing between distributions by averaging moments
12 0.45156243 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
13 0.44410858 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
14 0.43284872 284 nips-2013-Robust Spatial Filtering with Beta Divergence
15 0.43149367 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
16 0.43054801 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
17 0.42795572 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
18 0.42714643 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
19 0.4258633 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
20 0.42572159 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
topicId topicWeight
[(2, 0.013), (16, 0.02), (33, 0.141), (34, 0.131), (38, 0.283), (41, 0.03), (49, 0.029), (56, 0.124), (70, 0.019), (85, 0.049), (89, 0.031), (93, 0.037), (95, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.77630663 299 nips-2013-Solving inverse problem of Markov chain with partial observations
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on a single link of a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
2 0.71540189 247 nips-2013-Phase Retrieval using Alternating Minimization
Author: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information, and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |A⊤x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. Empirically, our algorithm performs similar to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting. 1
3 0.71031481 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
Author: Yann Dauphin, Yoshua Bengio
Abstract: Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised. Unfortunately, this creates a computational bottleneck for unsupervised feature learning algorithms such as those based on auto-encoders and RBMs, because they involve a reconstruction step where the whole input vector is predicted from the current feature values. An algorithm was recently developed to successfully handle the case of auto-encoders, based on an importance sampling scheme stochastically selecting which input elements to actually reconstruct during training for each particular example. To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. We show that stochastic ratio matching is a good estimator, allowing the approach to beat the state-of-the-art on two bag-of-word text classification benchmarks (20 Newsgroups and RCV1), while keeping computational cost linear in the number of non-zeros. 1
4 0.64591104 15 nips-2013-A memory frontier for complex synapses
Author: Subhaneil Lahiri, Surya Ganguli
Abstract: An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses. To understand the functional contribution of such molecular complexity to learning and memory, it is essential to expand our theoretical conception of a synapse from a single scalar to an entire dynamical system with many internal molecular functional states. Moreover, theoretical considerations alone demand such an expansion; network models with scalar synapses assuming finite numbers of distinguishable synaptic strengths have strikingly limited memory capacity. This raises the fundamental question, how does synaptic complexity give rise to memory? To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks. Moreover, in proving such theorems, we uncover a framework, based on first passage time theory, to impose an order on the internal states of complex synaptic models, thereby simplifying the relationship between synaptic structure and function. 1
5 0.64263535 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
6 0.64261508 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
7 0.64239919 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
8 0.64071381 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.64063358 173 nips-2013-Least Informative Dimensions
10 0.63995111 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
11 0.63831758 318 nips-2013-Structured Learning via Logistic Regression
12 0.63763005 348 nips-2013-Variational Policy Search via Trajectory Optimization
13 0.63729668 294 nips-2013-Similarity Component Analysis
15 0.63704264 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
16 0.63695788 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
17 0.63694149 75 nips-2013-Convex Two-Layer Modeling
18 0.63679409 158 nips-2013-Learning Multiple Models via Regularized Weighting
19 0.63666242 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
20 0.63623077 280 nips-2013-Robust Data-Driven Dynamic Programming