nips nips2009 nips2009-215 knowledge-graph by maker-knowledge-mining

215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Source: pdf

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Inﬁnitesimal Perturbation Analysis (IPA) on the ﬁltering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the ﬁltering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 Sensitivity analysis in HMMs with application to likelihood maximization Pierre-Arnaud Coquelin, Vekia, Lille, France Romain Deguest∗ Columbia University, New York City, NY 10027 pacoquelin@vekia. [sent-1, score-0.067]

2 fr Abstract This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. [sent-5, score-0.103]

3 We describe a methodology for using any algorithm that estimates the ﬁltering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. [sent-7, score-0.04]

4 The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. [sent-8, score-0.177]

5 We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. [sent-10, score-0.342]

6 1 Introduction We consider a parameterized hidden Markov model (HMM) deﬁned on continuous state and observation spaces. [sent-12, score-0.108]

7 The HMM is deﬁned by a state process (Xt )t≥0 ∈ X and an observation process (Yt )t≥1 ∈ Y that are parameterized by a continuous parameter θ = (θ1 , . [sent-13, score-0.117]

8 The state process is a Markov chain taking its values in a (measurable) state space X, with initial probability measure µ ∈ M(X) (i. [sent-17, score-0.129]

9 X0 ∼ µ) and Markov transition kernel K(θ, xt , dxt+1 ). [sent-19, score-0.448]

10 We assume that we can sample this Markov chain using a transition function F and independent random numbers, i. [sent-20, score-0.073]

11 For simplicity, we adopt the notations F (θ, x−1 , u) Fµ (θ, u), where Fµ is the ﬁrst transition function (i. [sent-27, score-0.03]

12 Since the transition and observation processes are parameterized by the parameter θ, the state Xt and the observation Yt processes depend explicitly on θ. [sent-32, score-0.121]

13 One of the main interest in HMMs is to recover the state at time n given a sequence of past observations (y1 , . [sent-37, score-0.062]

14 The ﬁltering distribution (or belief state) πn (dxn ) P(Xn ∈ dxn |Y1:n = y1:n ) is the distribution of Xn conditioned on the information y1:n . [sent-41, score-0.065]

15 Our contribution is an Inﬁnitesimal Perturbation Analysis (IPA) that estimates the gradient ∇πn (where ∇ refers to the derivative with respect to the ∫ parameter θ) of the ﬁltering distribution πn . [sent-43, score-0.152]

16 We also consider as application, the problem of parameter identiﬁcation in HMMs which consists in estimating the (unknown) parameter θ∗ of the model that has served to generate the sequence of observations. [sent-45, score-0.054]

17 In a Maximum Likelihood (ML) approach, one searches for the parameter θ that maximizes the likelihood (or its logarithm) given the sequence of observations. [sent-46, score-0.067]

18 The log-likelihood of parameter θ is deﬁned by ln (θ) log pθ (y1:n ) where pθ (y1:n ) dy1:n P(Y1:n (θ) ∈ dy1:n ). [sent-47, score-0.094]

19 ˆ The Maximum Likelihood (ML) estimator θn arg maxθ∈Θ ln (θ) is asymptotically consistent ˆ (in the sense that θn converges almost surely to the true parameter θ∗ when n → ∞ under some identiﬁably conditions and mild assumptions on the model, see Theorem 2 of [DM01]). [sent-48, score-0.29]

20 Thus, using the ML approach, the parameter identiﬁcation problem reduces to an optimization problem. [sent-49, score-0.027]

21 Our second contribution is a sensitivity analysis of the predictive distribution ∇πt+1|t , for t < n, which enables to estimate the gradient ∇ln (θ) of the log-likelihood function, which may be used in a (stochastic) gradient method for the purpose of optimizing the likelihood. [sent-50, score-0.208]

22 The approach is numerically illustrated on two parameter identiﬁcation problems (autoregressive model and a stochastic volatility model) and compared to other approaches (EM algorithm, the Kalman ﬁlter, and the Likelihood ratio approach) when these latter apply. [sent-51, score-0.147]

23 2 Links with other works First, let us mention that we are interested in the continuous state case since numerous applications in signal processing, ﬁnance, robotics, or telecommunications naturally ﬁt in this framework. [sent-52, score-0.096]

24 For illustration, a challenging example in ﬁnance is the problem of parameter estimation in the stochastic volatility model, which is a non-linear nonGaussian continuous space HMM parameterized by three continuous parameters (see e. [sent-57, score-0.244]

25 A usual approach for parameter estimation consists in performing a maximum likelihood estimation (MLE), i. [sent-60, score-0.133]

26 For ﬁnite state space problems, the Expectation Maximization (EM) algorithm is a popular method for solving the MLE problem. [sent-63, score-0.043]

27 However, in continuous space problems, see [CM05], the EM algorithm is difﬁcult to use mainly because the Expectation part relies on the estimation of the posterior path measure which is intractable in many situations. [sent-64, score-0.05]

28 An alternative method consists in using brute force optimization methods based on the evaluation of the likelihood such as grid-based or simulated annealing methods. [sent-66, score-0.04]

29 These approaches, which can be seen as black-box optimization are not very efﬁcient in high dimensional parameter spaces. [sent-67, score-0.027]

30 2 Another approach is to treat the parameter as part of the state variable and then compute the optimal ﬁlter (see [DFG01] and [Sto02]). [sent-68, score-0.07]

31 In this case, the Bayesian posterior distribution of the parameter is a marginal of the optimal ﬁlter. [sent-69, score-0.027]

32 A last solution consists in using an optimization procedure based on the evaluation of the gradient of the log-likelihood function with respect to the parameter. [sent-71, score-0.076]

33 The idea was to use a likelihood ratio approach (also called score method) to evaluate the gradient of the likelihood. [sent-75, score-0.166]

34 This approach suffers from high variance of the estimator, in particular for problems with small noise in the dynamic. [sent-76, score-0.043]

35 To tackle this issue, [PDS05] proposed to use a marginal particle ﬁlter instead of a simple path-based particle ﬁlter as Monte Carlo approximation method. [sent-77, score-0.244]

36 This approach is efﬁcient in terms of variance reduction but its computational complexity becomes quadratic in the number of particles instead of being linear, like in path-based particle methods. [sent-78, score-0.248]

37 The IPA approach proposed in this paper is an alternative gradient-based maximum likelihood approach. [sent-79, score-0.04]

38 Compared with works on gradient approaches previously cited, the IPA provides usually a lower variance estimators than the likelihood ratio methods, and its numerical complexity is linear in the number of particles. [sent-80, score-0.23]

39 Other works related to ours are the so-called tangent ﬁlter approach described in [CGN01] for dynamics coming from a discretization of a diffusion process, and the Finite-Difference (FD) approach described in a different setting (i. [sent-81, score-0.041]

40 policy gradient in Partially Observable Markov Decision Processes) in [CDM08]. [sent-83, score-0.076]

41 A similar FD estimator could be designed in our setting too but the resulting FD estimator would be biased (like usual FD schemes) whereas the IPA estimator is not. [sent-84, score-0.402]

42 3 Sequential Monte Carlo methods (SMC) Given a measurable test function f : X → R, we have: ∫ ∏n ∏n f (xn ) t=0 K(xt−1 , dxt )Gt (xt ) E[f (Xn ) t=0 Gt (Xt )] ∫ ∏n ∏n . [sent-85, score-0.073]

43 = πn (f ) E[f (Xn )|Y1:n = y1:n ] = E[ t=0 Gt (Xt )] t=0 K(xt−1 , dxt )Gt (xt ) (2) where we used the simpliﬁed notation: Gt (xt ) g(xt , yt ) and G0 (x0 ) 1. [sent-86, score-0.116]

44 But it should be mentioned that other methods (such as Extended Kalman ﬁlter, quantization methods, Markov Chain Monte Carlo methods) may be used as well to build the IPA estimator that we propose in the next section. [sent-89, score-0.128]

45 The basic SMC method, called Bootstrap Filter, see [DFG01] for details, approximates πn (f ) by an ∑N 1 N empirical distribution πn (f ) N i=1 f (xi ) made of N particles x1:N . [sent-90, score-0.083]

46 n n Algorithm 1 Generic Sequential Monte Carlo for t = 1 to n do iid i i i Sampling: Sample ui t−1 ∼ ν and set xt = F (xt−1 , ut−1 ), ∀i ∈ {1, . [sent-91, score-0.499]

47 Then deﬁne the i importance sampling weights wt = Gt (ei ) xt PN , xj j=1 Gt (et ) Resampling: Set xi = xki , ∀i ∈ {1, . [sent-95, score-0.535]

48 , N }, where k1:N are indices selected from the weights t t 1:N wt . [sent-98, score-0.09]

49 end for ∑N 1 N RETURN: πn (f ) = N i=1 f (xi ) n The sampling (or transition) step generates a successor particle population x1:N according to the t 1:N state dynamics from the previous population x1:N . [sent-99, score-0.189]

50 The importance sampling weights wt are evalt−1 uated, and the resampling (or selection) step resamples (with replacement) N particles x1:N from t 1:N the set x1:N according to the weights wt . [sent-100, score-0.252]

51 t t 1:N The simplest version introduced in [GSS93] chooses the selection indices kt by an independent 1:N sampling from the set {1, . [sent-107, score-0.042]

52 , N } according to a multinomial distribution with parameters wt , j i i. [sent-110, score-0.067]

53 The idea is to replicate the particles in proportion to their weights. [sent-113, score-0.083]

54 Many variants have been proposed in the literature, among which the stratiﬁed resampling method [Kit96] which is optimal in terms of variance minimization. [sent-114, score-0.078]

55 For our purpose we note that under mild conditions on f , πn (f ) is an asymptotically unbiased (see [DMDP07] for the asymptotic expression of the bias) and consistent estimator of πn (f ). [sent-118, score-0.249]

56 Although IPA is known for having a lower variance than SF in general, as far as we know, it has never been used in this context. [sent-123, score-0.043]

57 (5) E[ t=0 Gt (Xt )] We now state some sufﬁcient conditions under which the previous derivations are sound. [sent-128, score-0.043]

58 continuously differentiable at Xn (θ), and for all 1 ≤ t ≤ n, Gt is a. [sent-134, score-0.053]

59 continuously differentiable at (θ, Xt (θ)), • θ → f (Xn (θ)) and for all 1 ≤ t ≤ n, θ → Gt (θ, Xt (θ)) are a. [sent-136, score-0.053]

60 continuous and piecewise differentiable throughout Θ, • Let D be the random subset of Θ at which f (Xn (θ)) or one Gt (θ, Xt (θ)) fails to be differ∏n entiable. [sent-138, score-0.056]

61 differentiability of the path θ → (X0 , X1 , · · · , Xn )(θ) is equivalent to requiring that for all θ ∈ Θ, the transition function F is a. [sent-143, score-0.03]

62 Under the assumptions of Proposition 1, the estimator In deﬁned by (6) has a bias −1 N N O(N ) and is consistent with ∇πn (f ), i. [sent-151, score-0.146]

63 Applying those results to the test function H f (Xn )Zn + Rn (f (Xn ) − πn (f )), using the representation (5) of the gradient, we deduce that the SMC estimator (6) is asymptotically unbiased and consistent with ∇πn (f ). [sent-158, score-0.237]

64 Now the asymptotic variance is O(N −1 ) since the Central Limit Theorem (see e. [sent-159, score-0.065]

65 [Del04, DM08]) applies to the IPA estimator (6) of (5). [sent-161, score-0.128]

66 Notice that the computation of the gradient estimator requires O(nN md) (where m is the dimension of X) elementary operations, which is linear in the number of particles N and linear in the number of parameters d, and has memory requirement O(N md). [sent-163, score-0.287]

67 Using similar arguments as those detailed in proofs of Propositions 1 and 2, we have that this estimator is asymptotically unbiased and consistent with ∇ln (θ). [sent-167, score-0.205]

68 The resulting gradient algorithm is described in Algorithm 3. [sent-168, score-0.076]

69 Algorithm 3 Likelihood Maximization by gradient ascent using the IPA estimator of ∇ln (θ) for k = 1, 2, . [sent-174, score-0.221]

70 , Number of gradient steps do N Initialize J0 = 0 for t = 1 to n do For all i ∈ {1, . [sent-177, score-0.076]

71 i and compute the weights wt = i i Set (xi , zt , rt ) = where k1:N are indices selected t end for N Perform a gradient ascent step: θk = θk−1 + γk Jn (θk−1 ) end for 6 , t 1 0. [sent-181, score-0.435]

72 where Ut ∼ N (0, 1) and Vt ∼ N (0, 1) are independent sequences of random variables, and θ = (φ, σ, β) is a three-dimensional parameter in (R+ )3 . [sent-218, score-0.027]

73 Stochastic volatility model is very popular in the ﬁeld of quantitative ﬁnance [ME07] to evaluate derivative securities, such as options. [sent-219, score-0.114]

74 Xt Yt = φXt−1 + σUt , = β exp (Xt /2) Vt , (8) where again Ut ∼ N (0, 1) and Vt ∼ N (0, 1) and the parameter θ = (φ, σ, β) ∈ (R+ ) . [sent-227, score-0.027]

75 1 Parameter identiﬁcation Figure 1 shows the results of our IPA gradient estimator for the AR1 parameter identiﬁcation problem and compares those with two other methods: Kalman ﬁlter (K) and EM (which apply since the model is linear-Gaussian). [sent-229, score-0.231]

76 Notice the apparent bias of the three methods in the estimation of θ∗ (even for Kalman which provides here the exact ﬁltering distribution) since the number of observations n = 500 is ﬁnite. [sent-234, score-0.043]

77 For IPA, we used N = 102 particles and 150 gradient iterations. [sent-235, score-0.159]

78 The IPA method applies to general models, for example, to the stochastic volatility model. [sent-244, score-0.12]

79 0) using IPA with n = 103 observations and N = 102 particles (no comparison is made here since Kalman does not apply and EM becomes more complicated). [sent-248, score-0.102]

80 2 Variance study for Score and IPA algorithms IPA and Score methods provide gradient estimators for general models. [sent-250, score-0.121]

81 We compare the variance of the corresponding estimators of the gradient ∇ln for the AR1 since for this model we know its exact value (using Kalman). [sent-251, score-0.164]

82 8 1 2 3 Parameter number Figure 2: Box-and-whiskers plots of the three parameters (φ, σ, β) estimates for the IPA method applied to the stochastic volatility model with θ = (0. [sent-261, score-0.14]

83 The IPA estimator performs better than the Score estimator for small values of σ. [sent-268, score-0.256]

84 On the other hand, in case of huge variance in the state model, it is better to use the Score estimator. [sent-269, score-0.086]

85 4 Figure 3: Variance of the log-likelihood derivative ∂σ ln computed with both the IPA and Score methods. [sent-276, score-0.096]

86 Let us mention that the variance of the IPA (as well as Score) estimator increases when the number of observations n increases. [sent-283, score-0.217]

87 However, under weak conditions on the HMM [LM00], the ﬁltering distribution and its gradient forget exponentially fast the initial distribution. [sent-284, score-0.076]

88 This property has already been used for EM estimators in [CM05] to show that ﬁxed-lag smoothing drastically reduces the variance without signiﬁcantly raising the bias. [sent-285, score-0.088]

89 Similar smoothing (either ﬁxed-lag or discounted) would provide efﬁcient variance reduction techniques for the IPA estimator as well. [sent-286, score-0.171]

90 6 Conclusions We proposed a sensitivity analysis in HMMs based on an Inﬁnitesimal Perturbation Analysis and provided a computationally efﬁcient gradient estimator that provides an interesting alternative to the usual Score method. [sent-287, score-0.256]

91 We showed how this analysis may be used for estimating the gradient of the log-likelihood in a gradient-based likelihood maximization approach for the purpose of parameter identiﬁcation. [sent-288, score-0.192]

92 Finally let us mention that estimators of higher-order derivatives (e. [sent-289, score-0.072]

93 On the use of particle ﬁltering for maximum likelihood parameter estimation. [sent-316, score-0.189]

94 Asymptotics of the maximum likelihood estimator for general hidden markov models. [sent-331, score-0.232]

95 Limit theorems for weighted samples with applications to sequential monte carlo methods. [sent-336, score-0.11]

96 Parameter estimation in general state-space models using particle methods. [sent-347, score-0.146]

97 Particle-based methods for parameter estimation and tracking : numerical experiments. [sent-356, score-0.077]

98 Novel approach to nonlinear and nongaussian bayesian state estimation. [sent-368, score-0.075]

99 Monte-Carlo ﬁlter and smoother for non-Gaussian nonlinear state space models. [sent-372, score-0.043]

100 Exponential forgetting and geometric ergodicity in hidden markov models. [sent-387, score-0.064]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gt', 0.54), ('ipa', 0.485), ('xt', 0.418), ('xn', 0.148), ('estimator', 0.128), ('rt', 0.127), ('kalman', 0.126), ('zt', 0.125), ('particle', 0.122), ('smc', 0.121), ('ut', 0.086), ('ltering', 0.085), ('gk', 0.085), ('volatility', 0.085), ('particles', 0.083), ('xk', 0.08), ('gradient', 0.076), ('zn', 0.071), ('hmms', 0.069), ('ln', 0.067), ('wt', 0.067), ('yt', 0.067), ('dxn', 0.065), ('ui', 0.057), ('nitesimal', 0.055), ('lter', 0.054), ('perturbation', 0.05), ('score', 0.05), ('xi', 0.05), ('em', 0.049), ('dxt', 0.049), ('markov', 0.046), ('estimators', 0.045), ('doucet', 0.044), ('fd', 0.044), ('rn', 0.043), ('state', 0.043), ('variance', 0.043), ('chain', 0.043), ('carlo', 0.041), ('monte', 0.041), ('likelihood', 0.04), ('hmm', 0.036), ('resampling', 0.035), ('stochastic', 0.035), ('sensitivity', 0.034), ('nance', 0.033), ('sf', 0.033), ('coquelin', 0.032), ('deguest', 0.032), ('douc', 0.032), ('dyt', 0.032), ('legland', 0.032), ('nongaussian', 0.032), ('deduce', 0.032), ('identi', 0.031), ('asymptotically', 0.031), ('vt', 0.03), ('differentiable', 0.03), ('transition', 0.03), ('derivative', 0.029), ('lille', 0.028), ('sequential', 0.028), ('proposition', 0.028), ('unbiased', 0.028), ('maximization', 0.027), ('mention', 0.027), ('parameter', 0.027), ('continuous', 0.026), ('numerical', 0.026), ('jt', 0.026), ('france', 0.025), ('measurable', 0.024), ('iid', 0.024), ('estimation', 0.024), ('del', 0.024), ('dynamics', 0.024), ('indices', 0.023), ('jn', 0.023), ('autoregressive', 0.023), ('ml', 0.023), ('augmented', 0.023), ('continuously', 0.023), ('asymptotic', 0.022), ('purpose', 0.022), ('parameterized', 0.021), ('mle', 0.021), ('estimates', 0.02), ('observations', 0.019), ('surely', 0.019), ('kluwer', 0.019), ('kt', 0.019), ('hidden', 0.018), ('pn', 0.018), ('consistent', 0.018), ('usual', 0.018), ('md', 0.017), ('tangent', 0.017), ('fk', 0.017), ('ascent', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

2 0.22421494 178 nips-2009-On Stochastic and Worst-case Models for Investing

Author: Elad Hazan, Satyen Kale

Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best ﬁxed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1

3 0.21854854 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

Author: Chonghai Hu, Weike Pan, James T. Kwok

Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1

4 0.21492775 220 nips-2009-Slow Learners are Fast

Author: Martin Zinkevich, John Langford, Alex J. Smola

Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1

5 0.18805781 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We deﬁne conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efﬁciently perform model selection to decide whether observations are from one or multiple categories.

6 0.16637769 45 nips-2009-Beyond Convexity: Online Submodular Minimization

7 0.16406868 27 nips-2009-Adaptive Regularization of Weight Vectors

8 0.16078553 177 nips-2009-On Learning Rotations

9 0.11996707 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

10 0.11127857 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

11 0.11054564 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

12 0.10924574 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

13 0.099735945 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems

14 0.096666507 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

15 0.094221808 13 nips-2009-A Neural Implementation of the Kalman Filter

16 0.089546755 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

17 0.085998967 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

18 0.085394323 246 nips-2009-Time-Varying Dynamic Bayesian Networks

19 0.080660194 11 nips-2009-A General Projection Property for Distribution Families

20 0.080142044 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.19), (1, 0.184), (2, 0.155), (3, -0.111), (4, 0.334), (5, 0.184), (6, 0.089), (7, 0.092), (8, -0.025), (9, -0.057), (10, 0.114), (11, 0.113), (12, -0.051), (13, -0.06), (14, -0.001), (15, -0.082), (16, -0.191), (17, -0.029), (18, 0.035), (19, 0.015), (20, 0.081), (21, 0.003), (22, 0.064), (23, 0.109), (24, -0.035), (25, 0.026), (26, 0.069), (27, 0.018), (28, -0.056), (29, -0.031), (30, 0.083), (31, 0.016), (32, 0.025), (33, -0.038), (34, 0.03), (35, 0.028), (36, 0.012), (37, -0.003), (38, 0.039), (39, 0.029), (40, 0.029), (41, 0.074), (42, -0.033), (43, -0.017), (44, 0.003), (45, -0.003), (46, 0.026), (47, 0.015), (48, 0.025), (49, -0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97876543 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

2 0.82414895 178 nips-2009-On Stochastic and Worst-case Models for Investing

Author: Elad Hazan, Satyen Kale

3 0.79730046 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, conﬁdence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent conﬁdence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1

4 0.76655519 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

Author: Chonghai Hu, Weike Pan, James T. Kwok

5 0.71017522 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

6 0.69427639 177 nips-2009-On Learning Rotations

7 0.648996 220 nips-2009-Slow Learners are Fast

8 0.49193838 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

9 0.48455518 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

10 0.47526231 11 nips-2009-A General Projection Property for Distribution Families

11 0.4551686 45 nips-2009-Beyond Convexity: Online Submodular Minimization

12 0.41552117 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

13 0.41279054 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

14 0.41194108 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

15 0.36838314 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems

16 0.3535122 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

17 0.34842163 246 nips-2009-Time-Varying Dynamic Bayesian Networks

18 0.32965925 187 nips-2009-Particle-based Variational Inference for Continuous Systems

19 0.31515458 69 nips-2009-Discrete MDL Predicts in Total Variation

20 0.30506659 76 nips-2009-Efficient Learning using Forward-Backward Splitting

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.033), (24, 0.039), (25, 0.06), (35, 0.072), (36, 0.091), (39, 0.049), (48, 0.191), (58, 0.107), (61, 0.059), (62, 0.023), (66, 0.021), (71, 0.098), (86, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83350384 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efďŹ cient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 ďŹ ngerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+đ?œ–) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the ďŹ rst time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior. 1

same-paper 2 0.83298469 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

3 0.80488271 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

Author: Amarnag Subramanya, Jeff A. Bilmes

Abstract: We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to outperform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph. 1

4 0.72891098 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

Author: Liefeng Bo, Cristian Sminchisescu

Abstract: In visual recognition, the images are frequently modeled as unordered collections of local features (bags). We show that bag-of-words representations commonly used in conjunction with linear classiﬁers can be viewed as special match kernels, which count 1 if two local features fall into the same regions partitioned by visual words and 0 otherwise. Despite its simplicity, this quantization is too coarse, motivating research into the design of match kernels that more accurately measure the similarity between local features. However, it is impractical to use such kernels for large datasets due to their signiﬁcant computational cost. To address this problem, we propose efﬁcient match kernels (EMK) that map local features to a low dimensional feature space and average the resulting vectors to form a setlevel feature. The local feature maps are learned so their inner products preserve, to the best possible, the values of the speciﬁed kernel function. Classiﬁers based on EMK are linear both in the number of images and in the number of local features. We demonstrate that EMK are extremely efﬁcient and achieve the current state of the art in three difﬁcult computer vision datasets: Scene-15, Caltech-101 and Caltech-256. 1

5 0.70826292 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

Author: Peter Carbonetto, Matthew King, Firas Hamze

Abstract: We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean ﬁeld and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artiﬁcial sequence of distributions. Signiﬁcantly, our framework oﬀers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a diﬃcult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can oﬀer improvements in stability and accuracy over existing methods, and at a comparable cost. 1

6 0.69960845 113 nips-2009-Improving Existing Fault Recovery Policies

7 0.69428378 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

8 0.69247538 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

9 0.68881828 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

10 0.68096846 3 nips-2009-AUC optimization and the two-sample problem

11 0.679672 97 nips-2009-Free energy score space

12 0.6795854 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

13 0.67637438 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

14 0.67516178 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

15 0.67457181 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

16 0.67354119 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning

17 0.67352283 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

18 0.67303771 187 nips-2009-Particle-based Variational Inference for Continuous Systems

19 0.67269212 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

20 0.67241943 100 nips-2009-Gaussian process regression with Student-t likelihood