
136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models


Source: pdf

Author: Kei Wakabayashi, Takao Miura

Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require large computations of time complexity O(T N^{2D}) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N^{D+1}). A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present some experiments to demonstrate that our proposed method works more efficiently to estimate HHMM parameters than do some existing methods such as the flattening method and Gibbs sampling method. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. [sent-6, score-0.097]

2 However, existing HHMM parameter estimation methods require large computations of time complexity O(T N^{2D}) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. [sent-7, score-0.181]

3 In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N^{D+1}). [sent-8, score-0.073]

4 A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. [sent-9, score-0.355]

5 The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. [sent-10, score-0.33]

6 HMMs assume a single Markov chain of hidden states as the latent structure of sequence data. [sent-14, score-0.237]

7 Hierarchical Hidden Markov Models (HHMMs) are stochastic models which assume hierarchical Markov chains of hidden states as the latent structure of sequence data [3]. [sent-16, score-0.272]

8 HHMMs have a hierarchical state transition mechanism that yields the capability of capturing global and local sequence patterns in various granularities. [sent-17, score-0.275]

9 For conventional HMMs, we can conduct unsupervised learning efficiently using the forward-backward algorithm, which is a kind of dynamic programming [9]. [sent-19, score-0.066]
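The flat-HMM forward-backward recursions mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function and variable names are my own.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward-backward smoothing for a flat HMM.

    pi:  (N,) initial state probabilities
    A:   (N, N) transition matrix, A[i, j] = p(state j | state i)
    B:   (N, M) emission matrix, B[i, o] = p(obs o | state i)
    obs: length-T list of observation indices
    Returns per-time posteriors p(state_t | obs), shape (T, N).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                      # forward pass, O(T N^2) overall
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                       # joint, then normalize per time
    return gamma / gamma.sum(axis=1, keepdims=True)
```

With D levels of N states each, flattening an HHMM onto this recursion is what produces the O(T N^{2D}) cost the paper improves on.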

10 We introduce a key notion, activation probability, to formalize the hierarchical transition mechanism naturally. [sent-23, score-0.427]

11 Using this notion, we propose a new exact inference algorithm which has less time complexity than existing methods have. [sent-24, score-0.073]

12 (bottom-right) State identification by the absolute path of the tree. [sent-33, score-0.105]

13 Let {O_1, . . . , O_T} be a sequence of observations in which subscript t denotes the time in the sequence. [sent-40, score-0.052]

14 HHMMs define Q_t^d for 1 ≤ t ≤ T, 1 ≤ d ≤ D as a hidden state at time t and level d, where d = 1 represents the top level and d = D represents the bottom level. [sent-42, score-0.318]

15 If F_t^d = 1, the Markov chain of level d terminates at time t. [sent-44, score-0.194]

16 In HHMMs, a state transition at level d is permitted only when the Markov chain of level d + 1 terminates, i.e., F_t^{d+1} = 1. [sent-45, score-0.39]

17 A terminated Markov chain is initialized again at the next time step. [sent-48, score-0.069]

18 Figure 1 (left) presents a Dynamic Bayesian Network (DBN) expression for an HHMM of hierarchical depth D = 3. [sent-49, score-0.137]

19 Probabilities of the initialization and the state transition of Markov chains at level d depend on all higher states Q_t^{1:d−1}. [sent-55, score-0.329]

20 A_k^d(i, j) is a model parameter of the transition probability at level d from state i to j when Q_t^{1:d−1} = k. [sent-56, score-0.27]

21 A_k^d(i, end) denotes the termination probability that state i terminates the Markov chain at level d when Q_t^{1:d−1} = k. [sent-57, score-0.323]

22 π_k^d(j) is the initial state probability of state j at level d when Q_t^{1:d−1} = k. [sent-58, score-0.264]

23 A state space of an HHMM is expressed as a tree structure [3]. [sent-60, score-0.116]

24 Figure 1 (top-right) presents a tree expression of the state space of an HHMM with depth D = 3 and N = 3 states in each level. [sent-61, score-0.316]

25 The level of the tree corresponds to the level of HHMM states. [sent-62, score-0.168]

26 Each node at level d corresponds to a combination of states Q^{1:d}. [sent-63, score-0.129]

27 Each node has N children because there are N possible states for each level. [sent-64, score-0.06]

28 The rectangles in the figure denote local HMMs, in which nodes can transition directly among one another using the transition probability A. [sent-65, score-0.147]

29 However, arbitrary state space trees do not change the substance of what follows. [sent-67, score-0.086]

30 The behavior of the Markov chain at level d depends on the combination of all higher states Q^{1:d−1}, not only on the individual Q^d. [sent-68, score-0.182]

31 In the tree structure, the absolute path which corresponds to Q^{1:d} is meaningful, rather than the relative path which corresponds to Q^d. [sent-69, score-0.193]

32 We refer to Q^{1:d} as Z^d and call it the absolute path state. [sent-70, score-0.105]

33 Figure 1 (bottom-right) presents an absolute path state identification. [sent-71, score-0.231]

34 The set of values taken by an absolute path state at level d, denoted by Ω^d, contains N^d elements in the balanced N-ary tree state space. [sent-72, score-0.376]

35 We define a function to obtain the parent absolute path state of Z^d as parent(Z^d). [sent-73, score-0.267]

36 Similarly, we define a function to obtain the set of child absolute path states of Z^d as child(Z^d), and a function to obtain the set of siblings of Z^d as sib(Z^d) = child(parent(Z^d)). [sent-74, score-0.241]
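The parent/child/sib helpers can be illustrated by encoding an absolute path state Z^d as a tuple of per-level indices (q_1, . . . , q_d), each in 0..N−1. This encoding is a hypothetical sketch, not code from the paper.

```python
def parent(z):
    """Parent absolute-path state of z: drop the last level index."""
    return z[:-1]

def child(z, N):
    """All N child absolute-path states one level below z."""
    return [z + (q,) for q in range(N)]

def sib(z, N):
    """Siblings of z: the children of its parent (z itself included)."""
    return child(parent(z), N)
```

Under this encoding, enumerating all length-d tuples recovers |Ω^d| = N^d, the balanced N-ary tree state space described above.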

37 We use the notation of the absolute path state Z^d rather than Q^d throughout the paper. [sent-80, score-0.191]

38 Whereas the conventional notation π_k^d(j) denotes the initial state probability of Q^d = j when Q^{1:d−1} = k, we aggregate Q^d and Q^{1:d−1} into Q^{1:d} = Z^d and define π_{di} as the initial state probability of Z^d = i. [sent-82, score-0.218]

39 Similarly, we define A_{dij} as the state transition probability from Z^d = i to Z^d = j. [sent-83, score-0.201]

40 This method has O(T^3) time complexity, which is not practical for long sequences. [sent-87, score-0.052]

41 The hierarchical state sequence can be reduced to a single sequence of the bottom-level absolute path states {Z_1^D, . . . , Z_T^D}. [sent-89, score-0.448]

42 If we regard Z^D as a flat HMM state, then we can conduct the inference using the forward-backward algorithm with O(T N^{2D}) time complexity, since |Ω^D| = N^D. [sent-93, score-0.103]

43 Notice that the flat state Z^D can transition to any other flat state, so we cannot apply efficient algorithms for HMMs with sparse transition matrices. [sent-94, score-0.21]
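The flattening step itself is just a bijection between bottom-level index tuples (q_1, . . . , q_D) and integers in [0, N^D), after which Z^D behaves as one flat HMM state. A sketch, with illustrative names:

```python
def flat_index(z, N):
    """Encode a bottom-level absolute-path state (q_1, ..., q_D),
    each q in 0..N-1, as a single flat-HMM state in [0, N**D)."""
    idx = 0
    for q in z:
        idx = idx * N + q   # base-N positional encoding
    return idx

def unflatten(idx, N, D):
    """Inverse mapping: recover (q_1, ..., q_D) from the flat index."""
    z = []
    for _ in range(D):
        z.append(idx % N)
        idx //= N
    return tuple(reversed(z))
```

The dense N^D × N^D flat transition matrix this induces is exactly why the flattened forward-backward pass costs O(T N^{2D}).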

44 The MinSR constraint enables us to identify the path connecting two flat states uniquely. [sent-98, score-0.141]

45 We also discuss a sampling approach as an alternative parameter estimation technique. [sent-100, score-0.052]

46 Gibbs sampling is often used for parameter estimation of probabilistic models with latent variables [4]. [sent-101, score-0.072]

47 We can estimate HMM parameters using a Gibbs sampler, which samples each hidden state iteratively. [sent-102, score-0.133]

48 Chib [2] introduced an alternative method, called the Forward-Backward Gibbs Sampler (FBS), which calculates forward probabilities in advance. [sent-106, score-0.113]

49 FBS samples hidden states backward from the end of the sequence, according to the forward probabilities. [sent-107, score-0.261]
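FBS-style backward sampling from precomputed forward probabilities can be sketched as follows. The alpha array is assumed to hold flat-HMM forward probabilities alpha[t, i] ∝ p(z_t = i, o_{1:t}); names and structure are illustrative, not the paper's code.

```python
import numpy as np

def backward_sample(alpha, A, rng):
    """Draw one hidden-state path z_1..z_T from p(z | obs), given
    forward probabilities alpha (T, N) and transition matrix A (N, N).
    Sampling proceeds from t = T down to t = 1."""
    T, N = alpha.shape
    z = np.empty(T, dtype=int)
    w = alpha[-1] / alpha[-1].sum()            # p(z_T | o_{1:T})
    z[-1] = rng.choice(N, p=w)
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, z[t + 1]]          # ∝ p(z_t | z_{t+1}, o_{1:t})
        w /= w.sum()
        z[t] = rng.choice(N, p=w)
    return z
```

Because each draw conditions on the full observation sequence through alpha, the sampled paths mix faster than one-state-at-a-time Gibbs updates, at the price of the forward pass per iteration.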

50 The FBS method requires more computation per iteration than DGS does, but it brings the posterior of hidden states to its stationary distribution in fewer iterations [10]. [sent-108, score-0.133]

51 Heller [5] proposed Infinite Hierarchical Hidden Markov Models (IHHMMs) which can have an infinitely large depth by weakening the dependency between the states at different levels. [sent-109, score-0.091]

52 They proposed an inference method for IHHMMs based on a blocked Gibbs sampler whose sampling unit is a state sequence from t = 1 to T at a single level. [sent-110, score-0.212]

53 This inference takes only O(T D) time for a single iteration. [sent-111, score-0.054]

54 In HHMMs, the states in each level are strongly dependent, so resampling a state at an intermediate level forces all lower states into a completely different behavior. [sent-112, score-0.43]

55 The basic idea of our algorithm is a decomposition of the flat transition probability distribution p(Z_{t+1}^D | Z_t^D), which the flattening method calculates directly for all pairs of flat states. [sent-115, score-0.134]

56 We can rewrite the flat transition probability distribution as a sum over two cases, according to whether the Markov chain at level D terminates, as follows. [sent-116, score-0.288]

57 p(Z_{t+1}^D | Z_t^D) = p(Z_{t+1}^D | Z_t^D, F_t^D = 0) p(F_t^D = 0 | Z_t^D) + p(Z_{t+1}^D | Z_{t+1}^{D−1}, F_t^D = 1) p(Z_{t+1}^{D−1} | Z_t^{D−1}, F_t^D = 1) p(F_t^D = 1 | Z_t^D). The first term corresponds to the direct transition without the Markov chain termination. [sent-117, score-0.17]

58 The actual computational complexity for calculating this term is O(N^{D+1}) because the direct transition is permitted only between the sibling states. [sent-118, score-0.176]

59 The second term, corresponding to the case in which the Markov chain terminates at level D, contains two factors: the upper-level transition probability p(Z_{t+1}^{D−1} | Z_t^{D−1}, F_t^D = 1) and the state initialization probability for the terminated Markov chain p(Z_{t+1}^D | Z_{t+1}^{D−1}, F_t^D = 1). [sent-121, score-0.555]
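As a sanity check on this decomposition, the toy sketch below (depth D = 2, a top-level chain that never terminates within the horizon, and all names hypothetical) assembles the flat transition matrix from the direct-transition and terminate-and-restart terms and verifies that its rows are proper distributions:

```python
import numpy as np

N = 2  # states per level; depth D = 2 in this toy example
rng = np.random.default_rng(1)

# Level-2 parameters per parent state k: transitions A2[k, i, j] and
# termination end2[k, i], with sum_j A2[k, i, j] + end2[k, i] = 1.
raw = rng.random((N, N, N + 1))
raw /= raw.sum(axis=2, keepdims=True)
A2, end2 = raw[:, :, :N], raw[:, :, N]

# Level-1 transition A1[k, k'] (top chain assumed never to terminate here)
A1 = rng.random((N, N)); A1 /= A1.sum(axis=1, keepdims=True)
# Level-2 initialization pi2[k', j'] for a freshly restarted child chain
pi2 = rng.random((N, N)); pi2 /= pi2.sum(axis=1, keepdims=True)

# Flat transition over Z^2 = (k, i): direct sibling move (first term)
# plus terminate-then-upper-transition-then-reinitialize (second term).
P = np.zeros((N * N, N * N))
for k in range(N):
    for i in range(N):
        for kp in range(N):
            for jp in range(N):
                p = end2[k, i] * A1[k, kp] * pi2[kp, jp]
                if kp == k:                 # direct transitions stay siblings
                    p += A2[k, i, jp]
                P[k * N + i, kp * N + jp] = p

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a distribution
```

The sibling restriction on the first term is what caps the per-step cost at O(N^{D+1}) instead of O(N^{2D}).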

60 The transition probability at level d has the form p(Z_{t+1}^d | Z_t^d, F_t^{d+1} = 1). [sent-123, score-0.184]

61 The state initialization probability for level d + 1 has the form p(Z_t^{d+1} | Z_t^d, F_{t−1}^{d+1} = 1). [sent-125, score-0.198]

62 Using these notations, we can represent the flat transition with propagations of activation probabilities, as shown in figure 2 (left), because p(Z_{t+1}^D | Z_t^D) = p(b_{t+1}^D | e_t^D). [sent-127, score-0.389]

63 This representation naturally describes the decomposition of the flat transition probability distribution discussed above, and it enables us to apply the decomposition recursively for all levels. [sent-128, score-0.138]

64 1 Inference using Forward and Backward Activation Probabilities We can translate the DBN of HHMMs in figure 1 (left) into an equivalent, simpler DBN using activation probabilities. [sent-131, score-0.269]

65 The inference algorithm proposed herein is based on a forward-backward calculation over this DBN. [sent-133, score-0.084]

66 We define forward activation probability α and backward activation probability β as follows. [sent-134, score-0.71]

67 α_{e_t^d}(i) = p(e_t^d = i, O_{1:t}); α_{b_t^d}(i) = p(b_t^d = i, O_{1:t−1}); β_{e_t^d}(i) = p(O_{t+1:T}, F_T^1 = 1 | e_t^d = i); β_{b_t^d}(i) = p(O_{t:T}, F_T^1 = 1 | b_t^d = i). Figure 2: (left) Propagation of activation probabilities for calculating the flat transition probability from time t to t + 1. [sent-135, score-0.462]

68 (right) Equivalent DBN of the HHMM using activation probabilities. [sent-136, score-0.269]

69 Algorithm 1 presents the pseudocode to calculate all of α. [sent-138, score-0.066]

70 α_{b_t^d} are derived downward from α_{b_t^1} to α_{b_t^D} by summing up the initialization probability from the parent and the transition probabilities from the siblings (Lines 8 to 11). [sent-139, score-0.275]

71 α_{e_t^d} are propagated upward from α_{e_t^D} to α_{e_t^1} by summing up the probabilities of the child Markov chain termination (Lines 13 to 16). [sent-140, score-0.172]

72 Algorithm 2 propagates the backward activation probabilities similarly in backward order. [sent-143, score-0.447]

73 We can derive the conditional independence of O_{1:t} and {O_{t+1:T}, F_T^1 = 1} given e_t^d = null or b_{t+1}^d = null, because both e_t^d = null and b_{t+1}^d = null indicate that the Markov chains at levels d + 1, . [sent-144, score-1.181]

74 On the basis of this conditional independence, the exact inference of a posterior of activation probabilities can be obtained using α and β as presented below. [sent-148, score-0.34]

75 2 Updating Parameters Using the forward and backward activation probabilities, we can estimate HHMM parameters efficiently in the EM framework. [sent-152, score-0.395]

76 They are calculable using forward and backward activation probabilities. [sent-166, score-0.423]

77 The time complexity for computing a single EM iteration is O(T N^{D+1}), which is identical to that of calculating the forward and backward activation probabilities. [sent-194, score-0.469]

78 5 Experiments Firstly, we experimentally confirm that the forward-backward activation algorithm yields exactly the same parameter estimates as the flattening method. [sent-195, score-0.303]

79 We compare three parameter estimation algorithms: our forward-backward activation algorithm for a MinSR HHMM (FBA with MinSR), for an HHMM without MinSR (FBA w/o MinSR), and the flattening method (FFB). [sent-197, score-0.288]

80 Furthermore, the FBA enables us to conduct the parameter estimation of HHMMs which have non-zero self-transition parameters. [sent-203, score-0.072]

81 Two are based on the EM algorithm with inference by the forward-backward activation algorithm (FBA), and by the flattening forward-backward method (FFB). [sent-205, score-0.302]

82 The other two are based on a sampling approach: direct Gibbs sampling for the flat HMMs (DGS) and forward-backward activation sampling (FBAS). [sent-206, score-0.393]

83 FBAS is a straightforward application of the forward-backward sampling scheme to the translated DBN presented in figure 2. [sent-207, score-0.049]

84 Then we sample state activation variables from e_T^1 to b_1^1 in backward order with respect to the forward activation probabilities. [sent-209, score-0.75]

85 We evaluate the four methods on three aspects: execution time, convergence speed, and scalability with respect to state space size. [sent-210, score-0.185]

86 The horizontal axis shows the logarithmically scaled execution time. [sent-216, score-0.099]

87 Table 2 presents the average execution time for a single iteration. [sent-217, score-0.16]

88 From these results, we can say primarily that FBA outperforms FFB in terms of execution time. [sent-218, score-0.099]

89 The improvement is remarkable, especially for HHMMs with large state spaces, because FBA has lower time complexity in N and D than FFB. [sent-219, score-0.126]

90 Log-likelihood (vertical) is shown against the log-scaled execution time (horizontal) to display the execution time each algorithm needs to converge. [sent-221, score-0.24]

91 Table 3: Average execution time for a single iteration (ms). [sent-226, score-0.12]

92 The execution time of DGS is less than that of the other methods for a single iteration, but this does not compensate for its slow convergence. [sent-244, score-0.12]

93 6 Conclusion In this work, we proposed a new inference algorithm for HHMMs based on the activation probability. [sent-247, score-0.302]

94 The forward-backward activation algorithm described herein enables us to conduct unsupervised parameter learning with a practical computational cost for HHMMs of larger state space size. [sent-249, score-0.46]

95 Calculating posterior distributions and modal estimates in markov mixture models. [sent-256, score-0.075]

96 Learning and detecting activities from movement trajectories using the hierarchical hidden markov models. [sent-295, score-0.214]

97 A tutorial on hidden markov models and selected applications in speech recognition. [sent-300, score-0.148]

98 Bayesian methods for hidden markov models: Recursive computing in the 21st century. [sent-305, score-0.148]

99 Learning musical pitch structures with hierarchical hidden markov models. [sent-318, score-0.26]

100 Learning hierarchical hidden markov models for video structure discovery. [sent-326, score-0.214]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hhmms', 0.365), ('hhmm', 0.333), ('bd', 0.319), ('activation', 0.269), ('ftd', 0.238), ('sib', 0.238), ('zt', 0.222), ('ft', 0.176), ('fba', 0.174), ('minsr', 0.174), ('adij', 0.143), ('attening', 0.14), ('null', 0.134), ('fb', 0.132), ('qd', 0.129), ('dgs', 0.127), ('ot', 0.113), ('fbas', 0.111), ('ffb', 0.111), ('execution', 0.099), ('transition', 0.092), ('state', 0.086), ('gadij', 0.079), ('gbiv', 0.079), ('di', 0.077), ('parent', 0.076), ('markov', 0.075), ('hidden', 0.073), ('hmms', 0.072), ('backward', 0.07), ('level', 0.069), ('hierarchical', 0.066), ('adiend', 0.063), ('biv', 0.063), ('flattening', 0.063), ('dbn', 0.062), ('states', 0.06), ('gibbs', 0.058), ('path', 0.058), ('forward', 0.056), ('chain', 0.053), ('terminates', 0.051), ('child', 0.05), ('adz', 0.048), ('cend', 0.048), ('absolute', 0.047), ('fbs', 0.042), ('termination', 0.041), ('end', 0.041), ('presents', 0.04), ('probabilities', 0.038), ('bt', 0.035), ('ms', 0.034), ('inference', 0.033), ('sampling', 0.033), ('herein', 0.032), ('adji', 0.032), ('biot', 0.032), ('ihhmms', 0.032), ('transit', 0.032), ('depth', 0.031), ('sequence', 0.031), ('conduct', 0.03), ('ad', 0.03), ('tree', 0.03), ('sampler', 0.029), ('calculable', 0.028), ('ities', 0.028), ('calculate', 0.026), ('siblings', 0.026), ('direct', 0.025), ('ed', 0.025), ('pitch', 0.024), ('probability', 0.023), ('reuters', 0.023), ('enables', 0.023), ('chains', 0.022), ('musical', 0.022), ('time', 0.021), ('permitted', 0.021), ('initialization', 0.02), ('heller', 0.02), ('latent', 0.02), ('gure', 0.02), ('unsupervised', 0.02), ('estimation', 0.019), ('complexity', 0.019), ('calculation', 0.019), ('em', 0.019), ('calculates', 0.019), ('calculating', 0.019), ('hmm', 0.017), ('japan', 0.017), ('terminate', 0.016), ('translated', 0.016), ('terminated', 0.016), ('dynamic', 0.016), ('notations', 0.016), ('identical', 0.015), ('hierarchy', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

Author: Kei Wakabayashi, Takao Miura

Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require large computations of time complexity O(T N 2D ) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N D+1 ). A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present some experiments to demonstrate that our proposed method works more efficiently to estimate HHMM parameters than do some existing methods such as the flattening method and Gibbs sampling method. 1

2 0.17799577 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

3 0.086080857 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

Author: Jake Bouvrie, Jean-jeacques Slotine

Abstract: To learn reliable rules that can generalize to novel situations, the brain must be capable of imposing some form of regularization. Here we suggest, through theoretical and computational arguments, that the combination of noise with synchronization provides a plausible mechanism for regularization in the nervous system. The functional role of regularization is considered in a general context in which coupled computational systems receive inputs corrupted by correlated noise. Noise on the inputs is shown to impose regularization, and when synchronization upstream induces time-varying correlations across noise variables, the degree of regularization can be calibrated over time. The resulting qualitative behavior matches experimental data from visual cortex. 1

4 0.075515755 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

Author: Emanuele Coviello, Gert R. Lanckriet, Antoni B. Chan

Abstract: In this paper, we derive a novel algorithm to cluster hidden Markov models (HMMs) according to their probability distributions. We propose a variational hierarchical EM algorithm that i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a “cluster center”, i.e., a novel HMM that is representative for the group. We illustrate the benefits of the proposed algorithm on hierarchical clustering of motion capture sequences as well as on automatic music tagging. 1

5 0.074941196 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

6 0.071208403 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

7 0.067707025 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.058658168 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

9 0.058584914 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

10 0.051443428 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

11 0.049882758 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

12 0.04665916 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

13 0.045642216 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

14 0.044538297 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

15 0.043718029 300 nips-2012-Scalable nonconvex inexact proximal splitting

16 0.042349212 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

17 0.042325418 23 nips-2012-A lattice filter model of the visual pathway

18 0.042011302 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

19 0.041327663 32 nips-2012-Active Comparison of Prediction Models

20 0.040883433 195 nips-2012-Learning visual motion in recurrent neural networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.115), (1, -0.006), (2, -0.014), (3, 0.037), (4, -0.081), (5, -0.019), (6, -0.007), (7, -0.034), (8, 0.007), (9, -0.004), (10, 0.004), (11, -0.005), (12, -0.002), (13, -0.009), (14, -0.03), (15, -0.053), (16, -0.034), (17, 0.006), (18, 0.04), (19, -0.062), (20, -0.006), (21, 0.045), (22, -0.069), (23, -0.025), (24, -0.03), (25, -0.005), (26, -0.026), (27, -0.047), (28, -0.008), (29, 0.003), (30, 0.017), (31, 0.033), (32, 0.01), (33, 0.006), (34, 0.014), (35, -0.021), (36, -0.111), (37, -0.035), (38, -0.038), (39, 0.033), (40, 0.112), (41, 0.047), (42, -0.076), (43, -0.088), (44, 0.096), (45, 0.1), (46, 0.066), (47, 0.061), (48, -0.054), (49, 0.115)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90347135 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

Author: Kei Wakabayashi, Takao Miura

Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require large computations of time complexity O(T N 2D ) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N D+1 ). A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present some experiments to demonstrate that our proposed method works more efficiently to estimate HHMM parameters than do some existing methods such as the flattening method and Gibbs sampling method. 1

2 0.55111027 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

3 0.53528118 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

4 0.50534922 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

5 0.47509527 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

Author: Michael C. Hughes, Erik B. Sudderth, Emily B. Fox

Abstract: Applications of Bayesian nonparametric methods require learning and inference algorithms which efficiently explore models of unbounded complexity. We develop new Markov chain Monte Carlo methods for the beta process hidden Markov model (BP-HMM), enabling discovery of shared activity patterns in large video and motion capture databases. By introducing split-merge moves based on sequential allocation, we allow large global changes in the shared feature structure. We also develop data-driven reversible jump moves which more reliably discover rare or unique behaviors. Our proposals apply to any choice of conjugate likelihood for observed data, and we show success with multinomial, Gaussian, and autoregressive emission models. Together, these innovations allow tractable analysis of hundreds of time series, where previous inference required clever initialization and lengthy burn-in periods for just six sequences. 1

6 0.44823304 118 nips-2012-Entangled Monte Carlo

7 0.43797374 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

8 0.42856467 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

9 0.42340311 39 nips-2012-Analog readout for optical reservoir computers

10 0.42184579 205 nips-2012-MCMC for continuous-time discrete-state systems

11 0.41992217 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

12 0.39453608 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies

13 0.39264828 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

14 0.38917372 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

15 0.3882972 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

16 0.3872003 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

17 0.38341528 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

18 0.38323715 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

19 0.38169914 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

20 0.38132045 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process


Similar papers computed by LDA model

LDA topic distribution for this paper:

topicId topicWeight

[(0, 0.061), (21, 0.021), (37, 0.368), (38, 0.054), (54, 0.039), (55, 0.021), (74, 0.022), (76, 0.122), (80, 0.145), (92, 0.04)]

Similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69898766 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

Author: Kei Wakabayashi, Takao Miura

Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require a time complexity of at least O(TN^(2D)) for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method for HHMMs whose time complexity is O(TN^(D+1)). A key idea of our algorithm is the application of the forward-backward algorithm to state activation probabilities. The notion of state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present experiments demonstrating that our proposed method estimates HHMM parameters more efficiently than existing methods such as the flattening method and the Gibbs sampling method.
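For context, the flat-HMM forward-backward recursion that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard O(TN^2) baseline that flattening inflates to O(TN^(2D)) for a depth-D hierarchy; it is not the paper's activation-based algorithm, and all names are hypothetical:

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Scaled forward-backward for a flat HMM with N states.

    pi:  (N,) initial state distribution
    A:   (N, N) transition matrix, A[i, j] = P(next = j | current = i)
    B:   (N, M) emission matrix, B[i, o] = P(obs = o | state = i)
    obs: length-T list of observation indices

    Each of the T steps costs O(N^2) (one matrix-vector product),
    giving O(T N^2) overall. Returns the posterior state marginals
    gamma[t, i] = P(state_t = i | obs).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))  # scaled forward messages
    beta = np.zeros((T, N))   # scaled backward messages
    scale = np.zeros(T)

    # Forward pass
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, using the forward scaling factors
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # posterior marginals
    return gamma
```

The paper's contribution, as the abstract states, is to run an analogous recursion over state activation probabilities so that the hierarchical model costs O(TN^(D+1)) rather than O(TN^(2D)).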

2 0.50397706 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories.

3 0.49257162 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining.

4 0.4831759 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings, are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making.
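The simplest special case the abstract alludes to, the linear-Gaussian filter recovered by message passing in a state-space model, can be sketched as follows. This is a hypothetical minimal illustration of that special case, not the paper's EP algorithm:

```python
import numpy as np

def kalman_filter(F, Q, H, R, mu0, P0, ys):
    """Filtering for the linear-Gaussian state-space model
    x_{t+1} = F x_t + w,  w ~ N(0, Q)
    y_t     = H x_t + v,  v ~ N(0, R)
    starting from the prior x_0 ~ N(mu0, P0).

    Returns the filtered means E[x_t | y_1..y_t] for each step.
    """
    mu, P = mu0, P0
    means = []
    for y in ys:
        # Predict: push the belief through the dynamics
        mu = F @ mu
        P = F @ P @ F.T + Q
        # Update: condition on the new observation
        S = H @ P @ H.T + R               # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
        mu = mu + K @ (y - H @ mu)
        P = P - K @ H @ P
        means.append(mu.copy())
    return np.array(means)
```

With nonlinear GP dynamics these Gaussian messages are no longer exact, which is where the approximate EP message passing of the paper comes in.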

5 0.4744069 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset.

6 0.47257772 197 nips-2012-Learning with Recursive Perceptual Representations

7 0.47249651 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

8 0.47010884 279 nips-2012-Projection Retrieval for Classification

9 0.4667967 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

10 0.46673465 200 nips-2012-Local Supervised Learning through Space Partitioning

11 0.46518463 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

12 0.463503 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

13 0.46331754 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

14 0.46052861 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

15 0.45946205 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

16 0.4587605 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

17 0.45868644 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

18 0.45845711 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

19 0.45842686 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

20 0.45528203 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models