nips nips2013 nips2013-155 knowledge-graph by maker-knowledge-mining

155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition


Source: pdf

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. [sent-5, score-0.186]

2 The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. [sent-6, score-0.177]

3 In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. [sent-7, score-0.234]

4 Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. [sent-8, score-0.225]

5 Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. [sent-9, score-0.123]

6 Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. [sent-10, score-0.176]

7 Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. [sent-11, score-1.155]

8 1 Introduction Learning dynamic models from observed data has been a central issue in many scientific and engineering fields of study. [sent-14, score-0.186]

9 The usual setting is that data are collected sequentially from trajectories of some dynamical system operation, and the goal is to recover parameters of the underlying dynamic model. [sent-15, score-0.314]

10 As pointed out in [7, 8], this situation may appear in the modeling of celestial objects such as galaxies or chronic diseases such as Alzheimer’s, because observations are usually taken from different trajectories (galaxies or patients) at unknown, arbitrary times. [sent-17, score-0.442]

11 Or it may also appear in the study of biological processes, such as cell metabolism under external stimuli, where most measurement techniques are destructive, making it very difficult to repetitively collect observations from the same individual living organisms as they change over time. [sent-18, score-0.192]

12 However, it is much easier to take single snapshots of multiple organisms undergoing the same biological process in a fully asynchronous fashion, hence the lack of timing information. [sent-19, score-0.288]

13 As one can imagine, dynamic model learning in a non-sequential setting is much more difficult than in the sequential setting and has not been thoroughly studied. [sent-23, score-0.124]

14 One issue is that the notion of non-sequence data is vague because there can be many different generative processes resulting in non-sequence data. [sent-24, score-0.191]

15 Regardless of the assumptions we make, as long as the resulting optimization problem remains non-convex, formal analysis of learning guarantees is still formidable. [sent-30, score-0.147]

16 We thus propose to take a different approach, based on another long-standing estimation principle: the method of moments (MoM). [sent-31, score-0.206]

17 The basic idea of MoM is to find model parameters such that the resulting moments match or resemble the empirical moments. [sent-32, score-0.295]
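
As an aside, a one-line illustration of this principle (not from the paper; the Geometric distribution is chosen here only because it reappears in the generative process described later): equate the model's first moment with the empirical mean and solve for the parameter.

```python
# Hedged method-of-moments sketch: for t ~ Geometric(r) on {1, 2, ...}, E[t] = 1/r,
# so matching the first empirical moment gives r_hat = 1 / mean(t).
import numpy as np

rng = np.random.default_rng(0)
t = rng.geometric(p=0.3, size=10_000)  # synthetic samples with "true" r = 0.3
r_hat = 1.0 / t.mean()                 # moment matching
print(round(r_hat, 3))                 # approximately 0.3
```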

18 Although many learning algorithms for these models exist, some having been very successful in practice, barely any formal learning guarantee was given until the MoM-based methods were proposed. [sent-35, score-0.124]

19 Such breakthroughs seem surprising, but it turns out that they are mostly based on one crucial property: for quite a few latent variable models, the model parameters can be uniquely determined from spectral decompositions of certain low-order moments of observable quantities. [sent-36, score-0.375]

20 Interestingly, these assumptions bear much similarity to the usual idea behind topic modeling: with the bag-of-words representation, which is invariant to word orderings, the task of inferring topics is almost impossible given one single document (no matter how long it is!), [sent-39, score-0.214]

21 but becomes easier as more documents touching upon various topics become available. [sent-40, score-0.102]

22 For learning dynamic models, what we need in the non-sequence data are multiple sets of observations, where each set contains independent samples generated from its own initial distribution, and the many different initial distributions together cover the entire (hidden) state space. [sent-41, score-0.396]

23 In some of the aforementioned scientific applications, such as biological studies, this type of assumption might be realized by running multiple experiments with different initial configurations or amounts of stimuli. [sent-42, score-0.246]

24 A real p-th order tensor A is a member of the tensor product space ⊗_{i=1}^p R^{m_i} of p Euclidean spaces. [sent-47, score-1.022]

25 For a vector x ∈ R^m, we denote by x^{⊗p} := x ⊗ x ⊗ · · · ⊗ x ∈ ⊗_{i=1}^p R^m its p-th tensor power. [sent-48, score-0.511]

26 A convenient way to represent A ∈ ⊗_{i=1}^p R^m is through a p-way array of real numbers [A_{i1 i2 ··· ip}], 1 ≤ i1, i2, …, ip ≤ m. [sent-49, score-0.118]

27 With this representation, we can view A as a multi-linear map that, given a set of p matrices {X_i ∈ R^{m×m_i}}_{i=1}^p, produces another p-th order tensor A(X_1, X_2, · · · , X_p) ∈ ⊗_{i=1}^p R^{m_i} with the following p-way array representation: A(X_1, X_2, · · · , X_p)_{i1 i2 ··· ip} := Σ_{1 ≤ j1, j2, …, jp ≤ m} A_{j1 j2 ··· jp} (X_1)_{j1 i1} (X_2)_{j2 i2} · · · (X_p)_{jp ip}.  (1) [sent-56, score-0.634]

28 (Figure 1: Running example of a Markov chain with three states.) In this work we consider tensors that are up to the third order (p ≤ 3) and, for most of the time, also symmetric, meaning that their p-way array representations are invariant under permutations of the array indices. [sent-60, score-0.402]
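
The following minimal NumPy sketch (not from the paper) illustrates the two definitions above for the third-order case used throughout: the tensor power x^{⊗3} as a 3-way array, and the multilinear map A(X_1, X_2, X_3).

```python
import numpy as np

def tensor_power(x, p=3):
    """p-th tensor power of x as a p-way array: out[i1,...,ip] = x[i1]*...*x[ip]."""
    out = x
    for _ in range(p - 1):
        out = np.multiply.outer(out, x)
    return out

def multilinear_map(A, X1, X2, X3):
    """A(X1,X2,X3)_{i1 i2 i3} = sum_{j1 j2 j3} A_{j1 j2 j3} (X1)_{j1 i1} (X2)_{j2 i2} (X3)_{j3 i3}."""
    return np.einsum('abc,ai,bj,ck->ijk', A, X1, X2, X3)

x = np.array([1.0, 2.0, 3.0])
A = tensor_power(x, 3)                    # symmetric third-order tensor x ⊗ x ⊗ x
W = np.random.default_rng(0).standard_normal((3, 2))
print(multilinear_map(A, W, W, W).shape)  # (2, 2, 2)
```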

29 We solve this estimation problem with the tensor decomposition method recently proposed by Anandkumar et al. [sent-64, score-0.663]

30 The key component of this method is a novel tensor power iteration procedure for factorizing a symmetric orthogonal tensor, which is robust against input perturbation. [sent-67, score-0.511]
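
A hedged sketch of that procedure is below; it keeps only the core update v ← T(I, v, v)/‖T(I, v, v)‖ with random restarts and deflation, and omits the paper's perturbation analysis and exact restart/iteration counts.

```python
import numpy as np

def power_iteration(T, n_iter=100, n_restart=10, seed=None):
    """Approximate one (eigenvalue, eigenvector) pair of a symmetric k x k x k tensor T."""
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    best = None
    for _ in range(n_restart):
        v = rng.standard_normal(k)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = np.einsum('ijk,j,k->i', T, v, v)    # v <- T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # lambda = T(v, v, v)
        if best is None or lam > best[0]:
            best = (lam, v)
    return best

def deflate(T, lam, v):
    """Remove a recovered rank-one component: T - lam * v ⊗ v ⊗ v."""
    return T - lam * np.einsum('i,j,k->ijk', v, v, v)

# Demo on a synthetic orthogonally decomposable tensor (made-up weights).
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((4, 4)))
T = sum(w * np.einsum('i,j,k->ijk', q, q, q) for w, q in zip([3.0, 2.0, 1.5, 1.0], Q.T))
lam, v = power_iteration(T, seed=0)
print(round(lam, 2))   # typically close to 3.0, the largest weight
```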

31 3 Learning from Non-sequence Data We first describe a generative process of non-sequence data for first-order Markov models and demonstrate how to apply tensor decomposition methods to perform consistent learning. [sent-68, score-0.838]

32 Then we extend these ideas to hidden Markov models and provide theoretical guarantees on the sample complexity of the proposed learning algorithm. [sent-69, score-0.202]

33 3.1 First-order Markov Models Let P ∈ [0, 1]^{m×m} be the transition probability matrix of a discrete, first-order, ergodic Markov chain with m states and a unique stationary distribution π. [sent-73, score-0.422]

34 Consistent estimation in this case is also easy: the empirical distribution of Dj consistently estimates Pj , the j-th column of P . [sent-88, score-0.153]

35 For example, the Markov chain in Figure 1 may produce the following three samples, whose empirical distributions estimate the three columns of P respectively: D1 = {2, 1, 2, 2, 2, 2, 2, 2, 2, 2} ⇒ P1 = [0.1, 0.9, 0.0]⊤, D2 = {3, 3, 2, 3, 2, 3, 3, 2, 3, 3} ⇒ P2 = [0.0, 0.3, 0.7]⊤, D3 = {1, 1, 3, 1, 3, 3, 1, 3, 3, 1} ⇒ P3 = [0.5, 0.0, 0.5]⊤. [sent-89, score-0.159]
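
A small sketch of this estimator, using the three example sets above (state labels 1..3):

```python
import numpy as np

def empirical_column(samples, m=3):
    """Empirical distribution over states labelled 1..m."""
    counts = np.bincount(np.asarray(samples) - 1, minlength=m)
    return counts / counts.sum()

D1 = [2, 1, 2, 2, 2, 2, 2, 2, 2, 2]
D2 = [3, 3, 2, 3, 2, 3, 3, 2, 3, 3]
D3 = [1, 1, 3, 1, 3, 3, 1, 3, 3, 1]
P_hat = np.column_stack([empirical_column(D) for D in (D1, D2, D3)])
print(P_hat)   # columns estimate P1, P2, P3
```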

36 A nice property of these estimates is that, unlike in the sequential setting, they do not depend on any particular ordering of the observations in each set. [sent-99, score-0.163]

37 Nevertheless, such data are not quite non-sequenced because all observations are made at exactly the next time step. [sent-100, score-0.137]

38 …, independent samples of states drawn at unknown future times {t1, t2, …}. [sent-106, score-0.143]

39 Therefore, as detailed later, we make a specific distributional assumption on {ti } to enable unique recovery of the transition matrix P from T (Assumption A. [sent-126, score-0.212]

40 Next we consider a further generalization, where the unknowns are not only the time stamps of the observations, but also the initial state j. [sent-128, score-0.198]

41 In other words, we only know each set was generated from the same initial state, but do not know the actual initial state. [sent-129, score-0.224]

42 In this case, the empirical distributions of the sets consistently estimate the columns of T in some unknown permutation Π: DΠ(3) = {1, 1, 3, 1, 2, 3, 2, 3, 3, 2} ⇒ TΠ(3) = [0.3, 0.3, 0.4]⊤, DΠ(2) = {3, 3, 2, 3, 2, 1, 3, 2, 3, 1} ⇒ TΠ(2) = [0.2, 0.3, 0.5]⊤, DΠ(1) = {2, 1, 2, 3, 2, 3, 3, 2, 2, 3} ⇒ TΠ(1) = [0.1, 0.5, 0.4]⊤. [sent-130, score-0.245]

43 In order to be able to identify Π, we will again resort to randomness and assume the unknown initial states are random variables following a certain distribution (Assumption A. [sent-141, score-0.255]

44 Finally, we generalize from a single unknown initial state to an unknown initial state distribution, where each set of observations D := {x_1^{(t_1)}, x_2^{(t_2)}, … [sent-143, score-0.554]

45 | π^{(0)}} consists of independent samples of states drawn at random times from some unknown initial state distribution π^{(0)}. [sent-146, score-0.341]

46 With this final generalization, most would agree that the generated data are non-sequenced and that the generative process is flexible enough to model the real-world situations described in Section 1. [sent-150, score-0.133]

47 However, simple estimation with empirical distributions no longer works because each set may now contain observations from multiple initial states. [sent-151, score-0.219]

48 This is where we take advantage of the tensor decomposition framework outlined in Section 2, which requires proper assumptions on the initial state distribution π^{(0)} (Assumption A. [sent-152, score-0.926]

49 Now we are ready to give the definition of our entire generative process. [sent-154, score-0.133]

50 Assume we have N sets of non-sequence data, each containing n observations, and each set of observations {x_i}_{i=1}^n was independently generated by the following: • Draw an initial distribution π^{(0)} ∼ Dirichlet(α), with E[π^{(0)}] = α/(Σ_{i=1}^m α_i) = π and π_i ≠ π_j ∀ i ≠ j. [sent-155, score-0.174]

51 • For i = 1, …, n: – Draw a discrete time t_i ∼ Geometric(r), t_i ∈ {1, 2, 3, …}. [sent-159, score-0.28]

52 – Draw an initial state s_i ∼ Multinomial(π^{(0)}), s_i ∈ {0, 1}^m. [sent-163, score-0.288]

53 – Draw an observation x_i ∼ Multinomial(P^{t_i} s_i), x_i ∈ {0, 1}^m. [sent-164, score-0.185]
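
A hedged simulation sketch of this generative process follows; states are kept as integer labels rather than indicator vectors, and the chain P, the Dirichlet parameter α and the success probability r below are made-up examples (P is column-stochastic).

```python
import numpy as np

def sample_set(P, alpha, r, n, rng):
    """Draw one set of n non-sequence observations (integer state labels 0..m-1)."""
    m = P.shape[0]
    pi0 = rng.dirichlet(alpha)               # per-set initial distribution
    xs = np.empty(n, dtype=int)
    for i in range(n):
        t = rng.geometric(r)                 # unknown observation time t_i >= 1
        s = rng.choice(m, p=pi0)             # initial state s_i
        Pt = np.linalg.matrix_power(P, t)    # t-step transition matrix
        xs[i] = rng.choice(m, p=Pt[:, s])    # observed state x_i ~ Multinomial(P^t s_i)
    return xs

rng = np.random.default_rng(0)
P = np.array([[0.1, 0.0, 0.5],
              [0.9, 0.3, 0.0],
              [0.0, 0.7, 0.5]])              # columns sum to one (column-stochastic)
data = [sample_set(P, alpha=np.ones(3), r=0.5, n=100, rng=rng) for _ in range(200)]
```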

54 First, all the data points in the same set share the same initial state distribution but can have different initial states; the initial state distribution varies across different sets and yet centers at the stationary distribution of the Markov chain. [sent-169, score-0.554]

55 As mentioned in Section 1, this may be achieved in biological studies by running multiple experiments with different input stimuli, so the data collected in the same experiment can be assumed to have the same initial state distribution. [sent-170, score-0.305]

56 To use the tensor decomposition method in Appendix A, we need the tensor structure (2) in certain low-order moments of observed quantities. [sent-173, score-1.38]

57 Define the expected transition probability matrix T := E_t[P^t] = rP(I − (1 − r)P)^{−1} and let α_0 := Σ_i α_i, C_2 := E[x_1 x_2^⊤] and C_3 := E[x_1 ⊗ x_2 ⊗ x_3]. [sent-175, score-0.17]
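
A small sketch of these quantities, assuming integer state labels and one-hot encoding; note the paper's M2 and M3 further combine C2, C3 with Dirichlet-dependent correction terms (not shown here), so this covers only the raw definitions above.

```python
import numpy as np

def expected_transition(P, r):
    """T := E_t[P^t] = r P (I - (1 - r) P)^{-1} for t ~ Geometric(r)."""
    m = P.shape[0]
    return r * P @ np.linalg.inv(np.eye(m) - (1.0 - r) * P)

def cross_moments(sets, m):
    """Empirical C2 = E[x1 x2^T] and C3 = E[x1 ⊗ x2 ⊗ x3], averaging disjoint within-set triples."""
    C2, C3, n_triples = np.zeros((m, m)), np.zeros((m, m, m)), 0
    for xs in sets:
        e = np.eye(m)[np.asarray(xs)]              # one-hot row per observation
        x1, x2, x3 = e[0::3], e[1::3], e[2::3]     # disjoint observations within the set
        n3 = min(len(x1), len(x2), len(x3))
        C2 += x1[:n3].T @ x2[:n3]
        C3 += np.einsum('ni,nj,nk->ijk', x1[:n3], x2[:n3], x3[:n3])
        n_triples += n3
    return C2 / n_triples, C3 / n_triples

P = np.array([[0.1, 0.0, 0.5], [0.9, 0.3, 0.0], [0.0, 0.7, 0.5]])
print(expected_transition(P, r=0.5).sum(axis=0))   # columns of T still sum to 1
```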

58 1, which relies on the special structure in the moments of the Dirichlet distribution (Assumption A. [sent-178, score-0.206]

59 It is clear that M2 and M3 have the desired tensor structure. [sent-180, score-0.511]

60 Assuming α0 is known, we can form estimates M2 and M3 by computing empirical moments from the data. [sent-181, score-0.314]

61 Interestingly, these low-order moments have a very similar structure to those in Latent Dirichlet Allocation [1]. [sent-183, score-0.206]

62 The last property is what distinguishes our generative process from a general LDA model: because both the words and the topics correspond to the states, the topic matrix is no longer invariant to column permutations. [sent-185, score-0.293]

63 Since the tensor decomposition method may return T under any column permutation, we need to recover the correct matching between its rows and columns. [sent-186, score-0.714]

64 Note that the π returned by the tensor decomposition method undergoes the same permutation as T ’s columns. [sent-187, score-0.783]

65 2, we may recover the correct matching by sorting both the returned π and the mean π of all data. [sent-189, score-0.155]
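
A minimal sketch of that matching step, assuming all entries of π are distinct (the π values below are made up):

```python
import numpy as np

def match_permutation(pi_returned, pi_bar):
    """Indices that reorder the decomposition output to line up with pi_bar."""
    order_ret = np.argsort(pi_returned)
    order_bar = np.argsort(pi_bar)
    perm = np.empty_like(order_ret)
    perm[order_bar] = order_ret     # j-th smallest of pi_bar gets j-th smallest returned index
    return perm

pi_returned = np.array([0.2, 0.5, 0.3])    # order produced by the decomposition
pi_bar      = np.array([0.49, 0.31, 0.20]) # empirical mean of all data (noisy)
perm = match_permutation(pi_returned, pi_bar)
print(pi_returned[perm])                   # [0.5 0.3 0.2], aligned with pi_bar
```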

66 However, if the true transition matrix P has at least one zero entry, then unique recovery is possible: Theorem 2. [sent-192, score-0.17]

67 Let P ∗ , r∗ , T ∗ and π ∗ denote the true values of the transition probability matrix, the success probability, the expected transition matrix, and the stationary distribution, respectively. [sent-193, score-0.32]

68 We give a sample complexity bound on the estimation error in the next section for hidden Markov models. [sent-205, score-0.16]

69 3.2 Hidden Markov Models Let P and π now be defined over the hidden discrete state space of size k and have the same properties as in the first-order Markov model. [sent-207, score-0.246]

70 The generative process here is almost identical to (and therefore shares the same interpretation with) the one in Section 3.1, [sent-208, score-0.133]

71 except for an extra mapping from the discrete hidden state to a continuous observation space: • Draw a state indicator vector h_i ∼ Multinomial(P^{t_i} s_i), h_i ∈ {0, 1}^k. [sent-209, score-0.611]

72 • Draw an observation: x_i = U h_i + ε_i, where U ∈ R^{m×k} denotes a rank-k matrix of mean observation vectors for the k hidden states, and the random noise vectors ε_i's are i.i.d. [sent-210, score-0.261]

73 Note that a spherical covariance is required for the tensor decomposition method to be applicable. [sent-213, score-0.727]
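
A hedged sketch of this extra observation step; the dimensions m, k, the matrix U and the noise level σ are made-up examples, and the noise is i.i.d. Gaussian with the spherical covariance σ²I required above.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, sigma = 10, 3, 0.1
U = rng.standard_normal((m, k))      # one mean observation vector per hidden state

def emit(hidden_state):
    """x = U h + eps, with h the one-hot indicator and eps spherical Gaussian noise."""
    h = np.eye(k)[hidden_state]
    return U @ h + sigma * rng.standard_normal(m)   # covariance sigma^2 * I

x = emit(hidden_state=1)
print(x.shape)   # (10,)
```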

74 The low-order moments that lead to the desired tensor structure are given in the following: Theorem 3. [sent-214, score-0.717]

75 Define the expected hidden-state transition matrix T := E_t[P^t] = rP(I − (1 − r)P)^{−1} and let α_0 := Σ_i α_i, V_1 := E[x_1], V_2 := E[x_1 x_1^⊤], V_3 := E[x_1^{⊗3}], C_2 := E[x_1 x_2^⊤] and C_3 := E[x_1 ⊗ x_2 ⊗ x_3]. [sent-215, score-0.416]

76 Intuitively, the jump should be easier to locate as P gets sparser, but we do not have a formal result. [sent-217, score-0.176]

77 We may allow different covariances σ_j^2 I for different hidden states. [sent-218, score-0.16]

78 Algorithm 1 Tensor decomposition method for learning an HMM from non-sequence data. Input: N sets of non-sequence data points, the success probability r, the Dirichlet parameter α_0, the number of hidden states k, and numbers of iterations L and N. [sent-221, score-0.487]

79 Output: estimates π, P and U, possibly under a permutation of the state labels. [sent-222, score-0.144]

80 1 (Appendix A) on M_2 and M_3 with the number of hidden states k to obtain a symmetric tensor T ∈ R^{k×k×k} and a whitening transformation W ∈ R^{m×k}. [sent-225, score-0.815]

81 2 (Appendix A) k times, each with numbers of iterations L and N, the input tensor in the first run set to T and in each subsequent run set to the deflated tensor returned by the previous run, resulting in k eigenvalue/eigenvector pairs {(λ̂_i, v̂_i)}_{i=1}^k. [sent-227, score-1.172]

82 5: Repeat Steps 4 and 5 on M̂_2′ and M̂_3′. 6: Match {(λ̂_i, v̂_i)}_{i=1}^k with {(λ̂_i′, v̂_i′)}_{i=1}^k by sorting {λ̂_i}_{i=1}^k and {λ̂_i′}_{i=1}^k. [sent-229, score-0.142]
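
A hedged sketch of the whitening step referenced in these algorithm fragments (the actual routine lives in the paper's Appendix A and is not reproduced here): take the top-k eigenpairs of M2, build W with W⊤M2W = I_k, and push M3 through the multilinear map to obtain the symmetric k×k×k tensor handed to the power iteration.

```python
import numpy as np

def whiten(M2, M3, k):
    """Return (T_tilde, W) with W^T M2 W = I_k and T_tilde = M3(W, W, W) in R^{k x k x k}."""
    evals, evecs = np.linalg.eigh(M2)
    idx = np.argsort(evals)[::-1][:k]            # top-k eigenpairs (assumed positive)
    W = evecs[:, idx] / np.sqrt(evals[idx])      # whitening transformation, shape (m, k)
    T_tilde = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)
    return T_tilde, W
```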

83 This theorem suggests that, unlike first-order Markov models, HMMs require two applications of the tensor decomposition method, one on M_2 and M_3 for extracting the mean observation vectors U, and the other on M_2′ and M_3′ for extracting the matrix product UT. [sent-234, score-0.754]

84 Another issue is that the estimates for M2 and M3 require an estimate for the noise variance σ 2 , which is not directly observable. [sent-235, score-0.121]

85 Nevertheless, since M2 and M3 are in the form of low-order moments of spherical Gaussian mixtures, we may use the existing result (Theorem 3. [sent-236, score-0.27]

86 The situation regarding permutations of the estimates is also different here. [sent-238, score-0.147]

87 First note that P = (rU +(1−r)U T )† U T, which implies that permuting the columns of U and the columns of U T in the same manner has the effect of permuting both the rows and the columns of P , essentially re-labeling the hidden states. [sent-239, score-0.411]

88 By the assumption that πi ’s are all different, we can sort the two estimates π and π ′ to match the columns of U and U T , and obtain P if r is known. [sent-241, score-0.198]
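
A minimal sketch of this final reconstruction, assuming U and the product UT have already been extracted and their columns aligned as described, and that r is known:

```python
import numpy as np

def recover_P(U, UT, r):
    """Recover the hidden-state transition matrix from U, the product UT = U @ T, and r."""
    return np.linalg.pinv(r * U + (1.0 - r) * UT) @ UT

# Consistency check on synthetic quantities (made-up U, P, r).
rng = np.random.default_rng(0)
k, m, r = 3, 8, 0.4
U = rng.standard_normal((m, k))                     # mean observation vectors
P = rng.dirichlet(np.ones(k), size=k).T             # column-stochastic transition matrix
T = r * P @ np.linalg.inv(np.eye(k) - (1 - r) * P)  # expected transition matrix
print(np.allclose(recover_P(U, U @ T, r), P))       # True up to numerical error
```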

89 Combining the perturbation bounds of the tensor decomposition method (Appendix A), the whitening procedure (Appendix D. [sent-244, score-0.772]

90 1) and the matrix pseudoinverse [10], and concentration bounds on empirical moments (Appendix D. [sent-245, score-0.305]

91 1 (Appendix A), and the number of hidden states k, the success probability r, and the Dirichlet parameter α0 are all given. [sent-249, score-0.297]

92 As shown in our proof, as long as the operator norm of the tensor perturbation is sufficiently smaller than δmin , which measures the gaps between different πi ’s, we can correctly match the two sets of estimated tensor eigenvalues. [sent-256, score-1.126]

93 For the generative process, we set α0 = 1, r = 0. [sent-265, score-0.133]

94 Figure 2(a) plots the relative matrix estimation error (in spectral norm) against the sample size N for P , U , and U T obtained by Algorithm 1 given the true r. [sent-272, score-0.142]

95 5 Conclusions We have demonstrated that under reasonable assumptions, tensor decomposition methods can provably learn first-order Markov models and hidden Markov models from non-sequence data. [sent-281, score-0.948]

96 We believe this is the first formal guarantee on learning dynamic models in a non-sequential setting. [sent-282, score-0.21]

97 No matter what distribution generates the random time steps, tensor decomposition methods can always learn the expected transition probability matrix T . [sent-284, score-0.833]

98 The proposed algorithm can be modified to learn discrete HMMs under a similar generative process. [sent-286, score-0.133]

99 A method of moments for mixture models and hidden Markov models. [sent-315, score-0.408]

100 Learning mixtures of spherical gaussians: moment methods and spectral decompositions. [sent-334, score-0.152]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tensor', 0.511), ('moments', 0.206), ('hidden', 0.16), ('markov', 0.158), ('ru', 0.153), ('decomposition', 0.152), ('ti', 0.14), ('galaxies', 0.133), ('generative', 0.133), ('transition', 0.116), ('mom', 0.113), ('rabbat', 0.113), ('initial', 0.112), ('appendix', 0.112), ('anandkumar', 0.101), ('snapshots', 0.1), ('states', 0.095), ('hmm', 0.094), ('alzheimer', 0.092), ('spectral', 0.088), ('ip', 0.087), ('dynamic', 0.086), ('state', 0.086), ('dj', 0.085), ('ijk', 0.082), ('formal', 0.082), ('diag', 0.082), ('array', 0.08), ('dirichlet', 0.079), ('nonsequenced', 0.075), ('hsu', 0.07), ('rm', 0.07), ('biological', 0.069), ('chronic', 0.066), ('assumptions', 0.065), ('chain', 0.065), ('spherical', 0.064), ('estimates', 0.063), ('topic', 0.062), ('observations', 0.062), ('returned', 0.062), ('draw', 0.062), ('organisms', 0.061), ('perturbation', 0.06), ('scienti', 0.059), ('permutation', 0.058), ('easier', 0.058), ('issue', 0.058), ('xp', 0.057), ('prob', 0.055), ('matrix', 0.054), ('huang', 0.053), ('permuting', 0.052), ('diseases', 0.052), ('recover', 0.051), ('jp', 0.05), ('hmms', 0.05), ('vi', 0.05), ('columns', 0.049), ('whitening', 0.049), ('arxiv', 0.049), ('multinomial', 0.048), ('trajectories', 0.048), ('unknown', 0.048), ('dynamical', 0.048), ('hi', 0.047), ('stationary', 0.046), ('ergodic', 0.046), ('si', 0.045), ('consistently', 0.045), ('empirical', 0.045), ('ge', 0.045), ('pij', 0.045), ('match', 0.044), ('latent', 0.044), ('topics', 0.044), ('situation', 0.043), ('usual', 0.043), ('mi', 0.043), ('pittsburgh', 0.042), ('assumption', 0.042), ('models', 0.042), ('success', 0.042), ('sorting', 0.042), ('provably', 0.041), ('simulation', 0.041), ('tensors', 0.041), ('permutations', 0.041), ('sequential', 0.038), ('numbers', 0.038), ('modeling', 0.038), ('collected', 0.038), ('perturbed', 0.038), ('preprint', 0.037), ('mostly', 0.037), ('ri', 0.037), ('theorem', 0.037), ('lda', 0.036), ('gets', 0.036), ('projection', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings.

2 0.39066082 11 nips-2013-A New Convex Relaxation for Tensor Completion

Author: Bernardino Romera-Paredes, Massimiliano Pontil

Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1

3 0.29696161 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors

Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu

Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log2 (n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1

4 0.29619852 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

5 0.27843812 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

Author: Akshay Krishnamurthy, Aarti Singh

Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover a n ⇥ n matrix of rank r from merely ⌦(nr3/2 log(r)) matrix entries. We also show that one can recover an order T tensor using ⌦(nrT 1/2 T 2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using ⌦(nr3/2 polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1

6 0.27058598 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

7 0.22773308 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series

8 0.19771312 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

9 0.19210991 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

10 0.17237426 70 nips-2013-Contrastive Learning Using Spectral Methods

11 0.15258385 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion

12 0.14871103 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

13 0.14700586 66 nips-2013-Computing the Stationary Distribution Locally

14 0.11844852 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

15 0.11505794 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series

16 0.096359812 185 nips-2013-Matrix Completion From any Given Set of Observations

17 0.093654156 292 nips-2013-Sequential Transfer in Multi-armed Bandit with Finite Set of Models

18 0.091105901 300 nips-2013-Solving the multi-way matching problem by permutation synchronization

19 0.090966448 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

20 0.090433337 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.297), (1, 0.122), (2, 0.09), (3, 0.335), (4, 0.007), (5, -0.277), (6, 0.174), (7, -0.059), (8, 0.216), (9, -0.007), (10, 0.034), (11, -0.075), (12, -0.009), (13, 0.051), (14, 0.009), (15, 0.03), (16, 0.054), (17, -0.048), (18, 0.029), (19, 0.032), (20, 0.014), (21, 0.082), (22, -0.039), (23, 0.017), (24, -0.03), (25, 0.015), (26, -0.065), (27, 0.114), (28, -0.118), (29, -0.07), (30, 0.031), (31, 0.122), (32, -0.092), (33, -0.008), (34, -0.057), (35, -0.019), (36, -0.072), (37, 0.025), (38, 0.061), (39, -0.034), (40, 0.031), (41, 0.029), (42, -0.056), (43, 0.017), (44, -0.041), (45, -0.013), (46, 0.0), (47, -0.018), (48, -0.023), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93782139 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings.

2 0.89246488 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series

Author: Mark Rogers, Lei Li, Stuart Russell

Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1

3 0.82790577 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

4 0.79616761 11 nips-2013-A New Convex Relaxation for Tensor Completion

Author: Bernardino Romera-Paredes, Massimiliano Pontil

Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1

5 0.78373969 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

Author: Ryota Tomioka, Taiji Suzuki

Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1

6 0.75283062 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors

7 0.66390145 70 nips-2013-Contrastive Learning Using Spectral Methods

8 0.60516596 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

9 0.58814335 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

10 0.51321197 299 nips-2013-Solving inverse problem of Markov chain with partial observations

11 0.50655282 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

12 0.45957261 66 nips-2013-Computing the Stationary Distribution Locally

13 0.45737198 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

14 0.45236748 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion

15 0.45041049 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

16 0.43998429 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series

17 0.4380061 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis

18 0.41755554 352 nips-2013-What do row and column marginals reveal about your dataset?

19 0.40251195 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers

20 0.3951917 134 nips-2013-Graphical Models for Inference with Missing Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.031), (33, 0.122), (34, 0.078), (41, 0.014), (49, 0.022), (56, 0.57), (70, 0.022), (85, 0.033), (89, 0.022), (93, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99641764 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression

Author: Samory Kpotufe, Vikas Garg

Abstract: We present the first result for kernel regression where the procedure adapts locally at a point x to both the unknown local dimension of the metric space X and the unknown Hölder-continuity of the regression function at x. The result holds with high probability simultaneously at all points x in a general metric space X of unknown structure.

2 0.99413192 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

Author: Amit Daniely, Nati Linial, Shai Shalev-Shwartz

Abstract: The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem — we consider agnostic PAC learning of halfspaces over 3-sparse vectors in {−1, 1, 0}^n. This class is inefficiently learnable using O(n/ε²) examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random 3CNF formulas is hard, it is impossible to efficiently learn this class using only O(n/ε²) examples. We further show that under stronger hardness assumptions, even O(n^{1.499}/ε²) examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using Ω̃(n²/ε²) examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.

3 0.98358554 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

Author: Franz Kiraly, Louis Theran

Abstract: We propose a general framework for reconstructing and denoising single entries of incomplete and noisy matrices. We describe: effective algorithms for deciding if an entry can be reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error of each entry, individually. In the noiseless case our algorithm is exact. For rank-one matrices, the new algorithm is fast, admits a highly-parallel implementation, and produces an error minimizing estimate that is qualitatively close to our theoretical and the state-of-the-art Nuclear Norm and OptSpace methods.

same-paper 4 0.97758609 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer’s, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings.

5 0.9753235 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration

Author: Dan Russo, Benjamin Van Roy

Abstract: This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, can be analyzed in the same manner as optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models. 1

6 0.96386492 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

7 0.95312279 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables

8 0.94586307 269 nips-2013-Regression-tree Tuning in a Streaming Setting

9 0.93521518 103 nips-2013-Efficient Exploration and Value Function Generalization in Deterministic Systems

10 0.90782845 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

11 0.89997768 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

12 0.88918787 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

13 0.87934661 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

14 0.87498122 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

15 0.87170106 66 nips-2013-Computing the Stationary Distribution Locally

16 0.87160027 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries

17 0.87036473 230 nips-2013-Online Learning with Costly Features and Labels

18 0.8636232 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

19 0.8592577 1 nips-2013-(More) Efficient Reinforcement Learning via Posterior Sampling

20 0.84928906 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence