nips nips2013 nips2013-203 knowledge-graph by maker-knowledge-mining

203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series


Source: pdf

Author: Mark Rogers, Lei Li, Stuart Russell

Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 How can hidden, evolving trends in such data be extracted while preserving the tensor structure? [sent-5, score-0.391]

2 The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. [sent-6, score-0.129]

3 We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. [sent-7, score-0.587]

4 The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. [sent-8, score-0.624]

5 The latent tensors are again evolving with respect to a multilinear projection. [sent-9, score-0.331]

6 This is, perhaps, due to the lack of an agreed-upon tensor model. [sent-15, score-0.374]

7 If k measurements—e.g., temperature, humidity, and wind speed for k = 3—are made at each point of an n × m × l grid, then a time series of n × m × l × k tensors is constructed. [sent-19, score-0.196]

8 The daily high, low, opening, closing, adjusted closing, and volume of the stock prices of n companies comprise a time series of 6 × n tensors. [sent-20, score-0.121]

9 A grayscale video sequence is a two-dimensional tensor time series because each frame is a two-dimensional array of pixels. [sent-21, score-0.453]

10 Several queries can be made when one is presented with a tensor time series. [sent-22, score-0.374]

11 The relationships between particular subsets of tensor elements could be of significance. [sent-26, score-0.39]

12 For stock price data, one may investigate how the stock prices of electric car companies affect those of oil companies. [sent-28, score-0.127]

13 Another way to describe the relationships among tensor elements is in terms of their covariances. [sent-30, score-0.39]

14 Equipped with a tabulation of the covariances, one may read off how a given tensor element affects others. [sent-31, score-0.374]

15 Later in this paper, we will define a tensor time series model and a covariance tensor that permits the modeling of general noise relationships among tensor elements. [sent-32, score-1.201]

16 More formally, a tensor X ∈ RI1 ×···×IM is a multidimensional array with elements that can each be indexed by a vector of positive integers. [sent-33, score-0.389]

17 That is, every element X_{i1···iM} ∈ R is uniquely addressed by a vector (i1, ···, iM) such that 1 ≤ im ≤ Im for all m. [sent-34, score-0.204]

18 The simplest tensors are vectors and matrices: vectors are tensors with only a single mode, while matrices are tensors with two modes. [sent-36, score-0.498]

19 We will consider the tensor time series, which is an ordered, finite collection of tensors that all share the same dimensionality. [sent-37, score-0.532]

20 In practice, each member of an observed tensor time series reflects the state of a dynamical system that is measured at discrete epochs. [sent-38, score-0.491]

21 We propose a novel model for tensor time series: the multilinear dynamical system (MLDS). [sent-39, score-0.549]

22 The MLDS explicitly incorporates the dynamics, noise, and tensor structure of the data by juxtaposing concepts in probabilistic graphical models and multilinear algebra. [sent-40, score-0.518]

23 Specifically, the MLDS generalizes the states of the linear dynamical system (LDS) to tensors via a probabilistic variant of the Tucker decomposition. [sent-41, score-0.25]

24 The LDS tracks latent vector states and observed vector sequences; this permits forecasting, estimation of latent states, and modeling of noise but only for vector objects. [sent-42, score-0.085]

25 Meanwhile, the Tucker decomposition of a single tensor computes a latent “core” tensor but has no dynamics or noise capabilities. [sent-43, score-0.825]

26 We show that the MLDS, in fact, generalizes LDS and other well-known vector models to tensors of arbitrary dimensionality. [sent-45, score-0.171]

27 Then a tensor X ∈ RI1 ×···×IM is an element of a tensor-product space. [sent-52, score-0.374]

28 A tensor X may be referenced by either a full vector (i1, ···, iM) of indices … [sent-53, score-0.374]

29 The vectorization vec(X) ∈ RI1 ···IM is obtained by shaping the tensor into a vector. [sent-73, score-0.374]

30 The matricization mat(A) ∈ R^{I1···IM × J1···JM} of a tensor A ∈ R^{I1×···×IM×J1×···×JM} is given by mat(A)_{kl} = A_{i1···iM j1···jM}, where k = 1 + ∑_{m=1}^{M} (∏_{n=1}^{m−1} In)(im − 1) and l = 1 + ∑_{m=1}^{M} (∏_{n=1}^{m−1} Jn)(jm − 1). [sent-77, score-0.391]

31 The vec and mat operators put tensors in bijective correspondence with vectors and matrices. [sent-81, score-0.871]
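
To make the vec and mat operators concrete, here is a minimal NumPy sketch (not from the paper); it assumes the column-major index ordering implied by the formulas in sentence 30, and the mode sizes are purely illustrative.

```python
import numpy as np

# Illustrative dimensions: A lives in R^{I1 x I2 x J1 x J2}, X in R^{J1 x J2}.
I, J = (3, 4), (2, 5)

def vec(X):
    """Flatten a tensor into a vector using column-major (Fortran) order,
    which matches k = 1 + sum_m (prod_{n<m} I_n)(i_m - 1)."""
    return X.reshape(-1, order="F")

def mat(A, num_row_modes):
    """Matricize A into a (prod I_m) x (prod J_m) matrix."""
    rows = int(np.prod(A.shape[:num_row_modes]))
    return A.reshape(rows, -1, order="F")

A = np.random.randn(*I, *J)
X = np.random.randn(*J)
print(vec(X).shape, mat(A, len(I)).shape)   # (10,) (12, 10)
```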

32 The factorization of a tensor A ∈ R^{I1×···×IM×J1×···×JM} is given by A_{i1···iM j1···jM} = ∏_{m=1}^{M} A^{(m)}_{im jm}, where A^{(m)} ∈ R^{Im×Jm} for all m. [sent-85, score-0.402]

33 The factorization exponentially reduces the number of parameters needed to express A from ∏_{m=1}^{M} Im Jm to ∑_{m=1}^{M} Im Jm. [sent-86, score-0.271]
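
As a sketch of this parameter saving (with illustrative sizes, and assuming the matricization sketched above), a factorizable A corresponds to a Kronecker product of its factor matrices:

```python
import numpy as np
from functools import reduce

I, J = (3, 4), (2, 5)
factors = [np.random.randn(i, j) for i, j in zip(I, J)]   # A^(m) in R^{I_m x J_m}

# For a factorizable tensor, mat(A) = A^(M) kron ... kron A^(1).
mat_A = reduce(np.kron, reversed(factors))

print(mat_A.shape)                           # (12, 10)
print(np.prod(I) * np.prod(J))               # prod_m I_m J_m = 120 parameters (full)
print(sum(i * j for i, j in zip(I, J)))      # sum_m  I_m J_m =  16 parameters (factored)
```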

34 Note that tensors in RIJ are not factorizable in general [2]. [sent-88, score-0.217]

35 The product A X of two tensors A ∈ R^{I1×···×IM×J1×···×JM} and X ∈ R^{J1×···×JM}, where I, J ∈ N^M, M ∈ N, is given by (A X)_{i1···iM} = ∑_{j1···jM} A_{i1···iM j1···jM} X_{j1···jM}. [sent-89, score-0.174]

36 The tensor A is called a multilinear operator when it appears in a tensor product as above. [sent-90, score-0.884]

37 Note that this tensor product generalizes the standard matrix-vector product in the case M = 1. [sent-92, score-0.419]

38 We shall primarily work with tensors in their vector and matrix representations. [sent-93, score-0.176]

39 ∑_{j1···jM} A_{i1···iM j1···jM} X_{j1···jM} = ∑_l mat(A)_{kl} vec(X)_l = (mat(A) vec(X))_k, (2) where k = 1 + ∑_{m=1}^{M} (∏_{n=1}^{m−1} In)(im − 1) and l = 1 + ∑_{m=1}^{M} (∏_{n=1}^{m−1} Jn)(jm − 1), which holds for all 1 ≤ im ≤ Im, 1 ≤ m ≤ M. [sent-103, score-0.204]
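
A quick numerical check of equation (2) as reconstructed above; np.tensordot contracts the trailing J-modes of A against X, and both sides are compared in the vectorized space. Shapes are illustrative.

```python
import numpy as np

I, J = (3, 4), (2, 5)
A = np.random.randn(*I, *J)
X = np.random.randn(*J)

AX = np.tensordot(A, X, axes=len(J))                 # (A X)_{i1...iM} = sum_j A_{i,j} X_j
vec = lambda T: T.reshape(-1, order="F")             # column-major vectorization
mat_A = A.reshape(int(np.prod(I)), -1, order="F")    # matricization of A

assert np.allclose(vec(AX), mat_A @ vec(X))          # vec(A X) = mat(A) vec(X)
```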

40 The Tucker decomposition models a given tensor X ∈ R^{I1×···×IM} as the result of a multilinear transformation applied to a latent core tensor Z ∈ R^{J1×···×JM}: X = A Z. [sent-110, score-0.402]

41 The multilinear operator A is a factorizable tensor such that mat(A) = A^(M) ⊗ A^(M−1) ⊗ ··· ⊗ A^(1). [sent-111, score-0.963]
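
The map X = A Z with a factorizable A can be sketched as a sequence of mode products, one factor matrix per mode; the following is an illustrative NumPy version, not the authors' code.

```python
import numpy as np

J, I = (2, 3), (4, 5)                                       # core dims J_m, observed dims I_m
Z = np.random.randn(*J)                                     # latent core tensor
A_factors = [np.random.randn(i, j) for i, j in zip(I, J)]   # A^(m) in R^{I_m x J_m}

def multilinear_apply(factors, Z):
    """Multiply mode m of Z by the factor matrix for mode m, for every m."""
    for m, F in enumerate(factors):
        Z = np.moveaxis(np.tensordot(F, Z, axes=(1, m)), 0, m)
    return Z

X = multilinear_apply(A_factors, Z)
print(X.shape)   # (4, 5)
```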

42 The CP decomposition can be viewed as the special case of the Tucker decomposition in which J1 = ··· = JM = R and only the Z_{j1···jM} such that j1 = ··· = jM are nonzero. (Figure 1: The Tucker decomposition of a third-order tensor X.) [sent-118, score-0.374]

43 The CP decomposition expresses X as a sum X = ∑_{r=1}^{R} u_r^(1) ◦ ··· ◦ u_r^(M), where u_r^(m) ∈ R^{Im} for all m and r and ◦ denotes the tensor outer product [3]. [sent-120, score-0.526]
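
For comparison, a rank-R CP tensor can be sketched as a sum of outer products (random, illustrative factors here rather than fitted ones):

```python
import numpy as np
from functools import reduce

I, R = (4, 5, 3), 2                         # tensor dims and CP rank
U = [np.random.randn(R, i) for i in I]      # U[m][r] plays the role of u_r^(m)

# X = sum_r u_r^(1) o ... o u_r^(M), with o the outer product
X = sum(reduce(np.multiply.outer, (U[m][r] for m in range(len(I)))) for r in range(R))
print(X.shape)   # (4, 5, 3)
```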

44 Random tensors: Given I ∈ N^M, M ∈ N, we define a random tensor X ∈ R^{I1×···×IM} as follows. [sent-126, score-0.532]

45 The definition of the normal distribution on tensors can thus be restated more succinctly as X ∼ N (U, S) ⇐⇒ vec(X) ∼ N (vec U, mat S) . [sent-129, score-0.564]
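
This equivalence also gives a direct way to sample a tensor-valued Gaussian: draw vec(X) in the vectorized space and reshape. A small sketch with an arbitrary positive-definite mat(S):

```python
import numpy as np

I = (3, 4)
p = int(np.prod(I))
U = np.random.randn(*I)                       # mean tensor
L = np.random.randn(p, p)
mat_S = L @ L.T + 1e-3 * np.eye(p)            # some valid covariance for vec(X)

# X ~ N(U, S)  <=>  vec(X) ~ N(vec U, mat S)
vec_X = np.random.multivariate_normal(U.reshape(-1, order="F"), mat_S)
X = vec_X.reshape(I, order="F")               # back to tensor form
print(X.shape)   # (3, 4)
```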

46 We will make use of an important special case of the normal distribution defined on tensors: the multilinear Gaussian distribution. [sent-131, score-0.147]

47 Let I, J ∈ NM , M ∈ N, and suppose the joint distribution of random tensors X ∈ RI and Z ∈ RJ is given by (4). [sent-135, score-0.158]

48 Multilinear dynamical system: The aim is to develop a model of a tensor time series X1, …, XN. [sent-143, score-0.467]

49 In defining the MLDS, we build upon the results of previous sections by treating each Xn as a random tensor and relating the model components with multilinear transformations. [sent-147, score-0.507]

50 When the MLDS components are vectorized and matricized, an LDS with factorized transition and projection matrices is revealed. [sent-148, score-0.135]

51 Each latent tensor Zn emits an observation Xn ∈ RI1 ×···×IM . [sent-156, score-0.41]

52 The system is initialized by a latent tensor Z1 distributed as Z1 ∼ N (U0 , Q0 ) . [sent-157, score-0.44]

53 Given Zn, 1 ≤ n ≤ N − 1, we generate Zn+1 according to the conditional distribution Zn+1 | Zn ∼ N (A Zn, Q), (8) where Q is the conditional covariance shared by all Zn, 2 ≤ n ≤ N, and A is the transition tensor which describes the dynamics of the evolving sequence Z1, …, ZN. [sent-158, score-0.433]

54 The transition tensor A is factorized into M matrices A(m) , each of which acts on a mode of Zn . [sent-162, score-0.467]

55 Each observation is generated as Xn | Zn ∼ N (C Zn, R), where R is the covariance shared by all Xn and C is the projection tensor which multilinearly transforms the latent tensor Zn. [sent-170, score-0.828]

56 Like the transition tensor A, the projection tensor C is factorizable, i.e., mat(C) = C^(M) ⊗ ··· ⊗ C^(1). [sent-171, score-0.757]
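
Putting sentences 51-56 together, a minimal forward simulation of the MLDS generative process might look as follows. It assumes isotropic noise (as in the paper's experiments), uses scaled orthogonal transition factors purely for stability, and all shapes are illustrative.

```python
import numpy as np

J, I, N = (2, 3), (4, 5), 50                   # latent dims, observed dims, sequence length
A = [0.95 * np.linalg.qr(np.random.randn(j, j))[0] for j in J]   # transition factors A^(m)
C = [np.random.randn(i, j) for i, j in zip(I, J)]                # projection factors C^(m)

def apply_op(factors, Z):
    """Apply a factorizable multilinear operator mode by mode."""
    for m, F in enumerate(factors):
        Z = np.moveaxis(np.tensordot(F, Z, axes=(1, m)), 0, m)
    return Z

Z = np.random.randn(*J)                        # Z_1 ~ N(U_0, Q_0) with U_0 = 0, Q_0 isotropic
Xs = []
for n in range(N):
    Xs.append(apply_op(C, Z) + 0.1 * np.random.randn(*I))   # X_n | Z_n ~ N(C Z_n, R)
    Z = apply_op(A, Z) + 0.1 * np.random.randn(*J)          # Z_{n+1} | Z_n ~ N(A Z_n, Q)

X_series = np.stack(Xs)                        # a tensor time series of shape (N, 4, 5)
```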

57 By vectorizing each Xn and Zn , the MLDS becomes an LDS with factorized transition and projection matrices mat(A) and mat(C). [sent-175, score-0.1]

58 For the LDS, the transition and projection operators are not factorizable in general [2]. [sent-176, score-0.144]

59 The normal distribution of tensors (3) will facilitate matrix and vector computations rather than compel us to work directly with tensors. [sent-189, score-0.203]

60 … (vec Z1, vec X1, …, vec ZN, vec XN), (10) where vec θ = (vec U0, mat Q0, mat Q, mat A, mat R, mat C). [sent-196, score-0.592]

61 It follows that the vectorized MLDS is an LDS that inherits the Kalman filter updates for the E-step and the M-step for all parameters except mat A and mat C. [sent-197, score-0.78]

62 We locally maximize the expected, complete log-likelihood by computing the gradient with respect to the vector v = [vec C^(1)T ··· vec C^(M)T]^T ∈ R^{∑_m Im Jm}, which is obtained by concatenating the vectorizations of the projection matrices C^(m). [sent-200, score-0.372]

63 The expected, complete log-likelihood (with terms constant with respect to C deleted) can be written as l(v) = tr[ Ω mat(C) ( Ψ mat(C)^T − 2 Φ^T ) ], (11) where Ω = mat(R)^{−1}, Ψ = ∑_{n=1}^{N} E(vec Zn vec Zn^T), and Φ = ∑_{n=1}^{N} vec(Xn) E(vec Zn)^T. [sent-201, score-0.592]
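
Assuming the reconstructed form of (11) above, the C-dependent objective can be evaluated from the E-step statistics as in the following hedged sketch; the variable names are ours, not the paper's.

```python
import numpy as np

def m_step_objective(mat_C, mat_R, E_ZZt, E_Z, X_vecs):
    """l(v) = tr[ Omega mat(C) (Psi mat(C)^T - 2 Phi^T) ] with
    Omega = mat(R)^-1, Psi = sum_n E[vec Z_n vec Z_n^T], Phi = sum_n vec(X_n) E[vec Z_n]^T."""
    Omega = np.linalg.inv(mat_R)
    Psi = sum(E_ZZt)                                               # list of (q, q) matrices
    Phi = sum(x[:, None] @ ez[None, :] for x, ez in zip(X_vecs, E_Z))
    return np.trace(Omega @ mat_C @ (Psi @ mat_C.T - 2 * Phi.T))
```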

64 In other words, the original tensors Xn are “rotated” so that the mth mode becomes the M th mode. [sent-218, score-0.197]

65 Certain constraints on the MLDS also lead to generalizations of factor analysis, probabilistic principal components analysis (PPCA), the CP decomposition, and the matrix factorization model of collaborative filtering (MF). [sent-222, score-0.095]

66 If A = 0, U0 = 0, and Q0 = Q, then the Xn of the MLDS become independent and identically distributed draws from the multilinear Gaussian distribution. [sent-224, score-0.12]

67 A further constraint on R, mat(R) = ρ2 Idp , yields a multilinear extension of PPCA. [sent-226, score-0.12]

68 Removing the constraints on R and forcing mat(Zn ) = Idq for all n results in a probabilistic CP decomposition in which the tensor elements have general covariances. [sent-227, score-0.426]

69 Experimental results: To determine how well the MLDS could model tensor time series, the fits of the MLDS were compared to those of the LDS for both synthetic and real data. [sent-229, score-0.39]

70 To avoid unnecessary complexity and highlight the difference between the two models—namely, how the transition and projection operators are defined—the noises in the models are isotropic. [sent-230, score-0.085]

71 The prediction error of a given model M for the nth member of a tensor time series X1, … [sent-235, score-0.46]

72 Each estimate X̂^M_n is given by X̂^M_n = vec^{−1}( mat(C^M) mat(A^M)^{n − Ntrain} vec E[Z^M_{Ntrain}] ), where E[Z^M_{Ntrain}] is the estimate of the latent state of the last member of the training sequence. [sent-239, score-1.114]
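
Assuming the forecasting rule reconstructed above, multi-step prediction reduces to a matrix power in the vectorized space; a small sketch (names are illustrative):

```python
import numpy as np

def forecast(mat_A, mat_C, E_Z_last, obs_shape, steps):
    """X_hat = vec^{-1}( mat(C) mat(A)^steps vec(E[Z_Ntrain]) )."""
    z = E_Z_last.reshape(-1, order="F")              # vectorized last filtered latent state
    x_hat = mat_C @ np.linalg.matrix_power(mat_A, steps) @ z
    return x_hat.reshape(obs_shape, order="F")
```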

73 Tesla: Opening, closing, high, low, and volume of the stock prices of 12 car and oil companies (e.g. …). [sent-255, score-0.099]

74 Related work: Several existing models can be fitted to tensor time series. [sent-293, score-0.374]

75 An obvious limitation of the LDS for modeling tensor time series is that the tensor structure is not preserved. [sent-297, score-0.786]

76 Thus, it is less clear how the latent vector space of the LDS relates to the various tensor modes. [sent-298, score-0.41]

77 The net result, as we have shown, is that the LDS requires more parameters than the MLDS to model a given system (assuming it does have tensor structure). [sent-300, score-0.392]

78 7 Dynamic tensor analysis (DTA) and Bayesian probabilistic tensor factorization (BPTF) are explicit models of tensor time series [9, 10]. [sent-301, score-1.212]

79 For DTA, a latent, low-dimensional “core” tensor and a set of projection matrices are learned by processing each member Xn ∈ R^{I1×···×IM} of the sequence as follows. [sent-302, score-0.454]

80 For each mode m, the tensor is flattened into a matrix Xn^(m) ∈ R^{(∏_{k≠m} Ik) × Im} and then multiplied by its transpose, Xn^(m)T Xn^(m). [sent-303, score-0.417]

81 The eigenvalue decomposition U ΛU T of the updated S (m) is then computed and the mth projection matrix is given by the first rank S (m) columns of U . [sent-305, score-0.092]

82 After this procedure is carried out for each mode, the core tensor is updated via the multilinear transformation given by the Tucker decomposition. [sent-306, score-0.494]
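
A rough sketch of one DTA update as described in sentences 79-82 (illustrative ranks, no forgetting factor, and the caller initializes each S^(m), e.g. to zeros):

```python
import numpy as np

def dta_update(S, X_n, ranks):
    """Update each mode covariance S^(m), refresh the mode-m projections,
    and project X_n onto the resulting core tensor."""
    U = []
    for m, r in enumerate(ranks):
        Xm = np.moveaxis(X_n, m, -1).reshape(-1, X_n.shape[m])   # (prod_{k!=m} I_k) x I_m
        S[m] += Xm.T @ Xm                                        # accumulate mode covariance
        eigvals, eigvecs = np.linalg.eigh(S[m])
        U.append(eigvecs[:, np.argsort(eigvals)[::-1][:r]])      # leading r eigenvectors
    Z = X_n
    for m, U_m in enumerate(U):                                  # core via multilinear transform
        Z = np.moveaxis(np.tensordot(U_m.T, Z, axes=(1, m)), 0, m)
    return S, U, Z
```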

83 An advantage of DTA over the LDS is that the tensor structure of the data is preserved. [sent-308, score-0.374]

84 A disadvantage is that there is no straightforward way to predict future terms of the tensor time series. [sent-309, score-0.387]

85 Another disadvantage is that there is no mechanism that allows for arbitrary noise relationships among the tensor elements. [sent-310, score-0.416]

86 Other families of isotropic models have been devised that “tensorize” the time dimension by concatenating the tensors in the time series to yield a single new tensor with an additional temporal mode. [sent-312, score-0.57]

87 These models include multilinear principal components analysis [11], the memory-efficient Tucker algorithm [12], and Bayesian tensor analysis [13]. [sent-313, score-0.507]

88 BPTF is a multilinear extension of collaborative filtering models [14, 15, 16] that concatenates the members of the tensor time series (Xn ), Xn ∈ RI1 ×···×IM , to yield a higher-order tensor R ∈ RI1 ×···×IM ×K , where K is the sequence length. [sent-316, score-0.918]

89 … where ⟨·, ·⟩ denotes the tensor inner product and α is a global precision parameter. [sent-323, score-0.39]

90 Bayesian methods are then used to compute the canonical-decomposition/parallel-factors (CP) decomposition of R: R = ∑_{r=1}^{R} u_r^(1) ◦ ··· ◦ u_r^(M) ◦ Tr, where ◦ is the tensor outer product. [sent-324, score-0.438]

91 Each u_r^(m) is independently drawn from a normal distribution with expectation µm and precision matrix Λm, while each Tr is recursively drawn from a normal distribution with expectation T_{r−1} and precision matrix Λ_T. [sent-325, score-0.126]

92 Though BPTF supports prediction and general noise models, the latent tensor structure is limited. [sent-327, score-0.447]

93 Other models with anisotropic noise include probabilistic tensor factorization (PTF) [17], tensor probabilistic independent component analysis (TPICA) [18], and generalized coupled tensor factorization (GCTF) [19]. [sent-328, score-1.239]

94 PTF is fit to tensor data by minimizing a heuristic loss function that is expressed as a sum of tensor inner products. [sent-330, score-0.748]

95 TPICA iteratively flattens the tensor of data, executes a matrix model called probabilistic ICA (PICA) as a subroutine, and decouples the factor matrices of the CP decomposition that are embedded in the “mixing matrix” of PICA. [sent-331, score-0.468]

96 GCTF relates a collection of tensors by a hidden layer of disconnected tensors via tensor inner products, drawing analogies to probabilistic graphical models. [sent-332, score-0.714]

97 Conclusion: We have proposed a novel probabilistic model of tensor time series called the multilinear dynamical system (MLDS), based on a tensor normal distribution. [sent-333, score-1.012]

98 By putting tensors and multilinear operators in bijective correspondence with vectors and matrices in a way that preserves tensor structure, the MLDS is formulated so that it becomes an LDS when its components are vectorized and matricized. [sent-334, score-0.749]

99 In matrix form, the transition and projection tensors can each be written as the Kronecker product of M smaller matrices and thus yield an exponential reduction in model complexity compared to the unfactorized transition and projection matrices of the LDS. [sent-335, score-0.364]

100 A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI. [sent-352, score-0.401]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mlds', 0.547), ('mat', 0.379), ('tensor', 0.374), ('lds', 0.312), ('vec', 0.296), ('jm', 0.231), ('im', 0.204), ('tensors', 0.158), ('zn', 0.129), ('multilinear', 0.12), ('xn', 0.065), ('tesla', 0.06), ('factorizable', 0.059), ('nm', 0.049), ('tucker', 0.046), ('cp', 0.046), ('rij', 0.046), ('bptf', 0.043), ('dta', 0.039), ('slice', 0.038), ('series', 0.038), ('dynamical', 0.037), ('latent', 0.036), ('ur', 0.036), ('jn', 0.034), ('projection', 0.032), ('companies', 0.032), ('ocean', 0.032), ('sst', 0.032), ('ntrain', 0.032), ('em', 0.032), ('transition', 0.03), ('jimeng', 0.029), ('ptf', 0.029), ('tpica', 0.029), ('likelihood', 0.029), ('decomposition', 0.028), ('closing', 0.028), ('factorization', 0.028), ('video', 0.028), ('stock', 0.028), ('normal', 0.027), ('mode', 0.025), ('matrices', 0.024), ('member', 0.024), ('prediction', 0.024), ('rim', 0.024), ('probabilistic', 0.024), ('operators', 0.023), ('prices', 0.023), ('temperature', 0.023), ('vectorized', 0.022), ('opening', 0.021), ('rj', 0.021), ('ij', 0.02), ('dacheng', 0.02), ('gctf', 0.02), ('idq', 0.02), ('matricized', 0.02), ('vectorizations', 0.02), ('epochs', 0.02), ('ri', 0.019), ('matrix', 0.018), ('system', 0.018), ('xm', 0.018), ('dimensionality', 0.018), ('matricization', 0.017), ('attens', 0.017), ('rii', 0.017), ('evolving', 0.017), ('synthetic', 0.016), ('product', 0.016), ('oil', 0.016), ('ppca', 0.016), ('relationships', 0.016), ('bijective', 0.015), ('kronecker', 0.015), ('multidimensional', 0.015), ('marginal', 0.014), ('andriy', 0.014), ('tamara', 0.014), ('mth', 0.014), ('factorized', 0.014), ('climate', 0.014), ('temperatures', 0.014), ('dimensionalities', 0.014), ('components', 0.013), ('vk', 0.013), ('generalizes', 0.013), ('christos', 0.013), ('disadvantage', 0.013), ('grayscale', 0.013), ('noise', 0.013), ('initialized', 0.012), ('covariance', 0.012), ('trained', 0.012), ('ct', 0.012), ('tr', 0.012), ('express', 0.012), ('collaborative', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series

Author: Mark Rogers, Lei Li, Stuart Russell

Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1

2 0.30125532 11 nips-2013-A New Convex Relaxation for Tensor Completion

Author: Bernardino Romera-Paredes, Massimiliano Pontil

Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1

3 0.23268056 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

4 0.22773308 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such Alzheimer’s, or certain biological processes. Existing methods for learning dynamic model from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover firstorder Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings. 1

5 0.22141899 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors

Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu

Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log2 (n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1

6 0.20745918 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

7 0.18104354 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

8 0.12527311 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

9 0.11872735 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations

10 0.10196126 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion

11 0.090795688 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

12 0.072389863 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

13 0.070992395 96 nips-2013-Distributed Representations of Words and Phrases and their Compositionality

14 0.059542205 70 nips-2013-Contrastive Learning Using Spectral Methods

15 0.051887266 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

16 0.051434506 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression

17 0.049951844 205 nips-2013-Multisensory Encoding, Decoding, and Identification

18 0.041330133 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

19 0.040510498 300 nips-2013-Solving the multi-way matching problem by permutation synchronization

20 0.039791815 75 nips-2013-Convex Two-Layer Modeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.113), (1, 0.084), (2, 0.08), (3, 0.248), (4, -0.017), (5, -0.27), (6, 0.1), (7, -0.035), (8, 0.185), (9, 0.012), (10, -0.025), (11, -0.092), (12, -0.04), (13, 0.006), (14, -0.001), (15, 0.013), (16, -0.009), (17, -0.028), (18, 0.029), (19, 0.015), (20, 0.054), (21, 0.118), (22, -0.035), (23, 0.036), (24, -0.046), (25, 0.057), (26, -0.044), (27, 0.135), (28, 0.047), (29, -0.029), (30, 0.012), (31, 0.054), (32, -0.014), (33, 0.083), (34, -0.009), (35, 0.056), (36, -0.041), (37, -0.059), (38, 0.046), (39, -0.005), (40, 0.045), (41, -0.017), (42, 0.039), (43, 0.046), (44, -0.063), (45, 0.061), (46, -0.001), (47, 0.008), (48, -0.045), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96913469 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series

Author: Mark Rogers, Lei Li, Stuart Russell

Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1

2 0.82290566 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma

Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.

3 0.78194141 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

Author: Ryota Tomioka, Taiji Suzuki

Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1

4 0.78068125 11 nips-2013-A New Convex Relaxation for Tensor Completion

Author: Bernardino Romera-Paredes, Massimiliano Pontil

Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1

5 0.70351249 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors

Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu

Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log2 (n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1

6 0.6725598 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

7 0.46396297 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

8 0.42966855 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion

9 0.41296127 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

10 0.34308794 70 nips-2013-Contrastive Learning Using Spectral Methods

11 0.28753725 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

12 0.26864761 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

13 0.25940365 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis

14 0.25929779 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

15 0.25412911 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series

16 0.22041622 336 nips-2013-Translating Embeddings for Modeling Multi-relational Data

17 0.21140213 205 nips-2013-Multisensory Encoding, Decoding, and Identification

18 0.21097937 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

19 0.20473246 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

20 0.20383273 300 nips-2013-Solving the multi-way matching problem by permutation synchronization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.014), (16, 0.04), (33, 0.082), (34, 0.104), (41, 0.058), (47, 0.322), (49, 0.022), (56, 0.096), (70, 0.029), (85, 0.029), (89, 0.046), (93, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74498147 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series

Author: Mark Rogers, Lei Li, Stuart Russell

Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1

2 0.60121685 232 nips-2013-Online PCA for Contaminated Data

Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan

Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1

3 0.56796652 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe

Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1

4 0.49345267 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval

Author: Cristina Savin, Peter Dayan, Mate Lengyel

Abstract: It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate. 1

5 0.49041155 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models

Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf

Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided. 1

6 0.48239124 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

7 0.48148397 79 nips-2013-DESPOT: Online POMDP Planning with Regularization

8 0.48124349 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing

9 0.47888505 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents

10 0.47881398 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

11 0.47842324 249 nips-2013-Polar Operators for Structured Sparse Estimation

12 0.47615325 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

13 0.47521624 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

14 0.47479254 280 nips-2013-Robust Data-Driven Dynamic Programming

15 0.4745779 184 nips-2013-Marginals-to-Models Reducibility

16 0.47393808 11 nips-2013-A New Convex Relaxation for Tensor Completion

17 0.47391447 86 nips-2013-Demixing odors - fast inference in olfaction

18 0.47374481 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models

19 0.47372028 173 nips-2013-Least Informative Dimensions

20 0.4732908 318 nips-2013-Structured Learning via Logistic Regression