nips nips2011 nips2011-179 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
Reference: text
sentIndex sentText sentNum sentScore
1 A multilinear subspace regression model based on so called latent variable decomposition is introduced. [sent-7, score-0.497]
2 Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. [sent-8, score-0.9]
3 It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. [sent-10, score-0.041]
4 Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. [sent-11, score-0.046]
5 The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). [sent-12, score-0.237]
6 1 Introduction The recent progress in sensor technology has made possible a plethora of novel applications, which typically require increasingly large amounts of multidimensional data, such as large-scale images, 3D video sequences, and neuroimaging data. [sent-13, score-0.062]
7 To match the data dimensionality, tensors (also called multiway arrays) have proven to be a natural and efficient representation for such massive data. [sent-14, score-0.178]
8 These desirable properties have made tensor decomposition a promising tool in exploratory data analysis [8, 9, 10, 11]. [sent-16, score-0.427]
9 The optimization objective of PLS is to maximize the pairwise covariance of a set of latent variables (also called latent vectors or score vectors) by projecting both X and Y onto a new subspace. [sent-18, score-0.424]
10 A popular way to estimate the model parameters is the Non-linear Iterative Partial Least Squares (NIPALS) [13], an iterative procedure similar to the power method; for an overview of PLS and its applications in multivariate regression analysis see [14, 15, 16]. [sent-19, score-0.041]
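As a concrete reference point for the tensor extensions discussed below, the following is a minimal NumPy sketch of one NIPALS-style PLS component; the function and parameter names (nipals_pls_component, max_iter, tol) are illustrative choices, not taken from the paper or from [13].

```python
import numpy as np

def nipals_pls_component(X, Y, max_iter=500, tol=1e-10):
    """One NIPALS component: latent vectors t (from X) and u (from Y)
    with maximal covariance, plus loadings p, c and deflated data."""
    u = Y[:, [0]].copy()                      # initialise with a column of Y
    for _ in range(max_iter):
        w = X.T @ u / (u.T @ u)               # X-weights
        w /= np.linalg.norm(w)
        t = X @ w                             # X-score (latent vector)
        c = Y.T @ t / (t.T @ t)               # Y-loadings
        u_new = Y @ c / (c.T @ c)             # Y-score
        if np.linalg.norm(u_new - u) < tol:   # power-method-style convergence
            u = u_new
            break
        u = u_new
    p = X.T @ t / (t.T @ t)                   # X-loadings
    # deflate before extracting the next component
    return t, u, p, c, X - t @ p.T, Y - t @ c.T
```

Repeating the extraction on the deflated matrices yields the remaining components.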
11 As an extension of PLS to multiway data, the N-way PLS (NPLS) decomposes the independent and dependent data into rank-one tensors, subject to maximum pairwise covariance of the latent vectors [17]. [sent-20, score-0.37]
12 The widely reported sensitivity to noise of PLS is attributed to redundant (irrelevant) latent variables, whose selection remains an open problem. [sent-21, score-0.221]
13 The number of latent variables also depends on the rank of the independent data, resulting in overfitting when the number of observations is smaller than the number of latent variables. [sent-22, score-0.406]
14 Although the standard PLS can also handle an N-way tensor dataset, e.g., when applied to a mode-1 matricization of X and Y, this would make it difficult to interpret the loadings, as the physical meaning would be lost due to the unfolding. [sent-23, score-0.382] [sent-25, score-0.137]
16 To alleviate these issues, in this study, a new tensor subspace regression model, called the Higher-Order Partial Least Squares (HOPLS), is proposed to predict an Mth-order tensor Y from an Nth-order tensor X. [sent-26, score-1.366]
17 It considers each data sample as a higher order tensor represented as a linear combination of tensor subspace bases. [sent-27, score-0.896]
18 In addition, the latent variables and tensor subspace can be optimized to maximize the correlation between the latent variables of X and Y, with a constraint imposed to enforce a special structure of the core tensor. [sent-29, score-0.967]
19 This is achieved through a joint rank-(1, L2, . . . , LN) decomposition of X and a rank-(1, K2, . . . , KM) decomposition of Y [18], using multiway singular value decomposition (MSVD) [19]. [sent-36, score-0.185]
20 1 Preliminaries. Notation and definitions: We denote Nth-order tensors (multi-way arrays) by underlined boldface capital letters, matrices (two-way arrays) by boldface capital letters, and vectors by boldface lower-case letters. [sent-38, score-0.37]
21 The (i1, i2, . . . , iN)-th entry of an Nth-order tensor X ∈ R^{I1×I2×···×IN} is denoted by x_{i1 i2 ... iN}. [sent-44, score-0.382]
22 Indices typically range from 1 to their capital version, e.g., in = 1, . . . , In. [sent-51, score-0.039]
23 The mode-n matricization of a tensor X is denoted by X_(n). [sent-60, score-0.504]
24 The n-mode product of a tensor X ∈ R^{I1×···×In×···×IN} and a matrix A ∈ R^{Jn×In} is denoted by Y = X ×_n A ∈ R^{I1×···×In−1×Jn×In+1×···×IN} and is defined as y_{i1 ··· in−1 jn in+1 ··· iN} = Σ_{in} x_{i1 i2 ··· iN} a_{jn in}. [sent-61, score-0.402]
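To make the mode-n matricization and n-mode product concrete, here is a small NumPy illustration; the use of np.moveaxis, reshape, and einsum is an implementation choice for this sketch rather than anything prescribed by the paper.

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: mode n becomes the rows, remaining modes the columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, A, n):
    """n-mode product Y = X x_n A for X of shape (I_1, ..., I_N) and A of shape (J_n, I_n):
    contract mode n of X with the second axis of A, keeping the mode order."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

# quick sanity check against an explicit einsum for a 3rd-order tensor, n = 1
X = np.random.randn(3, 4, 5)
A = np.random.randn(6, 4)
assert np.allclose(mode_n_product(X, A, 1), np.einsum('ijk,lj->ilk', X, A))
assert unfold(X, 1).shape == (4, 15)
```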
25 In the standard PLS model, T = [t1, t2, . . . , tR] ∈ R^{I×R} is a matrix of R extracted orthogonal latent variables from X, that is, T^T T = I, and U = [u1, u2, . . . , uR] ∈ R^{I×R} are latent variables from Y that have maximum covariance with T column-wise. [sent-103, score-0.241] [sent-106, score-0.215]
27 The matrices P and C represent loadings (vector subspace bases) and E and F are residuals. [sent-107, score-0.163]
28 A useful property is that the relation between T and U can be approximated linearly by U ≈ TD, (5) where D is an (R × R) diagonal matrix whose scalars d_rr = u_r^T t_r / (t_r^T t_r) play the role of regression coefficients. [sent-108, score-0.217]
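Assuming the score matrices T and U have already been extracted column-wise (for instance by the NIPALS sketch above), the inner relation (5) reduces to a few lines; this is purely illustrative.

```python
import numpy as np

def inner_diagonal_regression(T, U):
    """Fit U ~ T D with D diagonal: d_rr = (u_r^T t_r) / (t_r^T t_r)."""
    d = np.sum(U * T, axis=0) / np.sum(T * T, axis=0)
    return np.diag(d)

# dependent scores predicted from the independent ones: U_hat = T @ D
```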
29 For an Nth-order independent tensor X ∈ R^{I1×···×IN} and an Mth-order dependent tensor Y ∈ R^{J1×···×JM} having the same size on the first mode (Footnote 1), i.e., I1 = J1, similar to PLS, our objective is to find the optimal subspace approximation of X and Y in which the latent vectors of the independent and dependent variables have maximum pairwise covariance. [sent-114, score-0.806] [sent-116, score-0.449]
31 1 Proposed model The new tensor subspace represented by the Tucker model can be obtained by approximating X with a sum of rank-(1, L2, . . . , LN) terms, while the dependent data Y are approximated by a sum of rank-(1, K2, . . . , KM) terms. [sent-118, score-0.514] [sent-122, score-0.042]
33 (Footnote 1: The first mode is usually associated with the sample mode or time mode; for each sample, the independent data are represented by an (N − 1)th-order tensor and the dependent data by an (M − 1)th-order tensor.) [sent-126, score-0.508]
34 Note that the new tensor subspace for X is spanned by R tensor bases represented by the Tucker model {P_r}_{r=1}^{R}, P_r = G_r ×_2 P_r^(1) ×_3 · · · ×_N P_r^(N−1), (7) while the new subspace for Y is represented by the Tucker model {Q_r}_{r=1}^{R}, Q_r = D_r ×_2 Q_r^(1) ×_3 · · · ×_M Q_r^(M−1). [sent-128, score-1.068]
35 The rank-(1, L2, . . . , LN) decomposition in (6) is not unique; however, since MSVD generates both an all-orthogonal core [19] and column-wise orthogonal factors, these can be applied to obtain unique components of the Tucker decomposition. [sent-132, score-0.116]
36 Here T = [t1, . . . , tR] ∈ R^{I1×R} collects the latent vectors, the mode-n loading matrix is P^(n) = [P_1^(n), . . . , P_R^(n)], the mode-m loading matrix is Q^(m) = [Q_1^(m), . . . , Q_R^(m)], D = blockdiag(D_1, . . . , D_R), and the core tensor G = blockdiag(G_1, . . . , G_R) ∈ R^{R×RL2×···×RLN}. [sent-139, score-0.116] [sent-145, score-0.543]
38 The core tensors G and D have a special block-diagonal structure (see Fig. 1), whose elements indicate the level of interactions between the corresponding latent vectors and loading matrices. [sent-152, score-0.15] [sent-153, score-0.371]
40 On the other hand, for ∀n: Ln = rank_n(X) and ∀m: Km = rank_m(Y) (Footnote 2), HOPLS obtains the same solution as the standard PLS performed on a mode-1 matricization of X and Y. [sent-155, score-0.136]
41 This is obvious from the matricized form of (6), X_(1) ≈ Σ_r t_r G_{r(1)} (P_r^(N−1) ⊗ · · · ⊗ P_r^(1))^T, where each term G_{r(1)} (P_r^(N−1) ⊗ · · · ⊗ P_r^(1))^T acts as a loading row extracted from X_(1). [sent-156, score-0.088]
42 (Footnote 2: rank_n(X) = rank(X_(n)).) [sent-162, score-0.289]
43 Algorithm 1: The Higher-Order Partial Least Squares (HOPLS) Algorithm. Input: X ∈ R^{I1×···×IN}, Y ∈ R^{J1×···×JM} with I1 = J1; the number of latent vectors R; the numbers of loading vectors {Ln}_{n=2}^{N} and {Km}_{m=2}^{M}. [sent-163, score-0.435]
44 For r = 1, . . . , R (with E1 = X, F1 = Y): compute the rank-(L2, . . . , LN, K2, . . . , KM) decomposition of the cross-covariance tensor Cr by HOOI [8] as Cr ≈ [[Hr; Pr^(1), . . . , Pr^(N−1), Qr^(1), . . . , Qr^(M−1)]]; [sent-181, score-0.045]
45 tr ← the first leading left singular vector of the SVD of (Er ×2 Pr^(1)T ×3 · · · ×N Pr^(N−1)T)_(1); Gr ← [[Er; tr^T, Pr^(1)T, . . . , Pr^(N−1)T]]; Dr ← [[Fr; tr^T, Qr^(1)T, . . . , Qr^(M−1)T]]; [sent-187, score-0.11]
46 Deflation: Er+1 ← Er − [[Gr; tr, Pr^(1), . . . , Pr^(N−1)]]; [sent-193, score-0.088]
47 Fr+1 ← Fr − [[Dr; tr, Qr^(1), . . . , Qr^(M−1)]]. [sent-196, score-0.088]
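The following is a rough single-component sketch of the loop above for third-order X and Y sharing the sample mode, assuming the data have been centered along that mode; it substitutes a truncated MSVD of the cross-covariance tensor for the full HOOI refinement and is meant only to convey the flow of Algorithm 1, not to reproduce the authors' implementation.

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization (as in the earlier sketch)."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, A, n):
    """n-mode product Y = X x_n A (as in the earlier sketch)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def project(X, factors):
    """X x_2 P1^T x_3 P2^T: project the two non-sample modes onto the loadings."""
    out = X
    for n, A in enumerate(factors, start=1):
        out = mode_n_product(out, A.T, n)
    return out

def hopls_one_component(E, F, L, K):
    """One HOPLS component for 3rd-order E (I1 x I2 x I3) and F (I1 x J2 x J3)
    sharing the first (sample) mode; L = (L2, L3) and K = (K2, K3) give the
    numbers of loading vectors per mode."""
    # mode-1 cross-covariance tensor of size I2 x I3 x J2 x J3
    C = np.tensordot(E, F, axes=(0, 0))

    # loadings: leading left singular vectors of each unfolding of C
    # (a truncated MSVD, used here in place of a full HOOI refinement)
    P = [np.linalg.svd(unfold(C, 0), full_matrices=False)[0][:, :L[0]],
         np.linalg.svd(unfold(C, 1), full_matrices=False)[0][:, :L[1]]]
    Q = [np.linalg.svd(unfold(C, 2), full_matrices=False)[0][:, :K[0]],
         np.linalg.svd(unfold(C, 3), full_matrices=False)[0][:, :K[1]]]

    # latent vector: leading left singular vector of the projected, matricized E
    t = np.linalg.svd(unfold(project(E, P), 0), full_matrices=False)[0][:, :1]

    # core tensors with the sample mode collapsed onto t
    G = mode_n_product(project(E, P), t.T, 0)      # 1 x L2 x L3
    D = mode_n_product(project(F, Q), t.T, 0)      # 1 x K2 x K3

    # deflation: subtract the rank-(1, L2, L3) / rank-(1, K2, K3) approximations
    E_hat = mode_n_product(mode_n_product(mode_n_product(G, t, 0), P[0], 1), P[1], 2)
    F_hat = mode_n_product(mode_n_product(mode_n_product(D, t, 0), Q[0], 1), Q[1], 2)
    return t, P, Q, G, D, E - E_hat, F - F_hat
```

Repeating the routine on the returned residuals for r = 1, . . . , R and stacking t, P, Q, G, D gives the block structure described above.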
48 Since the latent vectors can be optimized sequentially with the same criteria based on deflation, we shall simplify the problem to that of the first latent vector t1 and two groups of loading matrices P_1^(n) and Q_1^(m). [sent-200]
49 An objective function employed to determine the tensor bases, represented by P^(n) and Q^(m), can be defined as min ‖X − [[G; t, P^(1), . . . , P^(N−1)]]‖² + ‖Y − [[D; t, Q^(1), . . . , Q^(M−1)]]‖², subject to P^(n)T P^(n) = I_{Ln+1} and Q^(m)T Q^(m) = I_{Km+1}, (11) which yields the common latent vector t that best approximates both X and Y. [sent-202, score-0.422] [sent-210, score-0.191]
51 The solution can be obtained by maximizing the norm of the core tensors G and D simultaneously. [sent-211, score-0.15]
52 We now define a mode-1 cross-covariance tensor C = COV_{1;1}(X, Y) ∈ R^{I2×···×IN×J2×···×JM}. [sent-219, score-0.382]
53 According to (11), for a given set of loading matrices {P^(n)}, the latent vector t must explain the variance of X as much as possible, that is, t = arg min_t ‖X − [[G; t, P^(1), . . . , P^(N−1)]]‖². [sent-235, score-0.325]
54 3 Prediction Predictions for new observations are performed using the matricized forms of the data tensors X and Y. [sent-242, score-0.207]
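As a hedged illustration of this matricized prediction step, the sketch below consumes the outputs of the single-component routine above; the Kronecker ordering is tied to the C-order unfolding used in these sketches and is not quoted from the paper.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def predict_one_component(X_new, P, Q, G, D):
    """Predict Y_new from X_new with a single HOPLS component, working on
    mode-1 matricizations: t_new = X_new_(1) W^+ with W = G_(1) (P2 kron P3)^T,
    then Y_new_(1) = t_new D_(1) (Q2 kron Q3)^T."""
    W = unfold(G, 0) @ np.kron(P[0], P[1]).T            # 1 x (I2*I3)
    t_new = unfold(X_new, 0) @ np.linalg.pinv(W)        # samples x 1
    Y1 = t_new @ unfold(D, 0) @ np.kron(Q[0], Q[1]).T   # samples x (J2*J3)
    return Y1.reshape(X_new.shape[0], Q[0].shape[0], Q[1].shape[0])
```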
55 Figure 2: Performance comparison between HOPLS, NPLS and PLS, for a varying number of latent vectors under the conditions of noise free (A) and SNR=10dB (B). [sent-244, score-0.311]
56 4 Experimental results We perform two case studies, one on synthetic data which illustrates the benefits of HOPLS, and the other on real-life electrophysiological data. [sent-245, score-0.047]
57 To quantify the predictability, the index Q² was defined as Q² = 1 − Σ_{i=1}^{I}(y_i − ŷ_i)² / Σ_{i=1}^{I}(y_i − ȳ)², where ŷ_i denotes the prediction of y_i using a model created with the i-th sample omitted. [sent-246, score-0.075]
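A small leave-one-out sketch of this Q² index; fit_and_predict is a placeholder for whichever regression routine (HOPLS, NPLS, or PLS) is being evaluated.

```python
import numpy as np

def q_squared(X, Y, fit_and_predict):
    """Leave-one-out Q^2 = 1 - sum_i ||y_i - yhat_i||^2 / sum_i ||y_i - ybar||^2,
    where yhat_i is predicted by a model trained with the i-th sample omitted."""
    I = X.shape[0]
    press = 0.0
    for i in range(I):
        keep = np.arange(I) != i
        y_hat = fit_and_predict(X[keep], Y[keep], X[i:i + 1])
        press += np.sum((Y[i] - y_hat) ** 2)
    total = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - press / total
```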
58 1 Simulations on synthetic datasets A simulation study on synthetic datasets was undertaken to evaluate the HOPLS regression method in terms of its predictive ability and effectiveness under different conditions related to small sample sizes and noise levels. [sent-248, score-0.185]
59 Figure 3: The optimal performance after choosing an appropriate number of latent vectors. [sent-249, score-0.598]
60 The HOPLS and NPLS were performed on tensor datasets, whereas PLS was performed on a mode-1 matricization of the corresponding datasets. [sent-252, score-0.127]
61 The tensor X was generated from a full-rank standard normal distribution and the tensor Y as a linear combination of X. [sent-255, score-0.764]
62 Noise was added to both independent and dependent datasets to evaluate performance at different noise levels. [sent-256, score-0.097]
63 We considered a third-order tensor X and a third-order tensor Y, for the case where the sample size was much smaller than the number of predictors. [sent-258, score-0.764]
64 Fig. 2 illustrates the predictive performances on the validation datasets for a varying number of latent vectors. [sent-262, score-0.299]
65 Observe that when the number of latent vectors was equal to the number of samples, both PLS and NPLS had the tendency to be unstable, while HOPLS had no such problems. [sent-263, score-0.255]
66 With an increasing number of latent vectors, HOPLS exhibited enhanced performance while the performance of NPLS and PLS deteriorated due to the noise introduced by excess latent vectors (see Fig. 2). [sent-264, score-0.497]
67 Fig. 3 illustrates the optimal prediction performances obtained by selecting an appropriate number of latent vectors. [sent-267, score-0.241]
68 HOPLS outperformed NPLS and PLS at different noise levels, and its superiority was more pronounced in the presence of noise, indicating enhanced robustness. [sent-268, score-0.093]
69 Figure 4: Stability of the performance of HOPLS, NPLS and PLS for a varying number of latent vectors, under the conditions of (A) SNR=5dB and (B) SNR=0dB. [sent-269, score-0.217]
70 Observe that PLS was sensitive to the number of latent vectors, indicating that the selection of latent vectors is a crucial issue for obtaining an optimal model. [sent-270, score-0.446]
71 Finding the optimal number of latent vectors for unseen test data remains a challenging problem, implying that the stability of prediction performance for a varying number of latent vectors is essential for alleviating the sensitivity of the model. [sent-271, score-0.557]
72 Fig. 4 illustrates the stable predictive performance of HOPLS for a varying number of latent vectors; this behavior was more pronounced at higher noise levels. [sent-273, score-0.323]
73 2 Decoding ECoG from EEG In the last decade, considerable progress has been made in decoding movement kinematics (e.g., trajectories or velocity) from neuronal signals recorded both invasively, such as spiking activity [20] and electrocorticogram (ECoG) [21, 22], and noninvasively, from scalp electroencephalography (EEG) [23]. [sent-275, score-0.07] [sent-277, score-0.185]
75 To extract more information from brain activities, neuroimaging data fusion has also been investigated, whereby multimodal brain activities were recorded continuously and synchronously. [sent-278, score-0.185]
76 In contrast to the task of decoding the behavioral data from brain activity, in this study, our aim was to decode intracranial ECoG from scalp EEG. [sent-279, score-0.215]
77 Assuming that both ECoG and EEG are related to the same brain sources, we set out to extract the common latent components between EEG and ECoG and examined whether ECoG can be decoded from the corresponding EEG by employing our proposed HOPLS method. [sent-280, score-0.23]
78 ECoG (8×8 grid) and EEG (21 electrodes) were recorded simultaneously at a sample rate of 1024 Hz from a human subject during a relaxed state. [sent-281, score-0.031]
79 After preprocessing with a common average reference (CAR) spatial filter, the ECoG and EEG signals were transformed into time-frequency representations by the continuous complex Morlet wavelet transform, with frequency ranges of 2-150 Hz and 2-40 Hz respectively, and downsampled to 8 Hz. [sent-282, score-0.045]
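A hedged sketch of this kind of time-frequency preprocessing for a single channel; the wavelet width parameter, frequency grid, and crude decimation are illustrative stand-ins rather than the authors' exact pipeline.

```python
import numpy as np

def morlet_tfr(x, fs, freqs, w=6.0):
    """Complex Morlet time-frequency amplitudes of a 1-D signal x sampled at fs Hz.
    Returns an array of shape (len(freqs), len(x))."""
    tfr = np.empty((len(freqs), len(x)), dtype=complex)
    for k, f in enumerate(freqs):
        sigma = w / (2 * np.pi * f)                       # temporal width of the wavelet
        t = np.arange(-4 * sigma, 4 * sigma, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * sigma ** 2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy normalisation
        tfr[k] = np.convolve(x, wavelet, mode='same')
    return np.abs(tfr)

# e.g. an EEG channel at 1024 Hz on a 2-40 Hz grid, crudely decimated to 8 Hz:
# eeg_tfr = morlet_tfr(eeg_channel, fs=1024, freqs=np.arange(2, 41))[:, ::128]
```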
80 Thus, our objective was to decode the ECoG dataset, organized in a 4th-order tensor Y (trial × channel × frequency × time), from an EEG dataset contained in a 4th-order tensor X (trial × channel × frequency × time). [sent-284, score-0.879]
81 According to the HOPLS model, the common latent vectors in T can be regarded as brain source components that establish a bridge between EEG and ECoG, while the loading tensors Pr and Qr, r = 1, . . . , R, can be regarded as a set of tensor bases, as shown in Fig. 5. [sent-285, score-0.515] [sent-288, score-0.382]
83 These bases are computed from the training dataset and explain the relationship of spatio-temporal frequency patterns between EEG and ECoG. [sent-290, score-0.06]
84 The decoding model was calibrated on 30-second datasets and was applied to predict the subsequent 30-second datasets. [sent-291, score-0.095]
85 The quality of prediction was evaluated by the total correlation coefficient between the predicted and actual time-frequency representations of ECoG, denoted by r_{vec(Ŷ), vec(Y)}. [sent-292, score-0.041]
86 Fig. 5(B) illustrates the prediction performance for a number of latent vectors ranging from 1 to 8, compared with the standard PLS performed on a mode-1 matricization of tensors X and Y. [sent-294, score-0.448]
87 The optimal numbers of latent vectors for HOPLS and PLS were 4 and 1, respectively. [sent-295, score-0.255]
88 Consistent with the simulation analysis, HOPLS was more stable for a varying number of latent vectors and outperformed the standard PLS in terms of predictive ability. [sent-296, score-0.309]
89 Figure 5: (A) The basis of the tensor subspace computed from the spatial, temporal, and spectral representation of EEG and ECoG. [sent-297, score-0.492]
90 (B) The correlation coefficient r between predicted and actual spatio-temporal-frequency representation of ECoG signals for a varying number of latent vectors. [sent-298, score-0.242]
91 5 Conclusion We have introduced the Higher-Order Partial Least Squares (HOPLS) framework for tensor subspace regression, whereby data samples are represented in tensor form, thus providing a natural generalization of the existing Partial Least Squares (PLS) and N-way PLS (NPLS) approaches. [sent-299, score-0.916]
92 Simulation results have demonstrated the superiority and effectiveness of HOPLS over the existing algorithms for different noise levels. [sent-301, score-0.053]
93 A challenging application of decoding intracranial electrocorticogram (ECoG) from simultaneously recorded scalp electroencephalography (EEG) (both from the human brain) has been studied, and the results have demonstrated the large potential of HOPLS for multi-way correlated datasets. [sent-302, score-0.263]
94 General tensor discriminant analysis and Gabor features for gait recognition. [sent-346, score-0.4]
95 Soft modeling by latent variables: The nonlinear iterative partial least squares approach. [sent-405, score-0.315]
96 Partial least squares (PLS) methods for neuroimaging: A tutorial and review. [sent-414, score-0.071]
97 Partial least squares regression and projection on latent structure regression (PLS Regression). [sent-418, score-0.344]
98 Decompositions of a higher-order tensor in block terms - Part II: Definitions and uniqueness. [sent-433, score-0.382]
99 Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys. [sent-459, score-0.103]
100 Prediction of arm movement trajectories from ECoG-recordings in humans. [sent-467, score-0.044]
wordName wordTfidf (topN-words)
[('hopls', 0.477), ('pls', 0.419), ('tensor', 0.382), ('ecog', 0.225), ('latent', 0.191), ('qr', 0.187), ('eeg', 0.171), ('npls', 0.17), ('pr', 0.13), ('gr', 0.119), ('loading', 0.116), ('subspace', 0.11), ('multilinear', 0.11), ('dr', 0.105), ('jn', 0.105), ('tensors', 0.105), ('matricization', 0.102), ('tr', 0.088), ('km', 0.087), ('multiway', 0.073), ('chemometrics', 0.069), ('vectors', 0.064), ('ln', 0.062), ('fr', 0.061), ('tucker', 0.055), ('jm', 0.054), ('partial', 0.053), ('squares', 0.052), ('decoding', 0.052), ('scalp', 0.052), ('electrocorticogram', 0.051), ('intracranial', 0.051), ('msvd', 0.051), ('rkm', 0.051), ('rln', 0.051), ('thorder', 0.051), ('core', 0.045), ('decomposition', 0.045), ('snr', 0.042), ('dependent', 0.042), ('regression', 0.041), ('bases', 0.04), ('brain', 0.039), ('capital', 0.039), ('neuroimaging', 0.037), ('decompositions', 0.037), ('arrays', 0.035), ('boldface', 0.035), ('loadings', 0.035), ('blockdiag', 0.034), ('ikm', 0.034), ('iln', 0.034), ('rankn', 0.034), ('rjn', 0.034), ('tnew', 0.034), ('xnew', 0.034), ('cr', 0.031), ('recorded', 0.031), ('noise', 0.03), ('ation', 0.03), ('tt', 0.029), ('illustrates', 0.029), ('predictive', 0.028), ('orthogonal', 0.026), ('varying', 0.026), ('ur', 0.026), ('electroencephalography', 0.026), ('arm', 0.026), ('cov', 0.025), ('multidimensional', 0.025), ('datasets', 0.025), ('signals', 0.025), ('variables', 0.024), ('superiority', 0.023), ('kolda', 0.023), ('china', 0.023), ('singular', 0.022), ('represented', 0.022), ('letters', 0.021), ('enhanced', 0.021), ('decode', 0.021), ('prediction', 0.021), ('mode', 0.02), ('frequency', 0.02), ('td', 0.02), ('whereby', 0.02), ('denoted', 0.02), ('pronounced', 0.019), ('activities', 0.019), ('least', 0.019), ('synthetic', 0.018), ('movement', 0.018), ('discriminant', 0.018), ('covariates', 0.018), ('objective', 0.018), ('yi', 0.018), ('matrices', 0.018), ('er', 0.018), ('predict', 0.018), ('channel', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
2 0.32291171 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
3 0.20493533 102 nips-2011-Generalised Coupled Tensor Factorisation
Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli
Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to βdivergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1
4 0.11890607 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
5 0.080848046 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
6 0.073970959 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
7 0.071160771 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
8 0.070414014 68 nips-2011-Demixed Principal Component Analysis
9 0.066785097 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
10 0.059385858 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems
11 0.050750617 258 nips-2011-Sparse Bayesian Multi-Task Learning
12 0.048511066 75 nips-2011-Dynamical segmentation of single trials from population neural data
13 0.047998786 302 nips-2011-Variational Learning for Recurrent Spiking Networks
14 0.047233354 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
15 0.046622753 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
16 0.046240821 86 nips-2011-Empirical models of spiking in neural populations
17 0.044996664 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
18 0.042405568 217 nips-2011-Practical Variational Inference for Neural Networks
19 0.04209324 38 nips-2011-Anatomically Constrained Decoding of Finger Flexion from Electrocorticographic Signals
20 0.041638013 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
topicId topicWeight
[(0, 0.124), (1, 0.017), (2, 0.027), (3, -0.092), (4, -0.065), (5, -0.034), (6, 0.049), (7, -0.03), (8, 0.136), (9, 0.064), (10, 0.059), (11, -0.089), (12, -0.043), (13, 0.005), (14, 0.052), (15, -0.255), (16, -0.242), (17, 0.008), (18, -0.089), (19, -0.104), (20, -0.188), (21, -0.239), (22, -0.073), (23, 0.066), (24, 0.104), (25, 0.003), (26, 0.069), (27, -0.007), (28, -0.037), (29, -0.035), (30, 0.045), (31, -0.063), (32, -0.028), (33, 0.102), (34, -0.112), (35, -0.031), (36, 0.046), (37, -0.051), (38, 0.046), (39, 0.149), (40, 0.1), (41, 0.042), (42, -0.049), (43, 0.04), (44, -0.044), (45, -0.084), (46, 0.004), (47, -0.06), (48, 0.004), (49, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.95840931 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
2 0.79737169 102 nips-2011-Generalised Coupled Tensor Factorisation
Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli
Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to βdivergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1
3 0.79294139 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
4 0.40809619 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems
Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon
Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the two goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.
5 0.38691449 86 nips-2011-Empirical models of spiking in neural populations
Author: Jakob H. Macke, Lars Buesing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Neurons in the neocortex code and compute as part of a locally interconnected population. Large-scale multi-electrode recording makes it possible to access these population processes empirically by fitting statistical models to unaveraged data. What statistical structure best describes the concurrent spiking of cells within a local network? We argue that in the cortex, where firing exhibits extensive correlations in both time and space and where a typical sample of neurons still reflects only a very small fraction of the local population, the most appropriate model captures shared variability by a low-dimensional latent process evolving with smooth dynamics, rather than by putative direct coupling. We test this claim by comparing a latent dynamical model with realistic spiking observations to coupled generalised linear spike-response models (GLMs) using cortical recordings. We find that the latent dynamical approach outperforms the GLM in terms of goodness-of-fit, and reproduces the temporal correlations in the data more accurately. We also compare models whose observation models are either Gaussian or point-process models, finding that the non-Gaussian model provides slightly better goodness-of-fit and more realistic population spike counts. 1
6 0.37316337 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
7 0.34673601 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
8 0.33924508 301 nips-2011-Variational Gaussian Process Dynamical Systems
9 0.33238336 225 nips-2011-Probabilistic amplitude and frequency demodulation
10 0.33091033 75 nips-2011-Dynamical segmentation of single trials from population neural data
11 0.32320347 68 nips-2011-Demixed Principal Component Analysis
12 0.30972263 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
14 0.30629149 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
15 0.29623157 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
16 0.26962885 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
17 0.26772657 5 nips-2011-A Denoising View of Matrix Completion
18 0.26479325 107 nips-2011-Global Solution of Fully-Observed Variational Bayesian Matrix Factorization is Column-Wise Independent
19 0.25939736 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
20 0.25343776 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
topicId topicWeight
[(0, 0.027), (4, 0.023), (20, 0.03), (26, 0.425), (31, 0.075), (33, 0.01), (43, 0.038), (45, 0.063), (57, 0.039), (74, 0.038), (83, 0.039), (84, 0.011), (99, 0.083)]
simIndex simValue paperId paperTitle
1 0.91242939 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
2 0.87749821 162 nips-2011-Lower Bounds for Passive and Active Learning
Author: Maxim Raginsky, Alexander Rakhlin
Abstract: We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander's capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of "disagreement coefficient." For passive learning, our lower bounds match the upper bounds of Giné and Koltchinskii up to constants and generalize analogous results of Massart and Nédélec. For active learning, we provide first known lower bounds based on the capacity function rather than the disagreement coefficient. 1
3 0.86366016 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
same-paper 4 0.84604353 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
5 0.8131606 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the 1 -norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
6 0.62377536 22 nips-2011-Active Ranking using Pairwise Comparisons
7 0.55944443 21 nips-2011-Active Learning with a Drifting Distribution
8 0.54037499 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
9 0.53717703 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.53497326 29 nips-2011-Algorithms and hardness results for parallel large margin learning
11 0.52250409 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
12 0.52010632 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
13 0.51432586 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
14 0.50618541 226 nips-2011-Projection onto A Nonnegative Max-Heap
15 0.50209087 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
16 0.50001365 158 nips-2011-Learning unbelievable probabilities
17 0.49374568 229 nips-2011-Query-Aware MCMC
18 0.49208689 231 nips-2011-Randomized Algorithms for Comparison-based Search
19 0.49122337 256 nips-2011-Solving Decision Problems with Limited Information
20 0.48239434 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning