nips nips2004 nips2004-124 knowledge-graph by maker-knowledge-mining

124 nips-2004-Multiple Alignment of Continuous Time Series

Source: pdf

Author: Jennifer Listgarten, Radford M. Neal, Sam T. Roweis, Andrew Emili

Abstract: Multiple realizations of continuous-valued time series from a stochastic process often contain systematic variations in rate and amplitude. To leverage the information contained in such noisy replicate sets, we need to align them in an appropriate way (for example, to allow the data to be properly combined by adaptive averaging). We present the Continuous Proﬁle Model (CPM), a generative model in which each observed time series is a non-uniformly subsampled version of a single latent trace, to which local rescaling and additive noise are applied. After unsupervised training, the learned trace represents a canonical, high resolution fusion of all the replicates. As well, an alignment in time and scale of each observation to this trace can be found by inference in the model. We apply CPM to successfully align speech signals from multiple speakers and sets of Liquid Chromatography-Mass Spectrometry proteomic data. 1 A Proﬁle Model for Continuous Data When observing multiple time series generated by a noisy, stochastic process, large systematic sources of variability are often present. For example, within a set of nominally replicate time series, the time axes can be variously shifted, compressed and expanded, in complex, non-linear ways. Additionally, in some circumstances, the scale of the measured data can vary systematically from one replicate to the next, and even within a given replicate. We propose a Continuous Proﬁle Model (CPM) for simultaneously analyzing a set of such time series. In this model, each time series is generated as a noisy transformation of a single latent trace. The latent trace is an underlying, noiseless representation of the set of replicated, observable time series. Output time series are generated from this model by moving through a sequence of hidden states in a Markovian manner and emitting an observable value at each step, as in an HMM. Each hidden state corresponds to a particular location in the latent trace, and the emitted value from the state depends on the value of the latent trace at that position. To account for changes in the amplitude of the signals across and within replicates, the latent time states are augmented by a set of scale states, which control how the emission signal will be scaled relative to the value of the latent trace. During training, the latent trace is learned, as well as the transition probabilities controlling the Markovian evolution of the scale and time states and the overall noise level of the observed data. After training, the latent trace learned by the model represents a higher resolution ’fusion’ of the experimental replicates. Figure 1 illustrate the model in action. Unaligned, Linear Warp Alignment and CPM Alignment Amplitude 40 30 20 10 0 50 Amplitude 40 30 20 10 Amplitude 0 30 20 10 0 Time a) b) Figure 1: a) Top: ten replicated speech energy signals as described in Section 4), Middle: same signals, aligned using a linear warp with an offset, Bottom: aligned with CPM (the learned latent trace is also shown in cyan). b) Speech waveforms corresponding to energy signals in a), Top: unaligned originals, Bottom: aligned using CPM. 2 Deﬁning the Continuous Proﬁle Model (CPM) The CPM is generative model for a set of K time series, xk = (xk , xk , ..., xk k ). The 1 2 N temporal sampling rate within each xk need not be uniform, nor must it be the same across the different xk . Constraints on the variability of the sampling rate are discussed at the end of this section. For notational convenience, we henceforth assume N k = N for all k, but this is not a requirement of the model. The CPM is set up as follows: We assume that there is a latent trace, z = (z1 , z2 , ..., zM ), a canonical representation of the set of noisy input replicate time series. Any given observed time series in the set is modeled as a non-uniformly subsampled version of the latent trace to which local scale transformations have been applied. Ideally, M would be inﬁnite, or at least very large relative to N so that any experimental data could be mapped precisely to the correct underlying trace point. Aside from the computational impracticalities this would pose, great care to avoid overﬁtting would have to be taken. Thus in practice, we have used M = (2 + )N (double the resolution, plus some slack on each end) in our experiments and found this to be sufﬁcient with < 0.2. Because the resolution of the latent trace is higher than that of the observed time series, experimental time can be made effectively to speed up or slow down by advancing along the latent trace in larger or smaller jumps. The subsampling and local scaling used during the generation of each observed time series are determined by a sequence of hidden state variables. Let the state sequence for observation k be π k . Each state in the state sequence maps to a time state/scale state pair: k πi → {τik , φk }. Time states belong to the integer set (1..M ); scale states belong to an i ordered set (φ1 ..φQ ). (In our experiments we have used Q=7, evenly spaced scales in k logarithmic space). States, πi , and observation values, xk , are related by the emission i k probability distribution: Aπi (xk |z) ≡ p(xk |πi , z, σ, uk ) ≡ N (xk ; zτik φk uk , σ), where σ k i i i i is the noise level of the observed data, N (a; b, c) denotes a Gaussian probability density for a with mean b and standard deviation c. The uk are real-valued scale parameters, one per observed time series, that correct for any overall scale difference between time series k and the latent trace. To fully specify our model we also need to deﬁne the state transition probabilities. We deﬁne the transitions between time states and between scale states separately, so that k Tπi−1 ,πi ≡ p(πi |πi−1 ) = p(φi |φi−1 )pk (τi |τi−1 ). The constraint that time must move forward, cannot stand still, and that it can jump ahead no more than Jτ time states is enforced. (In our experiments we used Jτ = 3.) As well, we only allow scale state transitions between neighbouring scale states so that the local scale cannot jump arbitrarily. These constraints keep the number of legal transitions to a tractable computational size and work well in practice. Each observed time series has its own time transition probability distribution to account for experiment-speciﬁc patterns. Both the time and scale transition probability distributions are given by multinomials:  dk , if a − b = 1  1  k  d2 , if a − b = 2   k . p (τi = a|τi−1 = b) = . .  k d , if a − b = J  τ  Jτ  0, otherwise p(φi = a|φi−1  s0 , if D(a, b) = 0   s1 , if D(a, b) = 1 = b) =  s1 , if D(a, b) = −1  0, otherwise where D(a, b) = 1 means that a is one scale state larger than b, and D(a, b) = −1 means that a is one scale state smaller than b, and D(a, b) = 0 means that a = b. The distributions Jτ are constrained by: i=1 dk = 1 and 2s1 + s0 = 1. i Jτ determines the maximum allowable instantaneous speedup of one portion of a time series relative to another portion, within the same series or across different series. However, the length of time for which any series can move so rapidly is constrained by the length of the latent trace; thus the maximum overall ratio in speeds achievable by the model between any two entire time series is given by min(Jτ , M ). N After training, one may examine either the latent trace or the alignment of each observable time series to the latent trace. Such alignments can be achieved by several methods, including use of the Viterbi algorithm to ﬁnd the highest likelihood path through the hidden states [1], or sampling from the posterior over hidden state sequences. We found Viterbi alignments to work well in the experiments below; samples from the posterior looked quite similar. 3 Training with the Expectation-Maximization (EM) Algorithm As with HMMs, training with the EM algorithm (often referred to as Baum-Welch in the context of HMMs [1]), is a natural choice. In our model the E-Step is computed exactly using the Forward-Backward algorithm [1], which provides the posterior probability over k states for each time point of every observed time series, γs (i) ≡ p(πi = s|x) and also the pairwise state posteriors, ξs,t (i) ≡ p(πi−1 = s, πi = t|xk ). The algorithm is modiﬁed only in that the emission probabilities depend on the latent trace as described in Section 2. The M-Step consists of a series of analytical updates to the various parameters as detailed below. Given the latent trace (and the emission and state transition probabilities), the complete log likelihood of K observed time series, xk , is given by Lp ≡ L + P. L is the likelihood term arising in a (conditional) HMM model, and can be obtained from the Forward-Backward algorithm. It is composed of the emission and state transition terms. P is the log prior (or penalty term), regularizing various aspects of the model parameters as explained below. These two terms are: K N N L≡ log Aπi (xk |z) + i log p(π1 ) + τ −1 K (zj+1 − zj )2 + P ≡ −λ (1) i=2 i=1 k=1 k log Tπi−1 ,πi j=1 k log D(dk |{ηv }) + log D(sv |{ηv }), v (2) k=1 where p(π1 ) are priors over the initial states. The ﬁrst term in Equation 2 is a smoothing k penalty on the latent trace, with λ controlling the amount of smoothing. ηv and ηv are Dirichlet hyperprior parameters for the time and scale state transition probability distributions respectively. These ensure that all non-zero transition probabilities remain non-zero. k For the time state transitions, v ∈ {1, Jτ } and ηv corresponds to the pseudo-count data for k the parameters d1 , d2 . . . dJτ . For the scale state transitions, v ∈ {0, 1} and ηv corresponds to the pseudo-count data for the parameters s0 and s1 . Letting S be the total number of possible states, that is, the number of elements in the cross-product of possible time states and possible scale states, the expected complete log likelihood is: K S K p k k γs (1) log T0,s

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 ca † Abstract Multiple realizations of continuous-valued time series from a stochastic process often contain systematic variations in rate and amplitude. [sent-7, score-0.471]

2 To leverage the information contained in such noisy replicate sets, we need to align them in an appropriate way (for example, to allow the data to be properly combined by adaptive averaging). [sent-8, score-0.291]

3 We present the Continuous Proﬁle Model (CPM), a generative model in which each observed time series is a non-uniformly subsampled version of a single latent trace, to which local rescaling and additive noise are applied. [sent-9, score-0.913]

4 After unsupervised training, the learned trace represents a canonical, high resolution fusion of all the replicates. [sent-10, score-0.621]

5 As well, an alignment in time and scale of each observation to this trace can be found by inference in the model. [sent-11, score-0.764]

6 We apply CPM to successfully align speech signals from multiple speakers and sets of Liquid Chromatography-Mass Spectrometry proteomic data. [sent-12, score-0.286]

7 1 A Proﬁle Model for Continuous Data When observing multiple time series generated by a noisy, stochastic process, large systematic sources of variability are often present. [sent-13, score-0.514]

8 For example, within a set of nominally replicate time series, the time axes can be variously shifted, compressed and expanded, in complex, non-linear ways. [sent-14, score-0.447]

9 Additionally, in some circumstances, the scale of the measured data can vary systematically from one replicate to the next, and even within a given replicate. [sent-15, score-0.337]

10 We propose a Continuous Proﬁle Model (CPM) for simultaneously analyzing a set of such time series. [sent-16, score-0.12]

11 In this model, each time series is generated as a noisy transformation of a single latent trace. [sent-17, score-0.788]

12 The latent trace is an underlying, noiseless representation of the set of replicated, observable time series. [sent-18, score-0.995]

13 Output time series are generated from this model by moving through a sequence of hidden states in a Markovian manner and emitting an observable value at each step, as in an HMM. [sent-19, score-0.735]

14 Each hidden state corresponds to a particular location in the latent trace, and the emitted value from the state depends on the value of the latent trace at that position. [sent-20, score-1.524]

15 To account for changes in the amplitude of the signals across and within replicates, the latent time states are augmented by a set of scale states, which control how the emission signal will be scaled relative to the value of the latent trace. [sent-21, score-1.536]

16 During training, the latent trace is learned, as well as the transition probabilities controlling the Markovian evolution of the scale and time states and the overall noise level of the observed data. [sent-22, score-1.442]

17 After training, the latent trace learned by the model represents a higher resolution ’fusion’ of the experimental replicates. [sent-23, score-0.9]

18 b) Speech waveforms corresponding to energy signals in a), Top: unaligned originals, Bottom: aligned using CPM. [sent-26, score-0.272]

19 2 Deﬁning the Continuous Proﬁle Model (CPM) The CPM is generative model for a set of K time series, xk = (xk , xk , . [sent-27, score-0.63]

20 The 1 2 N temporal sampling rate within each xk need not be uniform, nor must it be the same across the different xk . [sent-31, score-0.599]

21 Constraints on the variability of the sampling rate are discussed at the end of this section. [sent-32, score-0.077]

22 For notational convenience, we henceforth assume N k = N for all k, but this is not a requirement of the model. [sent-33, score-0.068]

23 The CPM is set up as follows: We assume that there is a latent trace, z = (z1 , z2 , . [sent-34, score-0.36]

24 , zM ), a canonical representation of the set of noisy input replicate time series. [sent-37, score-0.36]

25 Any given observed time series in the set is modeled as a non-uniformly subsampled version of the latent trace to which local scale transformations have been applied. [sent-38, score-1.428]

26 Ideally, M would be inﬁnite, or at least very large relative to N so that any experimental data could be mapped precisely to the correct underlying trace point. [sent-39, score-0.415]

27 Because the resolution of the latent trace is higher than that of the observed time series, experimental time can be made effectively to speed up or slow down by advancing along the latent trace in larger or smaller jumps. [sent-43, score-1.93]

28 The subsampling and local scaling used during the generation of each observed time series are determined by a sequence of hidden state variables. [sent-44, score-0.7]

29 Let the state sequence for observation k be π k . [sent-45, score-0.182]

30 Each state in the state sequence maps to a time state/scale state pair: k πi → {τik , φk }. [sent-46, score-0.604]

31 M ); scale states belong to an i ordered set (φ1 . [sent-49, score-0.334]

32 (In our experiments we have used Q=7, evenly spaced scales in k logarithmic space). [sent-52, score-0.031]

33 The uk are real-valued scale parameters, one per observed time series, that correct for any overall scale difference between time series k and the latent trace. [sent-54, score-1.235]

34 To fully specify our model we also need to deﬁne the state transition probabilities. [sent-55, score-0.262]

35 We deﬁne the transitions between time states and between scale states separately, so that k Tπi−1 ,πi ≡ p(πi |πi−1 ) = p(φi |φi−1 )pk (τi |τi−1 ). [sent-56, score-0.683]

36 The constraint that time must move forward, cannot stand still, and that it can jump ahead no more than Jτ time states is enforced. [sent-57, score-0.573]

37 ) As well, we only allow scale state transitions between neighbouring scale states so that the local scale cannot jump arbitrarily. [sent-59, score-0.902]

38 These constraints keep the number of legal transitions to a tractable computational size and work well in practice. [sent-60, score-0.133]

39 Each observed time series has its own time transition probability distribution to account for experiment-speciﬁc patterns. [sent-61, score-0.66]

40 Both the time and scale transition probability distributions are given by multinomials:  dk , if a − b = 1  1  k  d2 , if a − b = 2   k . [sent-62, score-0.362]

41 The distributions Jτ are constrained by: i=1 dk = 1 and 2s1 + s0 = 1. [sent-66, score-0.093]

42 i Jτ determines the maximum allowable instantaneous speedup of one portion of a time series relative to another portion, within the same series or across different series. [sent-67, score-0.772]

43 However, the length of time for which any series can move so rapidly is constrained by the length of the latent trace; thus the maximum overall ratio in speeds achievable by the model between any two entire time series is given by min(Jτ , M ). [sent-68, score-1.18]

44 N After training, one may examine either the latent trace or the alignment of each observable time series to the latent trace. [sent-69, score-1.674]

45 Such alignments can be achieved by several methods, including use of the Viterbi algorithm to ﬁnd the highest likelihood path through the hidden states [1], or sampling from the posterior over hidden state sequences. [sent-70, score-0.598]

46 We found Viterbi alignments to work well in the experiments below; samples from the posterior looked quite similar. [sent-71, score-0.104]

47 In our model the E-Step is computed exactly using the Forward-Backward algorithm [1], which provides the posterior probability over k states for each time point of every observed time series, γs (i) ≡ p(πi = s|x) and also the pairwise state posteriors, ξs,t (i) ≡ p(πi−1 = s, πi = t|xk ). [sent-73, score-0.643]

48 The algorithm is modiﬁed only in that the emission probabilities depend on the latent trace as described in Section 2. [sent-74, score-0.988]

49 The M-Step consists of a series of analytical updates to the various parameters as detailed below. [sent-75, score-0.258]

50 Given the latent trace (and the emission and state transition probabilities), the complete log likelihood of K observed time series, xk , is given by Lp ≡ L + P. [sent-76, score-1.719]

51 L is the likelihood term arising in a (conditional) HMM model, and can be obtained from the Forward-Backward algorithm. [sent-77, score-0.034]

52 It is composed of the emission and state transition terms. [sent-78, score-0.434]

53 P is the log prior (or penalty term), regularizing various aspects of the model parameters as explained below. [sent-79, score-0.116]

54 These two terms are: K N N L≡ log Aπi (xk |z) + i log p(π1 ) + τ −1 K (zj+1 − zj )2 + P ≡ −λ (1) i=2 i=1 k=1 k log Tπi−1 ,πi j=1 k log D(dk |{ηv }) + log D(sv |{ηv }), v (2) k=1 where p(π1 ) are priors over the initial states. [sent-80, score-0.307]

55 The ﬁrst term in Equation 2 is a smoothing k penalty on the latent trace, with λ controlling the amount of smoothing. [sent-81, score-0.445]

56 ηv and ηv are Dirichlet hyperprior parameters for the time and scale state transition probability distributions respectively. [sent-82, score-0.554]

57 These ensure that all non-zero transition probabilities remain non-zero. [sent-83, score-0.152]

58 k For the time state transitions, v ∈ {1, Jτ } and ηv corresponds to the pseudo-count data for k the parameters d1 , d2 . [sent-84, score-0.271]

59 For the scale state transitions, v ∈ {0, 1} and ηv corresponds to the pseudo-count data for the parameters s0 and s1 . [sent-88, score-0.282]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('trace', 0.415), ('cpm', 0.373), ('latent', 0.36), ('series', 0.258), ('xk', 0.255), ('emission', 0.172), ('states', 0.166), ('state', 0.151), ('replicate', 0.148), ('scale', 0.131), ('time', 0.12), ('transition', 0.111), ('transitions', 0.1), ('alignment', 0.098), ('amplitude', 0.097), ('subsampled', 0.093), ('dk', 0.093), ('resolution', 0.089), ('pro', 0.085), ('le', 0.082), ('fusion', 0.081), ('unaligned', 0.074), ('signals', 0.072), ('alignments', 0.069), ('warp', 0.069), ('uk', 0.064), ('observable', 0.063), ('systematic', 0.062), ('jump', 0.062), ('replicated', 0.062), ('align', 0.059), ('speech', 0.058), ('zj', 0.057), ('aligned', 0.056), ('hidden', 0.056), ('hmms', 0.055), ('continuous', 0.053), ('markovian', 0.053), ('viterbi', 0.053), ('observed', 0.051), ('log', 0.05), ('noisy', 0.05), ('controlling', 0.047), ('ik', 0.047), ('variability', 0.046), ('portion', 0.045), ('canonical', 0.042), ('toronto', 0.042), ('probabilities', 0.041), ('stand', 0.041), ('emitting', 0.041), ('proteomic', 0.041), ('hyperprior', 0.041), ('banting', 0.041), ('emili', 0.041), ('jennifer', 0.041), ('multinomials', 0.041), ('radford', 0.041), ('penalty', 0.038), ('replicates', 0.037), ('spectrometry', 0.037), ('noiseless', 0.037), ('henceforth', 0.037), ('cyan', 0.037), ('belong', 0.037), ('learned', 0.036), ('energy', 0.036), ('posterior', 0.035), ('leverage', 0.034), ('zm', 0.034), ('waveforms', 0.034), ('likelihood', 0.034), ('allowable', 0.033), ('aside', 0.033), ('liquid', 0.033), ('ahead', 0.033), ('subsampling', 0.033), ('speeds', 0.033), ('legal', 0.033), ('posteriors', 0.033), ('sequence', 0.031), ('compressed', 0.031), ('evenly', 0.031), ('emitted', 0.031), ('notational', 0.031), ('realizations', 0.031), ('neal', 0.031), ('rescaling', 0.031), ('sampling', 0.031), ('move', 0.031), ('across', 0.03), ('systematically', 0.03), ('neighbouring', 0.03), ('lp', 0.028), ('speakers', 0.028), ('regularizing', 0.028), ('sam', 0.028), ('multiple', 0.028), ('em', 0.028), ('within', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 124 nips-2004-Multiple Alignment of Continuous Time Series

Author: Jennifer Listgarten, Radford M. Neal, Sam T. Roweis, Andrew Emili

2 0.17444342 163 nips-2004-Semi-parametric Exponential Family PCA

Author: Sajama Sajama, Alon Orlitsky

Abstract: We present a semi-parametric latent variable model based technique for density modelling, dimensionality reduction and visualization. Unlike previous methods, we estimate the latent distribution non-parametrically which enables us to model data generated by an underlying low dimensional, multimodal distribution. In addition, we allow the components of latent variable models to be drawn from the exponential family which makes the method suitable for special data types, for example binary or count data. Simulations on real valued, binary and count data show favorable comparison to other related schemes both in terms of separating different populations and generalization to unseen samples. 1

3 0.16866222 125 nips-2004-Multiple Relational Embedding

Author: Roland Memisevic, Geoffrey E. Hinton

Abstract: We describe a way of using multiple different types of similarity relationship to learn a low-dimensional embedding of a dataset. Our method chooses different, possibly overlapping representations of similarity by individually reweighting the dimensions of a common underlying latent space. When applied to a single similarity relation that is based on Euclidean distances between the input data points, the method reduces to simple dimensionality reduction. If additional information is available about the dataset or about subsets of it, we can use this information to clean up or otherwise improve the embedding. We demonstrate the potential usefulness of this form of semi-supervised dimensionality reduction on some simple examples. 1

4 0.12363402 113 nips-2004-Maximum-Margin Matrix Factorization

Author: Nathan Srebro, Jason Rennie, Tommi S. Jaakkola

Abstract: We present a novel approach to collaborative prediction, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semi-deﬁnite program, and discuss generalization error bounds for them. 1

5 0.11426613 52 nips-2004-Discrete profile alignment via constrained information bottleneck

Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin

Abstract: Amino acid proﬁles, which capture position-speciﬁc mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, proﬁle comparisons are much more computationally expensive than discrete symbol comparisons, making proﬁles impractical for many large datasets. Furthermore, because they are such a rich representation, proﬁles can be diﬃcult to visualize. To overcome these problems, we propose a discretization for proﬁles using an expanded alphabet representing not just individual amino acids, but common proﬁles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we ﬁnd an informationally optimal alphabet. This discretization yields a concise, informative textual representation for proﬁle sequences. Also alignments between these sequences, while nearly as accurate as the full proﬁleproﬁle alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using proﬁles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which proﬁle information can be applied. 1

6 0.10770315 50 nips-2004-Dependent Gaussian Processes

7 0.09360303 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment

8 0.08825814 66 nips-2004-Exponential Family Harmoniums with an Application to Information Retrieval

9 0.075335532 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces

10 0.072279498 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech

11 0.070910811 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

12 0.06897556 197 nips-2004-Two-Dimensional Linear Discriminant Analysis

13 0.068578266 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account

14 0.065315969 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

15 0.064133018 28 nips-2004-Bayesian inference in spiking neurons

16 0.062906057 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing

17 0.059410449 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

18 0.059381004 64 nips-2004-Experts in a Markov Decision Process

19 0.059052192 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's

20 0.058166437 7 nips-2004-A Large Deviation Bound for the Area Under the ROC Curve

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.179), (1, -0.019), (2, 0.012), (3, -0.147), (4, -0.088), (5, -0.002), (6, -0.068), (7, 0.059), (8, 0.194), (9, -0.01), (10, 0.083), (11, -0.112), (12, 0.033), (13, 0.15), (14, 0.205), (15, -0.032), (16, -0.033), (17, -0.042), (18, -0.093), (19, -0.158), (20, -0.144), (21, 0.088), (22, -0.069), (23, 0.009), (24, 0.129), (25, 0.102), (26, 0.009), (27, -0.093), (28, -0.008), (29, -0.128), (30, 0.103), (31, 0.11), (32, -0.189), (33, -0.18), (34, -0.02), (35, -0.021), (36, 0.085), (37, -0.02), (38, -0.031), (39, 0.026), (40, -0.092), (41, -0.1), (42, 0.03), (43, -0.124), (44, -0.027), (45, -0.002), (46, -0.045), (47, -0.021), (48, 0.052), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97974986 124 nips-2004-Multiple Alignment of Continuous Time Series

Author: Jennifer Listgarten, Radford M. Neal, Sam T. Roweis, Andrew Emili

2 0.59283859 66 nips-2004-Exponential Family Harmoniums with an Application to Information Retrieval

Author: Max Welling, Michal Rosen-zvi, Geoffrey E. Hinton

Abstract: Directed graphical models with one layer of observed random variables and one or more layers of hidden random variables have been the dominant modelling paradigm in many research ﬁelds. Although this approach has met with considerable success, the causal semantics of these models can make it difﬁcult to infer the posterior distribution over the hidden variables. In this paper we propose an alternative two-layer model based on exponential family distributions and the semantics of undirected models. Inference in these “exponential family harmoniums” is fast while learning is performed by minimizing contrastive divergence. A member of this family is then studied as an alternative probabilistic model for latent semantic indexing. In experiments it is shown that they perform well on document retrieval tasks and provide an elegant solution to searching with keywords.

3 0.58400351 163 nips-2004-Semi-parametric Exponential Family PCA

Author: Sajama Sajama, Alon Orlitsky

4 0.57752228 125 nips-2004-Multiple Relational Embedding

Author: Roland Memisevic, Geoffrey E. Hinton

5 0.53259474 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account

Author: Aaron C. Courville, Nathaniel D. Daw, David S. Touretzky

Abstract: We propose a probabilistic, generative account of conﬁgural learning phenomena in classical conditioning. Conﬁgural learning experiments probe how animals discriminate and generalize between patterns of simultaneously presented stimuli (such as tones and lights) that are differentially predictive of reinforcement. Previous models of these issues have been successful more on a phenomenological than an explanatory level: they reproduce experimental ﬁndings but, lacking formal foundations, provide scant basis for understanding why animals behave as they do. We present a theory that clariﬁes seemingly arbitrary aspects of previous models while also capturing a broader set of data. Key patterns of data, e.g. concerning animals’ readiness to distinguish patterns with varying degrees of overlap, are shown to follow from statistical inference.

6 0.46914965 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing

7 0.42546085 52 nips-2004-Discrete profile alignment via constrained information bottleneck

8 0.41967756 74 nips-2004-Harmonising Chorales by Probabilistic Inference

9 0.378748 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment

10 0.36367041 50 nips-2004-Dependent Gaussian Processes

11 0.36128578 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity

12 0.34347728 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces

13 0.33706886 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models

14 0.32928136 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity

15 0.32402009 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data

16 0.31646669 185 nips-2004-The Convergence of Contrastive Divergences

17 0.30677092 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process

18 0.30156425 180 nips-2004-Synchronization of neural networks by mutual learning and its application to cryptography

19 0.29058194 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

20 0.28504047 193 nips-2004-Theories of Access Consciousness

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.126), (15, 0.131), (25, 0.03), (26, 0.045), (31, 0.03), (33, 0.174), (35, 0.014), (39, 0.083), (50, 0.03), (71, 0.013), (76, 0.015), (84, 0.2)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85904729 124 nips-2004-Multiple Alignment of Continuous Time Series

Author: Jennifer Listgarten, Radford M. Neal, Sam T. Roweis, Andrew Emili

2 0.79100937 163 nips-2004-Semi-parametric Exponential Family PCA

Author: Sajama Sajama, Alon Orlitsky

3 0.78607941 42 nips-2004-Computing regularization paths for learning multiple kernels

Author: Francis R. Bach, Romain Thibaux, Michael I. Jordan

Abstract: The problem of learning a sparse conic combination of kernel functions or kernel matrices for classiﬁcation or regression can be achieved via the regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained by using numerical continuation techniques, and involves a running time complexity that is a constant times the complexity of solving the problem for one value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of the block 1-norm regularization differs notably from the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case. 1

4 0.78051895 70 nips-2004-Following Curved Regularized Optimization Solution Paths

Author: Saharon Rosset

Abstract: Regularization plays a central role in the analysis of modern data, where non-regularized ﬁtting is likely to lead to over-ﬁtted models, useless for both prediction and interpretation. We consider the design of incremental algorithms which follow paths of regularized solutions, as the regularization varies. These approaches often result in methods which are both efﬁcient and highly ﬂexible. We suggest a general path-following algorithm based on second-order approximations, prove that under mild conditions it remains “very close” to the path of optimal solutions and illustrate it with examples.

5 0.77558351 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

6 0.77080286 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models

7 0.76896626 178 nips-2004-Support Vector Classification with Input Data Uncertainty

8 0.76741964 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

9 0.7662192 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

10 0.76572347 102 nips-2004-Learning first-order Markov models for control

11 0.76460856 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms

12 0.76441193 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

13 0.76391309 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

14 0.76317823 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

15 0.76312834 113 nips-2004-Maximum-Margin Matrix Factorization

16 0.76258528 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

17 0.75928187 71 nips-2004-Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices

18 0.7580716 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

19 0.75793689 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

20 0.75734037 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection