nips nips2012 nips2012-270 knowledge-graph by maker-knowledge-mining

270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System


Source: pdf

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). [sent-12, score-0.807]

2 The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. [sent-13, score-0.415]

3 The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. [sent-14, score-0.608]

4 In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. [sent-15, score-0.203]

5 The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. [sent-16, score-0.127]

6 1 Introduction Automatic speech recognition (ASR), the process of automatically translating spoken words into text, has been an important research topic for several decades owing to its wide array of potential applications in the area of human-computer interaction (HCI). [sent-17, score-0.336]

7 The state-of-the-art ASR systems typically use hidden Markov models (HMMs) [1] to model the sequential articulator structure of speech signals. [sent-18, score-0.24]

8 1) An HMM with a first-order Markovian structure is suitable for capturing short-range dependency in observations, whereas speech requires a more flexible model that can capture long-range dependency. [sent-20, score-0.486]

9 For example, the stochastic segment model [2] is a well-known generalization of the HMM that represents long-range dependency over observations using a time-dependent emission function. [sent-24, score-0.452]

10 The hidden dynamical model [3] is used for modeling the complex nonlinear dynamics of a physiological articulator. [sent-25, score-0.311]

11 Another promising research direction is to consider a nonparametric Bayesian model for nonlinear probabilistic modeling of speech. [sent-26, score-0.125]

12 Owing to the fact that nonparametric models do not assume any fixed model structure, they are generally more flexible than parametric models and can allow dependency among observations naturally. [sent-27, score-0.222]

13 The Gaussian process (GP) [4], a stochastic process over a real-valued function, has been a key ingredient in solving such problems as nonlinear regression and classification. [sent-28, score-0.114]

14 As a standard supervised learning task using the GP, Gaussian process regression (GPR) offers a nonparametric Bayesian framework to infer the nonlinear latent function relating the input and the output data. [sent-29, score-0.211]
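
As a hedged illustration of the GPR framework mentioned above (a generic sketch, not code from the paper), the following computes the exact GP posterior mean and variance on toy one-dimensional data; the kernel settings, function names, and data are illustrative only.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, inv_lengthscale=1.0):
    """RBF kernel k(x, x') = variance * exp(-inv_lengthscale * (x - x')^2)."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-inv_lengthscale * d2)

def gp_regression(x_train, y_train, x_test, noise_var=0.1):
    """Exact GP regression posterior mean and variance (standard Cholesky-based recipe)."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0) + noise_var
    return mean, var

# Toy usage: infer a nonlinear latent function from noisy samples.
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)
mu, var = gp_regression(x, y, np.linspace(0, 1, 50))
```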

15 Recently, researchers have begun focusing on applying the GP to unsupervised learning tasks with high-dimensional data, such as the Gaussian process latent variable model (GP-LVM) for reduction of dimensionality [5-6]. [sent-30, score-0.131]

16 The variational approach is one of the sparse approximation approaches [8]. [sent-32, score-0.098]

17 The framework was extended to the variational Gaussian process dynamical system (VGPDS) in [9] by augmenting latent dynamics for modeling high-dimensional time series data. [sent-33, score-0.371]

18 However, no previous work has considered the GP-based approach for speech recognition tasks that involve high-dimensional time series data. [sent-35, score-0.222]

19 In this paper, we propose a GP-based acoustic model for phoneme classification. [sent-36, score-0.628]

20 The proposed model is based on the assumption that the continuous dynamics and nonlinearity of the VGPDS can better represent the statistical characteristics of real speech than an HMM. [sent-37, score-0.304]

21 The GP prior over the emission function allows the model to represent long-range dependency over the observations of speech, while the HMM does not. [sent-38, score-0.354]

22 Furthermore, the GP prior over the dynamics function enables the model to capture the nonlinear dynamics of a physiological articulator. [sent-39, score-0.262]

23 (Acoustic modeling using Gaussian processes: Variational Gaussian process dynamical system) The VGPDS [9] models time series data by assuming that there exist latent states that govern the data. [sent-44, score-0.104]

24 Although the Gaussian process dynamical model (GPDM) [10], which involves an auto-regressive dynamics function, is also a GP-based model for time-series, it is not considered in this paper. [sent-51, score-0.228]

25 Figure 1: Graphical representations of (left) the left-to-right HMM and (right) the VGPDS. In the left figure, yn ∈ R^D and xn ∈ {1, · · · , C} are observations and discrete latent states. [sent-53, score-0.162]

26 In the right figure, yni , fni , xnj , gnj , and tn are observations, emission function points, latent states, dynamics function points, and times, respectively. [sent-54, score-0.464]

27 Since Eq. (2) is not tractable, a variational method is used by introducing a variational distribution q(X). [sent-57, score-0.158]

28 In [9], a variational approach which involves sparse approximation of the covariance matrix obtained from the GP is proposed. [sent-61, score-0.125]

29 Here, K̃i ∈ R^{M×M} is a kernel matrix calculated using the i-th kernel function and inducing input variables X̃ ∈ R^{M×Q} that are used for sparse approximation of the full kernel matrix Ki. [sent-63, score-0.423]

30 The closed-form expressions for the statistics {ψ0i, Ψ1i, Ψ2i}_{i=1}^{D}, which are functions of the variational parameters and inducing points, can be found in [9]. [sent-64, score-0.195]

31 In Eq. (3), p(X|t) = ∏_{j=1}^{Q} p(xj) and q(X) = ∏_{n,j} N(xnj | µnj, snj) are the prior for the latent state and the variational distribution used for approximating the posterior of the latent state, respectively. [sent-66, score-0.229]

32 (Acoustic modeling using VGPDS) For several decades, the HMM has been the predominant model for acoustic speech modeling. [sent-70, score-0.408]

33 However, as we mentioned in Section 1, the model suffers from two major limitations: discrete state variables and a first-order Markovian structure, which can model only short-range dependency over the observations. [sent-71, score-0.142]

34 To overcome such limitations of the HMM, we propose an acoustic speech model based on the VGPDS, which is a nonlinear and nonparametric model that can be used to represent the complex dynamic structure of speech and long-range dependency over observations of speech. [sent-72, score-0.936]

35 In addition, to fit the model to large-scale speech data, we describe various implementation issues. [sent-73, score-0.216]

36 (Time scale modification) The time length of each phoneme segment in an utterance varies with conditions such as the position of the phoneme segment in the utterance, emotion, gender, and other speaker and environment conditions. [sent-76, score-1.12]

37 To incorporate this fact into the proposed acoustic model, the time points tn are modified as tn = (n − 1)/(N − 1) (Eq. 6), where n and N are the observation index and the number of observations in a phoneme segment, respectively. [sent-77, score-0.842]

38 This time scale modification makes all phoneme signals have unit time length. [sent-78, score-0.445]
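
A minimal sketch of the time scale modification of Eq. (6), illustrative only (the function name is ours):

```python
import numpy as np

def normalize_times(num_frames):
    """Map frame indices n = 1..N of a phoneme segment to [0, 1] via t_n = (n - 1)/(N - 1), per Eq. (6).
    Assumes the segment has at least two frames."""
    n = np.arange(1, num_frames + 1)
    return (n - 1) / (num_frames - 1)

print(normalize_times(5))  # [0.   0.25 0.5  0.75 1.  ]
```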

39 We use the radial basis function (RBF) kernel for the emission function f: k^f(x, x′) = α^f exp(−∑_{j=1}^{Q} ω_j^f (x_j − x′_j)²) (Eq. 7), where α^f and ω_j^f are the RBF kernel variance and the j-th inverse length scale, respectively. [sent-83, score-0.406]

40 The RBF kernel function is adopted for representing smoothness of speech. [sent-84, score-0.114]

41 For the dynamics function g, the following kernel function is used: k^g(t, t′) = α^g exp(−ω^g (t − t′)²) + λ t t′ + b (Eq. 8), where λ and b are the linear kernel variance and bias, respectively. [sent-85, score-0.328]

42 The above dynamics kernel, which consists of both linear and nonlinear components, is used for representing the complex dynamics of the articulator. [sent-86, score-0.238]
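
The two kernels of Eqs. (7) and (8) can be written down directly; the sketch below is our reading of those equations with illustrative parameter names, not the authors' implementation.

```python
import numpy as np

def emission_kernel(X1, X2, alpha_f, omega_f):
    """ARD-RBF emission kernel of Eq. (7):
    k^f(x, x') = alpha_f * exp(-sum_j omega_f[j] * (x_j - x'_j)^2)."""
    diff = X1[:, None, :] - X2[None, :, :]      # (N1, N2, Q) pairwise differences
    sq = np.sum(omega_f * diff ** 2, axis=-1)   # weighted squared distances
    return alpha_f * np.exp(-sq)

def dynamics_kernel(t1, t2, alpha_g, omega_g, lam, b):
    """RBF-plus-linear dynamics kernel of Eq. (8):
    k^g(t, t') = alpha_g * exp(-omega_g * (t - t')^2) + lam * t * t' + b."""
    d2 = (t1[:, None] - t2[None, :]) ** 2
    return alpha_g * np.exp(-omega_g * d2) + lam * np.outer(t1, t2) + b

# Example: kernel matrices over the normalized times of a 10-frame segment.
t = np.linspace(0.0, 1.0, 10)
X = np.random.randn(10, 2)                      # Q = 2 latent coordinates (toy values)
Kf = emission_kernel(X, X, alpha_f=1.0, omega_f=np.array([1.0, 0.5]))
Kg = dynamics_kernel(t, t, alpha_g=1.0, omega_g=5.0, lam=0.1, b=0.01)
```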

43 However, this extensive sharing of the hyperparameters is unsuitable for speech modeling. [sent-89, score-0.295]

44 To handle this problem, this paper considers each dimension to be modeled independently using different kernel function parameters. [sent-91, score-0.118]

45 (Priors on the hyperparameters) In the parameter estimation of the VGPDS, the scaled conjugate gradient (SCG) algorithm does not guarantee a globally optimal solution. [sent-95, score-0.103]

46 To overcome this problem, we place the following prior on the hyperparameters of the kernel functions: p(γ) ∝ exp(−γ²/γ̄) (Eq. 9), where γ ∈ {θ^f, θ^g} and γ̄ are the hyperparameter and the model parameter of the prior, respectively. [sent-96, score-0.214]

47 In this paper, γ̄ is set to the sample variance for the hyperparameters of the emission kernel functions, and to 1 for the hyperparameters of the dynamics kernel functions. [sent-97, score-0.648]
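
A small sketch of the hyperparameter prior of Eq. (9), evaluated in log space; the choice of γ̄ follows the sentence above, and all variable names are ours.

```python
import numpy as np

def log_hyperprior(gamma, gamma_bar):
    """Log of p(gamma) ∝ exp(-gamma^2 / gamma_bar) from Eq. (9), up to an additive constant."""
    return -np.asarray(gamma, dtype=float) ** 2 / gamma_bar

# gamma_bar: sample variance for emission-kernel hyperparameters, 1 for dynamics-kernel ones.
y_dim = np.random.randn(200)                       # stand-in for one observation dimension
emission_penalty = log_hyperprior(0.5, np.var(y_dim))
dynamics_penalty = log_hyperprior(0.5, 1.0)
```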

48 In Eq. (5), the second term on the right-hand side is the regularization term that represents the sparse approximation error of the full kernel matrix Ki. [sent-102, score-0.107]

49 Note that with more inducing input points, the approximation error becomes smaller. [sent-103, score-0.14]

50 However, only a small number of inducing input points can be used owing to the limited availability of computational power, which increases the effect of the regularization term. [sent-104, score-0.243]

51 This constraint is designed so that the variance of each observation calculated from the estimated model is equal to the sample variance. [sent-106, score-0.145]
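
The exact form of the constraint is not spelled out in this extract; under one plausible reading, the model's marginal variance for dimension i of an RBF emission kernel plus noise, α_i^f + σ_i², is tied to the sample variance of that dimension. The sketch below encodes only that reading and is an assumption, not the paper's formulation.

```python
import numpy as np

def variance_constraint_residual(alpha_f_i, noise_var_i, Y_i):
    """Residual of the (assumed) constraint: the model marginal variance alpha_f_i + noise_var_i
    should equal the sample variance of observation dimension i."""
    return (alpha_f_i + noise_var_i) - np.var(Y_i)
```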

52 In Section 3.1, the effectiveness of the variance constraint is demonstrated empirically. [sent-110, score-0.128]

53 Parameter estimation: validating the effectiveness of the proposed variance constraint (Section 2. [sent-116, score-0.128]

54 Two-class classification using synthetic data: demonstrating explicitly the advantages of the proposed model over the HMM with respect to the degree of dependency over the observations. [sent-119, score-0.258]

55 Phoneme classification: evaluating the performance of the proposed model on real speech data. Each experiment is described in detail in the following subsections. [sent-120, score-0.216]

56 (Parameter estimation) In this subsection, the parameter estimation experiments on synthetic data are described. [sent-123, score-0.101]

57 Synthetic data are generated by using a phoneme model that is selected from the trained models in Section 3. [sent-124, score-0.465]

58 The RBF kernel variances of the emission functions and the emission noise variances are modified from the selected model. [sent-126, score-0.573]

59 In this experiment, the emission noise variances and inducing input points are estimated, while all other parameters are fixed to the true values used in generating the data. [sent-127, score-0.423]

60 The estimates of the 39-dimensional noise variance of the emission functions are shown with the true noise variances, the true RBF kernel variances, and the sample variances of the synthetic data. [sent-130, score-0.484]

61 The top row denotes the estimation results without the variance constraint, and the bottom row with the variance constraint. [sent-131, score-0.154]

62 Remarkably, the estimation result of the CVGPDS with M = 5 inducing input points is much better than the result of the VGPDS with M = 30. [sent-136, score-0.193]

63 (Two-class classification using synthetic data) This section aims to show that when there is strong dependency over the observations, the proposed CVGPDS is a more appropriate model than the HMM for the classification task. [sent-141, score-0.171]

64 To this end, we first generated several sets of two-class classification datasets with different degrees of dependency over the observations. [sent-142, score-0.125]

65 The considered classification task is to map each input segment to one of two class labels. [sent-143, score-0.122]

66 With s ∈ {1, · · · , S} as the segment index, the synthetic dataset D = {Ys, ts, ls}_{s=1}^{S} consists of S segments, where the s-th segment has Ns samples. [sent-147, score-0.357]

67 Here, Ys ∈ R^{Ns×D}, ts ∈ R^{Ns}, and ls are the observation data, time, and class label of the s-th segment, respectively. [sent-148, score-0.116]

68 Note that the parameter ωi controls the degree of dependency over the observations. [sent-169, score-0.123]

69 For instance, if ωi decreases, the off-diagonal terms of the emission kernel matrix K^f_i increase, which means stronger correlations over the observations. [sent-170, score-0.254]
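
A tiny numerical check of this statement, with illustrative values only: decreasing the inverse length scale ω makes the off-diagonal entries of an RBF Gram matrix larger, i.e. the observations become more strongly correlated.

```python
import numpy as np

def rbf_gram(x, alpha, omega):
    """1-D RBF Gram matrix K[m, n] = alpha * exp(-omega * (x_m - x_n)^2)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return alpha * np.exp(-omega * d2)

x = np.linspace(0.0, 1.0, 6)
K_small_omega = rbf_gram(x, alpha=1.0, omega=1.0)     # small omega: large off-diagonals
K_large_omega = rbf_gram(x, alpha=1.0, omega=100.0)   # large omega: nearly diagonal

print(K_small_omega[0, -1], K_large_omega[0, -1])     # ~0.37 versus ~3.7e-44
```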

70 The synthesized dataset consists of 200 segments in total (100 segments per class). [sent-172, score-0.181]

71 The dimensions of the latent space and observation space are set to Q = 2 and D = 5, respectively. [sent-173, score-0.104]

72 We use Zi = 6 components for the mean function of the emission kernel function. [sent-174, score-0.254]

73 As a result, the degree of correlation between the observations is the only factor that distinguishes the two classes. [sent-178, score-0.115]

74 Apparently, the HMM failed to distinguish the two classes with different degrees of dependency over the observations. [sent-202, score-0.123]

75 In contrast, the proposed CVGPDS distinguishes the two classes more effectively by capturing the different degrees of inter-dependencies over the observations incorporated in each class. [sent-203, score-0.117]

76 (Phoneme classification) In this section, phoneme classification experiments on real speech data from the TIMIT database are described. [sent-205, score-0.641]

77 The TIMIT database contains a total of 6300 phonetically rich utterances, each of which is manually segmented based on 61 phoneme transcriptions. [sent-206, score-0.445]

78 Following the standard regrouping of phoneme labels [11], the 61 phonemes are reduced to the 48 phonemes selected for modeling. [sent-207, score-0.613]

79 As observations, 39-dimensional Mel-frequency cepstral coefficients (MFCCs) (13 static coefficients, ∆, and ∆∆), extracted from the speech signals with a standard 25 ms frame size and 10 ms frame shift, are used. [sent-208, score-0.196]
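
For concreteness, 39-dimensional MFCC-plus-delta features with 25 ms frames and 10 ms shifts can be computed as below; this is a generic librosa sketch rather than the authors' front end, and the file path is hypothetical.

```python
import numpy as np
import librosa

# Hypothetical utterance file; TIMIT audio is sampled at 16 kHz.
y, sr = librosa.load("timit_utterance.wav", sr=16000)

# 25 ms frames (400 samples) with 10 ms shifts (160 samples).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
delta = librosa.feature.delta(mfcc)                    # first-order dynamics
delta2 = librosa.feature.delta(mfcc, order=2)          # second-order dynamics

features = np.vstack([mfcc, delta, delta2])            # 39 x num_frames observation matrix
```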

80 The dimension of the latent space is set to Q = 2. [sent-209, score-0.105]

81 For the first phoneme classification experiment, 100 segments per phoneme are randomly selected using the phoneme boundary information provided in the TIMIT database. [sent-210, score-1.4]

82 The number of inducing input points is set to M = 10. [sent-211, score-0.167]

83 Table 2: Classification accuracy on the 48-phoneme dataset (10-fold CV average [%]): 100 segments are used for training and testing each phoneme model HMM VGPDS CVGPDS 49. [sent-214, score-0.555]

84 For the second phoneme classification experiment, the TIMIT core test set consisting of 192 sentences is used for evaluation. [sent-219, score-0.466]

85 We use the same 100 segments for training the phoneme models as in the first phoneme classification experiment. [sent-220, score-0.955]

86 When evaluating the models, we merge the labels of 48 phonemes into the commonly used 39 phonemes [11]. [sent-222, score-0.168]

87 Given speech observations with boundary information, a sequence of log-likelihoods is obtained, and then a bigram is constructed to incorporate linguistic information into the classification score. [sent-223, score-0.262]
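
One way to combine per-segment acoustic log-likelihoods with a bigram over phoneme labels is Viterbi decoding over the segment sequence, sketched below; this is our reading of the sentence above, not the paper's exact scoring scheme, and all names are illustrative.

```python
import numpy as np

def classify_segments(acoustic_ll, log_bigram):
    """Viterbi decoding over a sequence of pre-segmented phonemes.
    acoustic_ll: (T, C) acoustic log-likelihoods per segment and phoneme class.
    log_bigram:  (C, C) log transition probabilities between phoneme classes.
    Returns the most likely phoneme label sequence."""
    T, C = acoustic_ll.shape
    score = acoustic_ll[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in class i at t-1 and class j at t.
        cand = score[:, None] + log_bigram + acoustic_ll[t][None, :]
        back[t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0)
    labels = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]
```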

88 In this experiment, the number of inducing input points is set to M = 5. [sent-224, score-0.167]

89 Table 3: Classification accuracy on the TIMIT core test set [%]: 100 segments are used for training each phoneme model HMM VGPDS CVGPDS 57. [sent-225, score-0.551]

90 Table 3 shows the experimental results of phoneme classification for the TIMIT core test set.

91 However, the classification accuracies in Table 3 are lower than the state-of-the-art phoneme classification results [12-13]. [sent-230, score-0.445]

92 The reasons for the low accuracy are as follows: 1) an insufficient amount of data is used for training the model owing to the limited availability of computational power; 2) a mixture model for the emission is not considered. [sent-231, score-0.282]

93 (Conclusion) In this paper, a VGPDS-based acoustic model for phoneme classification was considered. [sent-233, score-0.628]

94 The proposed acoustic model can represent the nonlinear latent dynamics and dependency among observations by GP priors. [sent-234, score-0.556]

95 Although the proposed model could not achieve the state-of-the-art performance of phoneme classification, the experimental results showed that the proposed acoustic model has potential for speech modeling. [sent-236, score-0.844]

96 Jelinek, “Continuous speech recognition by statistical methods,” Proceedings of the IEEE, Vol. [sent-241, score-0.222]

97 Rohlicek, “From HMMs to segment models: A unified view of stochastic modeling for speech recognition,” IEEE Trans. [sent-247, score-0.323]

98 Lawrence, “Probabilistic non-linear principal component analysis with Gaussian process latent variable models,” Journal of Machine Learning Research (JMLR), Vol. [sent-266, score-0.111]

99 Lawrence, “Learning for larger datasets with the Gaussian process latent variable model,” International Conference on Artificial Intelligence and Statistics (AISTATS), pp. [sent-271, score-0.111]

100 Saul, “Large margin hidden markov models for automatic speech recognition,” Advances in Neural Information Processing Systems (NIPS), 2007. [sent-320, score-0.22]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vgpds', 0.462), ('phoneme', 0.445), ('cvgpds', 0.357), ('hmm', 0.262), ('speech', 0.196), ('emission', 0.166), ('acoustic', 0.163), ('korea', 0.117), ('inducing', 0.116), ('ki', 0.11), ('gp', 0.102), ('dependency', 0.102), ('segment', 0.098), ('timit', 0.096), ('kernel', 0.088), ('dynamics', 0.088), ('phonemes', 0.084), ('variational', 0.079), ('hyperparameters', 0.077), ('south', 0.076), ('latent', 0.075), ('kaist', 0.074), ('daejeon', 0.074), ('asr', 0.074), ('classi', 0.066), ('observations', 0.066), ('segments', 0.065), ('rbf', 0.064), ('variance', 0.064), ('dynamical', 0.064), ('variances', 0.063), ('rns', 0.063), ('owing', 0.056), ('tn', 0.056), ('ys', 0.053), ('xs', 0.053), ('ee', 0.05), ('synthetic', 0.049), ('ls', 0.049), ('ns', 0.049), ('gj', 0.049), ('limitations', 0.048), ('hmms', 0.048), ('gaussian', 0.046), ('kf', 0.044), ('fi', 0.042), ('sungrack', 0.042), ('yni', 0.042), ('cation', 0.042), ('nonlinear', 0.042), ('ts', 0.038), ('lawrence', 0.037), ('scg', 0.037), ('wz', 0.037), ('xnj', 0.037), ('process', 0.036), ('mz', 0.034), ('utterance', 0.034), ('nonparametric', 0.034), ('effectiveness', 0.032), ('constraint', 0.032), ('dimension', 0.03), ('audio', 0.03), ('dx', 0.03), ('bj', 0.03), ('observation', 0.029), ('kg', 0.029), ('kj', 0.029), ('overcome', 0.029), ('modeling', 0.029), ('distinguishes', 0.028), ('noise', 0.027), ('titsias', 0.027), ('covariance', 0.027), ('points', 0.027), ('adopted', 0.026), ('recognition', 0.026), ('estimation', 0.026), ('synthesized', 0.026), ('dataset', 0.025), ('nj', 0.025), ('input', 0.024), ('physiological', 0.024), ('hidden', 0.024), ('degrees', 0.023), ('cv', 0.023), ('rasmussen', 0.022), ('decades', 0.022), ('extensive', 0.022), ('yn', 0.021), ('degree', 0.021), ('markovian', 0.021), ('remarkably', 0.021), ('core', 0.021), ('modi', 0.021), ('complex', 0.02), ('park', 0.02), ('model', 0.02), ('availability', 0.02), ('sparse', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1

2 0.19975181 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

Author: Emanuele Coviello, Gert R. Lanckriet, Antoni B. Chan

Abstract: In this paper, we derive a novel algorithm to cluster hidden Markov models (HMMs) according to their probability distributions. We propose a variational hierarchical EM algorithm that i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a “cluster center”, i.e., a novel HMM that is representative for the group. We illustrate the benefits of the proposed algorithm on hierarchical clustering of motion capture sequences as well as on automatic music tagging. 1

3 0.1094766 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

Author: Jasper Snoek, Hugo Larochelle, Ryan P. Adams

Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a “black art” requiring expert experience, rules of thumb, or sometimes bruteforce search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm’s generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expertlevel performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks. 1

4 0.10756665 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

Author: Sourish Chaudhuri, Bhiksha Raj

Abstract: Approaches to audio classification and retrieval tasks largely rely on detectionbased discriminative models. We submit that such models make a simplistic assumption in mapping acoustics directly to semantics, whereas the actual process is likely more complex. We present a generative model that maps acoustics in a hierarchical manner to increasingly higher-level semantics. Our model has two layers with the first layer modeling generalized sound units with no clear semantic associations, while the second layer models local patterns over these sound units. We evaluate our model on a large-scale retrieval task from TRECVID 2011, and report significant improvements over standard baselines. 1

5 0.10385366 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods. 1

6 0.1003546 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

7 0.095579281 72 nips-2012-Cocktail Party Processing via Structured Prediction

8 0.086500585 197 nips-2012-Learning with Recursive Perceptual Representations

9 0.085937843 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

10 0.083775699 187 nips-2012-Learning curves for multi-task Gaussian process regression

11 0.076677978 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

12 0.074115932 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

13 0.070508681 233 nips-2012-Multiresolution Gaussian Processes

14 0.067623481 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

15 0.065080546 55 nips-2012-Bayesian Warped Gaussian Processes

16 0.062581666 188 nips-2012-Learning from Distributions via Support Measure Machines

17 0.059818134 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

18 0.058399115 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

19 0.055926669 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

20 0.054092191 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.162), (1, 0.046), (2, -0.023), (3, 0.018), (4, -0.085), (5, -0.026), (6, -0.001), (7, 0.059), (8, -0.037), (9, -0.166), (10, -0.056), (11, 0.028), (12, 0.042), (13, 0.028), (14, 0.001), (15, -0.026), (16, -0.019), (17, 0.066), (18, -0.047), (19, -0.059), (20, -0.071), (21, -0.051), (22, -0.08), (23, -0.097), (24, 0.029), (25, 0.022), (26, 0.01), (27, -0.116), (28, -0.069), (29, 0.047), (30, 0.092), (31, 0.053), (32, 0.048), (33, -0.025), (34, -0.003), (35, 0.053), (36, 0.16), (37, -0.073), (38, 0.08), (39, 0.101), (40, 0.011), (41, 0.096), (42, -0.162), (43, -0.04), (44, 0.027), (45, 0.034), (46, -0.046), (47, 0.045), (48, -0.069), (49, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88968301 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1

2 0.7160762 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

Author: Sourish Chaudhuri, Bhiksha Raj

Abstract: Approaches to audio classification and retrieval tasks largely rely on detectionbased discriminative models. We submit that such models make a simplistic assumption in mapping acoustics directly to semantics, whereas the actual process is likely more complex. We present a generative model that maps acoustics in a hierarchical manner to increasingly higher-level semantics. Our model has two layers with the first layer modeling generalized sound units with no clear semantic associations, while the second layer models local patterns over these sound units. We evaluate our model on a large-scale retrieval task from TRECVID 2011, and report significant improvements over standard baselines. 1

3 0.63618588 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods. 1

4 0.62713802 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

Author: Emanuele Coviello, Gert R. Lanckriet, Antoni B. Chan

Abstract: In this paper, we derive a novel algorithm to cluster hidden Markov models (HMMs) according to their probability distributions. We propose a variational hierarchical EM algorithm that i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a “cluster center”, i.e., a novel HMM that is representative for the group. We illustrate the benefits of the proposed algorithm on hierarchical clustering of motion capture sequences as well as on automatic music tagging. 1

5 0.5547058 55 nips-2012-Bayesian Warped Gaussian Processes

Author: Miguel Lázaro-gredilla

Abstract: Warped Gaussian processes (WGP) [1] model output observations in regression tasks as a parametric nonlinear transformation of a Gaussian process (GP). The use of this nonlinear transformation, which is included as part of the probabilistic model, was shown to enhance performance by providing a better prior model on several data sets. In order to learn its parameters, maximum likelihood was used. In this work we show that it is possible to use a non-parametric nonlinear transformation in WGP and variationally integrate it out. The resulting Bayesian WGP is then able to work in scenarios in which the maximum likelihood WGP failed: Low data regime, data with censored values, classification, etc. We demonstrate the superior performance of Bayesian warped GPs on several real data sets.

6 0.5312205 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

7 0.53111076 72 nips-2012-Cocktail Party Processing via Structured Prediction

8 0.50477749 289 nips-2012-Recognizing Activities by Attribute Dynamics

9 0.4819234 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

10 0.45648527 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

11 0.4469853 233 nips-2012-Multiresolution Gaussian Processes

12 0.44307464 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

13 0.43889567 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

14 0.42452419 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

15 0.42026639 187 nips-2012-Learning curves for multi-task Gaussian process regression

16 0.41231209 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

17 0.40977514 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies

18 0.39833814 188 nips-2012-Learning from Distributions via Support Measure Machines

19 0.39576232 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

20 0.38577452 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.469), (17, 0.011), (21, 0.015), (38, 0.066), (42, 0.017), (54, 0.03), (55, 0.011), (74, 0.038), (76, 0.136), (80, 0.076), (92, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89585739 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1

2 0.85659319 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

same-paper 3 0.80988097 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets. 1

4 0.80577785 233 nips-2012-Multiresolution Gaussian Processes

Author: David B. Dunson, Emily B. Fox

Abstract: We propose a multiresolution Gaussian process to capture long-range, nonMarkovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the marginal likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of magnetoencephalography (MEG) recordings of brain activity.

5 0.78295815 192 nips-2012-Learning the Dependency Structure of Latent Factors

Author: Yunlong He, Yanjun Qi, Koray Kavukcuoglu, Haesun Park

Abstract: In this paper, we study latent factor models with dependency structure in the latent space. We propose a general learning framework which induces sparsity on the undirected graphical model imposed on the vector of latent factors. A novel latent factor model SLFA is then proposed as a matrix factorization problem with a special regularization term that encourages collaborative reconstruction. The main benefit (novelty) of the model is that we can simultaneously learn the lowerdimensional representation for data and model the pairwise relationships between latent factors explicitly. An on-line learning algorithm is devised to make the model feasible for large-scale learning problems. Experimental results on two synthetic data and two real-world data sets demonstrate that pairwise relationships and latent factors learned by our model provide a more structured way of exploring high-dimensional data, and the learned representations achieve the state-of-the-art classification performance. 1

6 0.7826823 282 nips-2012-Proximal Newton-type methods for convex optimization

7 0.73370546 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.69167739 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

9 0.68316442 12 nips-2012-A Neural Autoregressive Topic Model

10 0.61902344 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

11 0.56657976 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

12 0.55928242 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

13 0.54947269 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

14 0.54724234 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

15 0.54634887 72 nips-2012-Cocktail Party Processing via Structured Prediction

16 0.5445013 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

17 0.53743178 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

18 0.53416789 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

19 0.53335667 150 nips-2012-Hierarchical spike coding of sound

20 0.53249073 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model