nips nips2013 nips2013-100 knowledge-graph by maker-knowledge-mining

100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture


Source: pdf

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. [sent-9, score-0.316]

2 The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. [sent-10, score-0.186]

3 1 Introduction The Dirichlet process mixture model (DPMM) is a powerful tool for clustering data that enables the inference of an unbounded number of mixture components, and has been widely studied in the machine learning and statistics communities [1–4]. [sent-12, score-0.313]

4 Despite its flexibility, it assumes the observations are exchangeable, and therefore that the data points have no inherent ordering that influences their labeling. [sent-13, score-0.047]

5 This assumption is invalid for modeling temporally/spatially evolving phenomena, in which the order of the data points plays a principal role in creating meaningful clusters. [sent-14, score-0.083]

6 The dependent Dirichlet process (DDP), originally formulated by MacEachern [5], provides a prior over such evolving mixture models, and is a promising tool for incrementally monitoring the dynamic evolution of the cluster structure within a dataset. [sent-15, score-0.555]

7 More recently, a construction of the DDP built upon completely random measures [6] led to the development of the dependent Dirichlet process Mixture model (DDPMM) and a corresponding approximate posterior inference Gibbs sampling algorithm. [sent-16, score-0.177]

8 This model generalizes the DPMM by including birth, death and transition processes for the clusters in the model. [sent-17, score-0.152]

9 The DDPMM is a Bayesian nonparametric (BNP) model, part of an ever-growing class of probabilistic models for which inference captures uncertainty in both the number of parameters and their values. [sent-18, score-0.034]

10 Inference techniques for BNPs typically fall into two classes: sampling methods (e. [sent-20, score-0.056]

11 , Gibbs sampling [2] or particle learning [4]) and optimization methods (e. [sent-22, score-0.168]

12 Current methods based on sampling do not scale well with the size of the dataset [8]. [sent-25, score-0.056]

13 Most optimization methods require analytic derivatives and the selection of an upper bound on the number of clusters a priori, where the computational complexity increases with that upper bound [3, 7]. [sent-26, score-0.152]

14 State-of-the-art techniques in both classes are not ideal for use in contexts where performing inference quickly and reliably on large volumes of streaming data is crucial for timely decision-making, such as autonomous robotic systems [9–11]. [sent-27, score-0.034]

15 On the other hand, many classical clustering methods [12–14] scale well with the size of the dataset and are easy to implement, and advances have recently been made to capture the flexibility of Bayesian nonparametrics in such approaches [15]. [sent-28, score-0.093]

16 However, as of yet there is no classical algorithm that captures dynamic cluster structure with the same representational power as the DDP mixture model. [sent-29, score-0.409]

17 This paper discusses the Dynamic Means algorithm, a novel hard clustering algorithm for spatiotemporal data derived from the low-variance asymptotic limit of the Gibbs sampling algorithm for the dependent Dirichlet process Gaussian mixture model. [sent-30, score-0.35]

18 This algorithm captures the scalability and ease of implementation of classical clustering methods, along with the representational power of the DDP prior, and is guaranteed to converge to a local minimum of a k-means-like cost function. [sent-31, score-0.128]

19 The algorithm is significantly more computationally tractable than Gibbs sampling, particle learning, and variational inference for the DDP mixture model in practice, while providing equivalent or better clustering accuracy on the examples presented. [sent-32, score-0.4]

20 The performance and characteristics of the algorithm are demonstrated in a test on synthetic data, with a comparison to those of Gibbs sampling, particle learning and variational inference. [sent-33, score-0.154]

21 Finally, the applicability of the algorithm to real data is presented through an example of clustering a spatio-temporal dataset of aircraft trajectories recorded across the United States. [sent-34, score-0.228]

22 2 Background The Dirichlet process (DP) is a prior over mixture models, where the number of mixture components is not known a priori[16]. [sent-35, score-0.186]

23 The dependent Dirichlet process (DDP)[5], an extension to the DP, is a prior over evolving mixture models. [sent-39, score-0.223]

24 Given a Poisson process construction [6], the DDP essentially forms a Markov chain of DPs (D1, D2, . . . ). [sent-40, score-0.032]

25 In slightly more detail, if Dt is the DP at time step t, then the following procedure defines the generative model of Dt conditioned on Dt−1 ∼ DP(µt−1 ): 1. [sent-45, score-0.052]

26 If the DDP is used as a prior over a mixture model, these three operations allow new mixture components to arise over time, and old mixture components to exhibit dynamics and perhaps disappear over time. [sent-58, score-0.324]
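
To make the three operations concrete, here is a toy one-step simulation of a DDP-style prior over weighted cluster atoms. It is only an illustrative sketch: the retention probability q, the per-step transition variance xi, and the birth rate alpha_nu are assumed names and defaults, not quantities fixed by the paper, and the construction is deliberately simplified relative to the completely-random-measure formulation.

```python
import numpy as np

def ddp_step(weights, params, q=0.9, xi=0.01, alpha_nu=1.0, rng=None):
    """One toy time step of a DDP-style prior: subsampling, transition, birth.

    weights: list of atom weights; params: list of cluster means (d-dim arrays).
    Simplified sketch, not the paper's completely-random-measure construction.
    """
    rng = np.random.default_rng() if rng is None else rng
    new_weights, new_params = [], []
    for w, theta in zip(weights, params):
        # Subsampling / death: each existing atom survives with probability q.
        if rng.random() < q:
            # Transition: the surviving atom's parameter diffuses with variance xi.
            new_params.append(theta + np.sqrt(xi) * rng.standard_normal(theta.shape))
            new_weights.append(w)
    # Birth: a Poisson number of new atoms arrive from the innovation process.
    for _ in range(rng.poisson(alpha_nu)):
        new_weights.append(1.0)
        new_params.append(rng.random(2))   # new cluster location, here on [0, 1]^2
    return new_weights, new_params
```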

27 However, an important result of using such a construction is the development of an explicit posterior for Dt given observations of the points θkt at timestep t. [sent-60, score-0.13]

28 For each point k that was observed in Dτ for some τ : 1 ≤ τ ≤ t, define: nkt ∈ N as the number of observations of point k in timestep t; ckt ∈ N as the number of past observations of point k prior to timestep t, i.e. [sent-61, score-0.625]

29 ckt = Σ_{τ=1}^{t−1} nkτ ; qkt ∈ (0, 1) as the subsampling weight on point k at timestep t; and ∆tk as the number of time steps that have elapsed since point k was last observed. [sent-63, score-0.332]

30 Further, let νt be the measure for unobserved points at time step t. [sent-64, score-0.076]

31 Then, Dt | Dt−1 ∼ DP( νt + Σ_{k: nkt=0} qkt ckt T(· | θk(t−∆tk)) + Σ_{k: nkt>0} (ckt + nkt) δθkt ) (1), where ckt = 0 for any point k that was first observed during timestep t. [sent-65, score-0.685]
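
The quantities nkt, ckt, qkt, and ∆tk entering (1) are simple per-cluster bookkeeping. The sketch below (illustrative field and method names, not code from the paper) shows one way to maintain them between batches.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ClusterRecord:
    """Per-cluster statistics entering the posterior (1)."""
    theta: np.ndarray        # parameter value when the cluster was last observed
    n: int = 0               # n_kt: observations assigned in the current time step
    c: float = 0.0           # c_kt: total observations in all past time steps
    dt: int = 1              # Delta t_k: time steps elapsed since last observed

    def subsample_weight(self, q):
        """q_kt = q ** Delta t_k, since q(theta) is constant over Omega."""
        return q ** self.dt

    def end_of_step(self):
        """Roll the record forward once the current time step is finished."""
        if self.n > 0:
            self.c += self.n  # c_kt accumulates the counts just observed
            self.dt = 0       # the cluster was observed, so its clock resets
        self.n = 0
        self.dt += 1          # one step elapses before the next batch arrives
```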

32 This posterior leads directly to the development of a Gibbs sampling algorithm for the DDP, whose low-variance asymptotics are discussed further below. [sent-66, score-0.056]

33 3 Asymptotic Analysis of the DDP Mixture The dependent Dirichlet process Gaussian mixture model (DDP-GMM) serves as the foundation upon which the present work is built. [sent-67, score-0.194]

34 Throughout the rest of this paper, the subscript kt refers to quantities related to cluster k at time step t, and subscript it refers to quantities related to observation i at time step t. [sent-69, score-0.768]

35 The Gibbs sampling algorithm for the DDP-GMM iterates between sampling labels zit for datapoints yit given the set of parameters {θkt }, and sampling parameters θkt given each group of data {yit : zit = k}. [sent-70, score-0.859]

36 Given these functions and distributions, the low-variance asymptotic limits (i.e. [sent-72, score-0.037]

37 1 Setting Labels Given Parameters In the label sampling step, a datapoint yit can either create a new cluster, join a current cluster, or revive an old, transitioned cluster. [sent-76, score-0.531]

38 Using the distributions defined previously, the label assignment probabilities are [sent-77, score-0.681]

39 p(zit = k | ·) ∝ αt (2π(σ+ρ))^(−d/2) exp(−||yit − φ||² / (2(σ+ρ))) if k = K+1; (ckt + nkt)(2πσ)^(−d/2) exp(−||yit − θkt||² / (2σ)) if nkt > 0; qkt ckt (2π(σ+ξ∆tk))^(−d/2) exp(−||yit − θk(t−∆tk)||² / (2(σ+ξ∆tk))) if nkt = 0 (3), where qkt = q^∆tk since q(θ) is constant over Ω, and αt = αν (1−q^t)/(1−q), where αν is the concentration parameter for the innovation process Ft. [sent-80, score-0.551]
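
In the low-variance limit these probabilities concentrate on whichever case has the smallest exponent, so label assignment becomes a deterministic argmin over penalized squared distances. The sketch below illustrates that hard rule with assumed penalty names lam (new-cluster cost), Q (cost per unobserved step), and tau (transition-uncertainty scale); it is a schematic stand-in for update (5), not a verbatim transcription.

```python
import numpy as np

def assign_label(y, active, old, lam, Q, tau):
    """Hard label assignment in the low-variance limit (schematic).

    active: dict k -> current mean theta for clusters already instantiated this step
    old:    dict k -> (theta_old, w_old, dt) for clusters not yet observed this step
    Returns a cluster id, or the string 'new' to start a new cluster.
    """
    costs = {'new': lam}                              # creating a cluster costs lam
    for k, theta in active.items():                   # active cluster: plain squared distance
        costs[k] = float(np.sum((y - theta) ** 2))
    for k, (theta_old, w_old, dt) in old.items():     # dormant cluster: pay a revival penalty
        gamma = 1.0 / (1.0 / w_old + tau * dt)        # old estimate's weight, decayed by tau per step
        costs[k] = Q * dt + gamma / (gamma + 1.0) * float(np.sum((y - theta_old) ** 2))
    return min(costs, key=costs.get)
```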

40 2 Setting Parameters Given Labels In the parameter sampling step, the parameters are sampled using the distribution p(θkt | {yit : zit = k}) ∝ p({yit : zit = k} | θkt) p(θkt) (6). There are two cases to consider when setting a parameter θkt. [sent-84, score-0.448]

41 Either ∆tk = 0 and the cluster is new in the current time step, or ∆tk > 0 and the cluster was previously created, disappeared for some amount of time, and then was revived in the current time step. [sent-85, score-0.654]

42 New Cluster Suppose cluster k is being newly created. [sent-86, score-0.218]

43 Revived Cluster Suppose there are ∆tk time steps where cluster k was not observed, but there are now nkt data points with mean mkt assigned to it in this time step. [sent-89, score-0.659]

44 Again using conjugacy of normal likelihoods and priors, θkt | {yit : zit = k} ∼ N(θpost, σpost), where θpost = σpost ( θ/(ξ∆tk + σ) + Σ_{i=1}^{nkt} yit/σ ) and σpost = ( 1/(ξ∆tk + σ) + nkt/σ )^(−1) (10). Similarly to the label assignment step, let ξ = τσ. [sent-91, score-1.152]

45 τ controls how much the current data is favored - as τ increases, the weight on current data increases, which is explained by the fact that our uncertainty in where the old θ transitioned to increases with τ . [sent-93, score-0.212]

46 An interesting interpretation of this update is that it behaves like a standard Kalman filter, in which wkt^(−1) serves as the current estimate variance, τ serves as the process noise variance, and nkt serves as the inverse of the measurement variance. [sent-96, score-0.533]
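
Written out, that Kalman-filter reading gives a one-line fusion of the old mean (whose weight decays by τ per missed step) with the mean of the newly assigned points. The sketch below is a plausible rendering of update (12) under this interpretation, with assumed argument names; it is not a line-by-line copy of the paper.

```python
def update_parameter(theta_old, w_old, dt, m_kt, n_kt, tau):
    """Kalman-style mean update for a revived cluster (schematic).

    theta_old, w_old : previous mean estimate and its accumulated weight (w_old > 0)
    dt               : time steps since the cluster was last observed
    m_kt, n_kt       : mean and count of the points assigned in the current step
    tau              : process-noise scale; uncertainty added per missed step
    """
    # 1 / w_old plays the role of the old estimate's variance; inflating it by tau * dt
    # gives the predicted variance, whose inverse gamma is the prediction weight.
    gamma = 1.0 / (1.0 / w_old + tau * dt)
    # Fuse prediction and measurement; n_kt acts as the inverse measurement variance.
    theta_new = (gamma * theta_old + n_kt * m_kt) / (gamma + n_kt)
    w_new = gamma + n_kt
    return theta_new, w_new
```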

47 1 Algorithm Description As shown in the previous section, the low-variance asymptotic limit of the DDP Gibbs sampling algorithm is a deterministic observation label update (5) followed by a deterministic, weighted least-squares parameter update (12). [sent-99, score-0.196]

48 Inspired by the original K-Means algorithm, applying these two updates iteratively yields an algorithm which clusters a set of observations at a single time step given cluster means and weights from past time steps (Algorithm 2). [sent-100, score-0.518]

49 Applying Algorithm 2 to a sequence of batches of data yields a clustering procedure that is able to track a set of dynamically evolving clusters (Algorithm 1), and allows new clusters to emerge and old clusters to be forgotten. [sent-101, score-0.73]
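
Composing the two updates gives the per-batch loop sketched below, which reuses the assign_label and update_parameter sketches above and terminates once the (schematic) cost stops decreasing. Cluster ids are assumed to be integers, and the penalty bookkeeping follows the assumed forms from those sketches rather than a verbatim transcription of Algorithms 1 and 2.

```python
import numpy as np

def cluster_batch(Y, old, lam, Q, tau, max_iter=50):
    """Cluster one batch Y (n x d array) given clusters carried over from past steps.

    old: dict k -> (theta_old, w_old, dt), dt >= 1, held fixed within the batch.
    Returns labels and a dict k -> (theta, w) for every cluster used in this batch.
    """
    labels = np.full(len(Y), -1)
    cur, weights = {}, {}                      # current means / weights for clusters in this batch
    next_id, prev_cost = max(old, default=-1) + 1, np.inf
    for _ in range(max_iter):
        for i, y in enumerate(Y):              # label step: deterministic assignment
            dormant = {k: v for k, v in old.items() if k not in cur}
            k = assign_label(y, cur, dormant, lam, Q, tau)
            if k == 'new':
                k, next_id = next_id, next_id + 1
                cur[k] = y.astype(float)       # provisional mean until the parameter step
            elif k not in cur:
                cur[k] = old[k][0]             # revive a dormant cluster at its old mean
            labels[i] = k
        cur = {k: v for k, v in cur.items() if np.any(labels == k)}  # drop emptied clusters
        cost = 0.0
        for k in cur:                          # parameter step: weighted least squares
            pts = Y[labels == k]
            if k in old:
                theta_o, w_o, dt = old[k]
                cur[k], weights[k] = update_parameter(theta_o, w_o, dt, pts.mean(0), len(pts), tau)
                cost += Q * dt                 # one-time revival penalty (assumed form)
            else:
                cur[k], weights[k] = pts.mean(0), float(len(pts))
                cost += lam                    # one-time cost for creating a new cluster
            cost += float(np.sum((pts - cur[k]) ** 2))
        if cost >= prev_cost:                  # cost no longer decreases: local minimum reached
            break
        prev_cost = cost
    return labels, {k: (cur[k], weights[k]) for k in cur}
```

Running this routine on successive batches, incrementing dt for every cluster not used in the current batch, mirrors the tracking behavior described above: new clusters emerge when no existing cluster is within the lam budget, and long-dormant clusters become increasingly expensive to revive.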

50 For example, Algorithm 1 may be used as an any-time clustering algorithm for large datasets, where the sequence of batches is generated by selecting random subsets of the full dataset. [sent-103, score-0.122]

51 Multiple random restarts of the algorithm with different assignment orders may be used to mitigate this dependence. [sent-107, score-0.106]

52 Indeed, it is; Theorem 1 shows that a particular cost function Lt monotonically decreases under the label and parameter updates (5) and (12) at each time step. [sent-109, score-0.114]

53 It is noted that for new clusters, θkt = θk(t−∆tk ) since ∆tk = 0, so the least squares cost is unweighted. [sent-114, score-0.035]

54 The ASSIGNPARAMS function calculates this cost function in each iteration of Algorithm 2, and the algorithm terminates once the cost function does not decrease during an iteration. [sent-115, score-0.07]

55 While λ represents how far an observation can be from a cluster before it is placed in a new cluster, and thus can be tuned intuitively, Q and τ are not so straightforward. [sent-118, score-0.218]

56 The parameter Q represents a conceptual added distance from any data point to a cluster for every time step that the cluster is not observed. [sent-119, score-0.488]

57 The parameter τ represents a conceptual reduction of distance from any data point to a cluster for every time step that the cluster is not observed. [sent-120, score-0.488]

58 5 Related Work Prior k-means clustering algorithms that determine the number of clusters present in the data have primarily involved a method for iteratively modifying k using various statistical criteria [13, 14, 18]. [sent-125, score-0.245]

59 MONIC [19] and MC3 [20] have the capability to monitor time-varying clusters; however, these methods require datapoints to be identifiable across timesteps, and determine cluster similarity across timesteps via the commonalities between label assignments. [sent-128, score-0.358]

60 The Dynamic Means algorithm does not require such information, and tracks clusters essentially based on similarity of the parameters across timesteps. [sent-129, score-0.152]

61 Evolutionary clustering [21, 22], similar to Dynamic Means, minimizes an objective consisting of a cost for clustering the present data set and a cost related to the comparison between the current clustering and past clusterings. [sent-130, score-0.374]

62 The present work can be seen as a theoretically-founded extension of this class of algorithm that provides methods for automatic and adaptive prior weight selection, forming correspondences between old and current clusters, and for deciding when to introduce new clusters. [sent-131, score-0.118]

63 particle learning [23] or multi-target tracking [24, 25]) can be adapted for use in the present context, but suffer the drawbacks typical of particle filtering methods. [sent-134, score-0.262]

64 1 Applications Synthetic Gaussian Motion Data In this experiment, moving Gaussian clusters on [0, 1] × [0, 1] were generated synthetically over a period of 100 time steps. [sent-136, score-0.176]

65 Between time steps, the cluster centers moved randomly, with displacements sampled from the same distribution. [sent-140, score-0.242]
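
A minimal generator in the spirit of this setup is sketched below; the number of clusters, points per step, and noise scales are illustrative choices rather than the paper's settings. Batches produced this way can be fed directly to a per-batch clustering routine like the cluster_batch sketch above.

```python
import numpy as np

def make_moving_gaussians(T=100, K=5, n_per_cluster=20, obs_std=0.03, step_std=0.02, seed=0):
    """Generate T batches of points from K Gaussian clusters drifting on [0, 1] x [0, 1]."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, 1.0, size=(K, 2))
    batches, true_labels = [], []
    for _ in range(T):
        pts = np.vstack([c + obs_std * rng.standard_normal((n_per_cluster, 2)) for c in centers])
        batches.append(pts)
        true_labels.append(np.repeat(np.arange(K), n_per_cluster))
        # Random-walk displacement of every cluster center, clipped to the unit square.
        centers = np.clip(centers + step_std * rng.standard_normal((K, 2)), 0.0, 1.0)
    return batches, true_labels
```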

66 (1d - 1e): Comparison with Gibbs sampling, variational inference, and particle learning. [sent-185, score-0.154]

67 (1f): Comparison of accuracy when enforcing (Gibbs, DynMeans) and not enforcing (Gibbs NC, DynMeans NC) correct cluster tracking. [sent-187, score-0.26]

68 This data was clustered with Dynamic Means (with 3 random assignment ordering restarts), DDPGMM Gibbs sampling [6], variational inference [3], and particle learning [4] on a computer with an Intel i7 processor and 16GB of memory. [sent-188, score-0.31]

69 First, the number of clusters was fixed to 5, and the parameter space of each algorithm was searched for the best possible cluster label accuracy (taking into account correct cluster tracking across time steps). [sent-189, score-0.747]

70 Figures 1a and 1b show how the average clustering accuracy varies with the parameters after fixing either kτ or TQ to their values at the maximum accuracy parameter setting over the full space. [sent-191, score-0.177]

71 The histogram in Figure 1c demonstrates that the clustering speed is robust to the setting of parameters. [sent-193, score-0.093]

72 Using the best parameter setting for each algorithm, the data as described above were clustered in 50 trials with a varying number of clusters present in the data. [sent-195, score-0.178]

73 01 were used, and the algorithm was again given 3 attempts with random labeling assignment orders, where the lowest cost solution of the 3 was picked to proceed to the next time step. [sent-199, score-0.129]

74 The number of samples for the Gibbs sampling algorithm was 5000 with one recorded for every 5 samples, the number of particles for the particle learning algorithm was 100, and the variational inference algorithm was run to a tolerance of 10^(−20) with the maximum number of iterations set to 5000. [sent-203, score-0.244]

75 In Figures 1d and 1e, the labeling accuracy and clustering time (respectively) for the algorithms is shown. [sent-204, score-0.189]

76 The sampling algorithms were handicapped to generate Figure 1d; the best posterior sample in terms of labeling accuracy was selected at each time step, which required knowledge of the true labeling. [sent-205, score-0.186]

77 Further, the accuracy computation included enforcing consistency across timesteps, to allow tracking individual cluster trajectories. [sent-206, score-0.298]

78 accuracy considers each time step independently), the other algorithms provide accuracies more comparable to those of the Dynamic Means algorithm. [sent-209, score-0.094]

79 This effect is demonstrated in Figure 1f, which shows the time/accuracy tradeoff for Gibbs sampling (varying the number of samples) and Dynamic Means (varying the number of restarts). [sent-210, score-0.056]

80 These examples illustrate that Dynamic Means outperforms standard inference algorithms in both label accuracy and computation time for cluster tracking problems. [sent-211, score-0.411]

81 3 Figure 2: Results of the GP aircraft trajectory clustering. [sent-221, score-0.153]

82 Right: A track of plane counts for the 12 clusters during the week, with color intensity proportional to the number of takeoffs at each time. [sent-223, score-0.183]

83 Automatic dependent surveillance-broadcast (ADS-B) data, including plane identification, timestamp, latitude, longitude, heading and speed, was collected from all transmitting planes across the United States during the week from 2013-3-22 1:30:0 to 2013-3-28 12:0:0 UTC. [sent-233, score-0.116]

84 Then, individual ADS-B messages were connected together based on their plane identification and timestamp to form trajectories, and erroneous trajectories were filtered based on reasonable spatial/temporal bounds, yielding 17,895 unique trajectories. [sent-234, score-0.088]

85 Then, for each trajectory, a Gaussian process was trained using the latitude and longitude of each ADS-B point along the trajectory as the inputs and the North and East components of plane velocity at those points as the outputs. [sent-235, score-0.194]
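
One way to realize this per-trajectory featurization is sketched below with scikit-learn: fit a GP velocity field to each trajectory and evaluate it on a fixed latitude/longitude grid shared across trajectories. The grid evaluation and the RBF-plus-white-noise kernel are assumptions for illustration, since the extracted text does not specify how each GP is turned into a fixed-length feature vector.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def trajectory_feature(latlon, velocity_ne, grid):
    """Fit a GP velocity field to one trajectory and evaluate it on a shared grid.

    latlon:      (n, 2) latitude/longitude of the ADS-B points along the trajectory
    velocity_ne: (n, 2) North and East velocity components at those points
    grid:        (m, 2) fixed evaluation points shared by all trajectories
    Returns a (2 * m,) feature vector comparable across trajectories.
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(),
                                  normalize_y=True)
    gp.fit(latlon, velocity_ne)
    return gp.predict(grid).ravel()
```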

86 Of the resulting 17,895 feature vectors, 600 were hand-labeled (each label including a confidence weight in [0, 1]). [sent-237, score-0.055]

87 The feature vectors were clustered using the DP-Means algorithm on the entire dataset in a single batch, and using Dynamic Means / DDPGMM Gibbs sampling (with 50 samples) with half-hour takeoff window batches. [sent-238, score-0.151]

88 The results of this exercise are provided in Figure 2 and Table 1. [sent-239, score-0.042]

89 Table 1: Mean computational time and accuracy on hand-labeled aircraft trajectory data. Figure 2 shows the spatial and temporal properties of the 12 most popular clusters discovered by Dynamic Means. [sent-240, score-0.329]

90 Table 1 reports these quantities for the three algorithms tested over 20 trials. [sent-249, score-0.042]

91 The confidence-weighted accuracy was computed by taking the ratio between the sum of the weights for correctly labeled points and the sum of all weights. [sent-250, score-0.066]
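
That metric is a direct weighted ratio; a small helper is sketched below, assuming predicted and true cluster labels have already been put into correspondence.

```python
import numpy as np

def confidence_weighted_accuracy(pred, truth, weights):
    """Ratio of the total weight of correctly labeled points to the total weight."""
    pred, truth, weights = map(np.asarray, (pred, truth, weights))
    return float(np.sum(weights * (pred == truth)) / np.sum(weights))
```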

92 The DDPGMM Gibbs sampling algorithm was handicapped as described in the synthetic experiment section. [sent-251, score-0.09]

93 Of the three algorithms, Dynamic Means provided the highest labeling accuracy, while requiring orders of magnitude less computation time than both DP-Means and DDPGMM Gibbs sampling. [sent-252, score-0.086]

94 7 Conclusion This work developed a clustering algorithm for batch-sequential data containing temporally evolving clusters, derived from a low-variance asymptotic analysis of the Gibbs sampling algorithm for the dependent Dirichlet process mixture model. [sent-253, score-0.409]

95 Synthetic and real data experiments demonstrated that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. [sent-254, score-0.215]

96 The speed of inference coupled with the convergence guarantees provided yield an algorithm which is suitable for use in time-critical applications, such as online model-based autonomous planning systems. [sent-255, score-0.034]

97 Markov chain sampling methods for dirichlet process mixture models. [sent-263, score-0.263]

98 Construction of dependent dirichlet processes based on poisson processes. [sent-283, score-0.153]

99 Unsupervised discovery of object classes from range data using latent dirichlet allocation. [sent-293, score-0.098]

100 Estimating the number of clusters in a data set via the gap statistic. [sent-309, score-0.152]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kt', 0.446), ('tk', 0.353), ('nkt', 0.293), ('yit', 0.275), ('cluster', 0.218), ('zit', 0.196), ('ddp', 0.153), ('clusters', 0.152), ('gibbs', 0.128), ('ckt', 0.12), ('revived', 0.12), ('dynamic', 0.114), ('particle', 0.112), ('dt', 0.111), ('post', 0.111), ('aircraft', 0.106), ('dynmeans', 0.103), ('ssign', 0.103), ('dirichlet', 0.098), ('old', 0.093), ('clustering', 0.093), ('dp', 0.093), ('nq', 0.091), ('timestep', 0.083), ('mixture', 0.077), ('ct', 0.076), ('mkt', 0.076), ('zt', 0.071), ('abels', 0.069), ('ddpgmm', 0.069), ('ddpmm', 0.069), ('qkt', 0.069), ('takeoff', 0.069), ('transitioned', 0.069), ('wkt', 0.069), ('evolving', 0.059), ('sampling', 0.056), ('label', 0.055), ('dependent', 0.055), ('lt', 0.052), ('revive', 0.051), ('nc', 0.05), ('yt', 0.05), ('means', 0.049), ('trajectory', 0.047), ('params', 0.045), ('accuracy', 0.042), ('tf', 0.042), ('variational', 0.042), ('assignment', 0.04), ('tracking', 0.038), ('asymptotic', 0.037), ('tq', 0.036), ('subsampling', 0.036), ('cost', 0.035), ('timesteps', 0.034), ('restarts', 0.034), ('handicapped', 0.034), ('hedibert', 0.034), ('lopes', 0.034), ('luster', 0.034), ('monic', 0.034), ('pdate', 0.034), ('plagemann', 0.034), ('inference', 0.034), ('evolutionary', 0.033), ('orders', 0.032), ('kulis', 0.032), ('process', 0.032), ('plane', 0.031), ('fri', 0.03), ('week', 0.03), ('latitude', 0.03), ('longitude', 0.03), ('matt', 0.03), ('labeling', 0.03), ('serves', 0.03), ('trajectories', 0.029), ('batches', 0.029), ('nt', 0.028), ('wolfram', 0.028), ('timestamp', 0.028), ('cpu', 0.028), ('step', 0.028), ('capability', 0.027), ('clustered', 0.026), ('current', 0.025), ('carvalho', 0.025), ('dpmm', 0.025), ('durham', 0.025), ('time', 0.024), ('points', 0.024), ('robotics', 0.024), ('bayesian', 0.024), ('update', 0.024), ('contemporary', 0.024), ('cf', 0.024), ('datapoints', 0.024), ('observations', 0.023), ('ows', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000015 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

2 0.20575467 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations

Author: Botond Cseke, Manfred Opper, Guido Sanguinetti

Abstract: We propose an approximate inference algorithm for continuous time Gaussian Markov process models with both discrete and continuous time likelihoods. We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for discrete time terms and (2) variational updates for the continuous time term. We introduce postinference corrections methods that improve on the marginals of the approximation. This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. Experimental results on real and simulated data demonstrate high distributional accuracy and significant computational savings compared to discrete-time approaches in a neural application. 1

3 0.1830605 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

Author: Mijung Park, Jonathan W. Pillow

Abstract: The receptive field (RF) of a sensory neuron describes how the neuron integrates sensory stimuli over time and space. In typical experiments with naturalistic or flickering spatiotemporal stimuli, RFs are very high-dimensional, due to the large number of coefficients needed to specify an integration profile across time and space. Estimating these coefficients from small amounts of data poses a variety of challenging statistical and computational problems. Here we address these challenges by developing Bayesian reduced rank regression methods for RF estimation. This corresponds to modeling the RF as a sum of space-time separable (i.e., rank-1) filters. This approach substantially reduces the number of parameters needed to specify the RF, from 1K-10K down to mere 100s in the examples we consider, and confers substantial benefits in statistical power and computational efficiency. We introduce a novel prior over low-rank RFs using the restriction of a matrix normal prior to the manifold of low-rank matrices, and use “localized” row and column covariances to obtain sparse, smooth, localized estimates of the spatial and temporal RF components. We develop two methods for inference in the resulting hierarchical model: (1) a fully Bayesian method using blocked-Gibbs sampling; and (2) a fast, approximate method that employs alternating ascent of conditional marginal likelihoods. We develop these methods for Gaussian and Poisson noise models, and show that low-rank estimates substantially outperform full rank estimates using neural data from retina and V1. 1

4 0.16809489 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

Author: Jason Chang, John W. Fisher III

Abstract: We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods. 1

5 0.14514332 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

Author: Roger Frigola, Fredrik Lindsten, Thomas B. Schon, Carl Rasmussen

Abstract: State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference and learning (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and, instead, infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity. 1

6 0.13108855 148 nips-2013-Latent Maximum Margin Clustering

7 0.12956414 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

8 0.12249312 348 nips-2013-Variational Policy Search via Trajectory Optimization

9 0.098090038 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

10 0.097045392 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

11 0.095481068 63 nips-2013-Cluster Trees on Manifolds

12 0.092320211 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

13 0.090739958 255 nips-2013-Probabilistic Movement Primitives

14 0.087085381 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components

15 0.085671745 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations

16 0.078970894 280 nips-2013-Robust Data-Driven Dynamic Programming

17 0.07585831 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

18 0.074651167 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

19 0.069622457 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

20 0.068542719 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.18), (1, 0.024), (2, -0.036), (3, -0.017), (4, -0.03), (5, 0.125), (6, 0.144), (7, 0.088), (8, 0.109), (9, -0.073), (10, -0.109), (11, 0.019), (12, -0.008), (13, 0.089), (14, -0.042), (15, 0.013), (16, -0.073), (17, -0.035), (18, 0.076), (19, 0.015), (20, 0.045), (21, 0.012), (22, 0.028), (23, -0.1), (24, -0.037), (25, 0.006), (26, -0.1), (27, -0.084), (28, -0.024), (29, -0.02), (30, 0.016), (31, -0.013), (32, 0.288), (33, -0.067), (34, -0.104), (35, -0.155), (36, -0.044), (37, -0.144), (38, -0.111), (39, -0.021), (40, -0.033), (41, -0.023), (42, 0.041), (43, -0.005), (44, -0.085), (45, 0.14), (46, -0.032), (47, -0.049), (48, 0.017), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95249987 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

2 0.6723125 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

Author: Jason Chang, John W. Fisher III

Abstract: We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods. 1

3 0.61323607 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations

Author: Botond Cseke, Manfred Opper, Guido Sanguinetti

Abstract: We propose an approximate inference algorithm for continuous time Gaussian Markov process models with both discrete and continuous time likelihoods. We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for discrete time terms and (2) variational updates for the continuous time term. We introduce postinference corrections methods that improve on the marginals of the approximation. This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. Experimental results on real and simulated data demonstrate high distributional accuracy and significant computational savings compared to discrete-time approaches in a neural application. 1

4 0.61125344 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

Author: Mijung Park, Jonathan W. Pillow

Abstract: The receptive field (RF) of a sensory neuron describes how the neuron integrates sensory stimuli over time and space. In typical experiments with naturalistic or flickering spatiotemporal stimuli, RFs are very high-dimensional, due to the large number of coefficients needed to specify an integration profile across time and space. Estimating these coefficients from small amounts of data poses a variety of challenging statistical and computational problems. Here we address these challenges by developing Bayesian reduced rank regression methods for RF estimation. This corresponds to modeling the RF as a sum of space-time separable (i.e., rank-1) filters. This approach substantially reduces the number of parameters needed to specify the RF, from 1K-10K down to mere 100s in the examples we consider, and confers substantial benefits in statistical power and computational efficiency. We introduce a novel prior over low-rank RFs using the restriction of a matrix normal prior to the manifold of low-rank matrices, and use “localized” row and column covariances to obtain sparse, smooth, localized estimates of the spatial and temporal RF components. We develop two methods for inference in the resulting hierarchical model: (1) a fully Bayesian method using blocked-Gibbs sampling; and (2) a fast, approximate method that employs alternating ascent of conditional marginal likelihoods. We develop these methods for Gaussian and Poisson noise models, and show that low-rank estimates substantially outperform full rank estimates using neural data from retina and V1. 1

5 0.57222641 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

Author: Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael Jordan

Abstract: Research on distributed machine learning algorithms has focused primarily on one of two extremes—algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this “optimistic concurrency control” paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment. 1

6 0.54028046 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components

7 0.51655728 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

8 0.4991349 148 nips-2013-Latent Maximum Margin Clustering

9 0.48216125 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

10 0.46459064 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

11 0.44280705 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

12 0.44035947 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models

13 0.42152324 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

14 0.41008961 63 nips-2013-Cluster Trees on Manifolds

15 0.40785453 255 nips-2013-Probabilistic Movement Primitives

16 0.40278542 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling

17 0.40076917 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent

18 0.39263541 277 nips-2013-Restricting exchangeable nonparametric distributions

19 0.39044821 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

20 0.38302726 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.07), (33, 0.104), (34, 0.158), (41, 0.017), (49, 0.03), (56, 0.094), (70, 0.039), (85, 0.027), (89, 0.038), (93, 0.036), (95, 0.013), (99, 0.281)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85455883 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar

Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1

2 0.84872663 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation

Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia

Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1

3 0.79378921 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

Author: Leonid Boytsov, Bilegsaikhan Naidan

Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1

same-paper 4 0.78641677 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

5 0.75851941 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

Author: Fang Han, Han Liu

Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1

6 0.71234119 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

7 0.61383319 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

8 0.60919684 86 nips-2013-Demixing odors - fast inference in olfaction

9 0.60694605 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

10 0.60536253 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents

11 0.60396284 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent

12 0.60368294 143 nips-2013-Integrated Non-Factorized Variational Inference

13 0.60338974 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

14 0.60290217 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression

15 0.60205936 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

16 0.60146552 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

17 0.60075015 347 nips-2013-Variational Planning for Graph-based MDPs

18 0.6001215 322 nips-2013-Symbolic Opportunistic Policy Iteration for Factored-Action MDPs

19 0.59988058 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

20 0.5996474 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models