nips nips2013 nips2013-40 knowledge-graph by maker-knowledge-mining

40 nips-2013-Approximate Inference in Continuous Determinantal Processes


Source: pdf

Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar

Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nyström and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. [sent-8, score-0.432]

2 While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. [sent-10, score-0.152]

3 In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nyström and random Fourier feature techniques and another based on Gibbs sampling. [sent-11, score-0.359]

4 We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. [sent-12, score-0.48]

5 1 Introduction Samples from a determinantal point process (DPP) [15] are sets of points that tend to be spread out. [sent-13, score-0.16]

6 More specifically, given Ω ⊆ R^d and a positive semidefinite kernel function L : Ω × Ω → R, the probability density of a point configuration A ⊂ Ω under a DPP with kernel L is given by P_L(A) ∝ det(L_A), (1) where L_A is the |A| × |A| matrix with entries L(x, y) for each x, y ∈ A. [sent-14, score-0.437]
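
To make Eq. (1) concrete, here is a minimal numpy sketch (not from the paper) that evaluates the unnormalized DPP density det(L_A) for a small point configuration; the Gaussian quality/similarity kernel and its parameters are assumptions chosen for illustration:

    import numpy as np

    def gaussian_L(x, y, rho2=1.0, sigma2=1.0):
        # assumed quality-times-similarity kernel q(x) k(x, y) q(y)
        q = lambda z: np.exp(-np.sum(z ** 2) / (2.0 * rho2))
        k = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma2))
        return q(x) * k * q(y)

    def unnormalized_dpp_density(A, L=gaussian_L):
        # det(L_A) from Eq. (1); A is a list of d-dimensional points
        LA = np.array([[L(x, y) for y in A] for x in A])
        return np.linalg.det(LA)

    close = [np.array([0.0, 0.0]), np.array([0.1, 0.0])]
    far = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
    # nearby points give a small determinant (repulsion); spread points give a larger one
    print(unnormalized_dpp_density(close), unnormalized_dpp_density(far))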

7 The tendency for repulsion is captured by the determinant since it depends on the volume spanned by the selected points in the associated Hilbert space of L. [sent-15, score-0.137]

8 [10], which relies on the eigendecomposition of the kernel matrix to recursively sample points based on their projections onto the subspace spanned by the selected eigenvectors. [sent-19, score-0.387]

9 Repulsive point processes, like hard core processes [7, 16], many based on thinned Poisson processes and Gibbs/Markov distributions, have a long history in the spatial statistics community, where considering continuous Ω is key. [sent-20, score-0.183]

10 Repulsive processes on continuous spaces have garnered interest in machine learning as well, especially relating to generative mixture modeling [18, 29]. [sent-22, score-0.24]

11 On the surface, it seems that the eigendecomposition and projection algorithm of [10] for discrete DPPs would naturally extend to the continuous case. [sent-24, score-0.209]

12 The absence of a tractable DPP sampling algorithm for general kernels in continuous spaces has hindered progress in developing DPP-based models for repulsion. [sent-26, score-0.278]

13 In this paper, we propose an efficient algorithm to sample from DPPs in continuous spaces using low-rank approximations of the kernel function. [sent-27, score-0.369]

14 For k-DPPs, which only place positive probability on sets of cardinality k [13], we also devise a Gibbs sampler that iteratively samples points in the k-set conditioned on all k − 1 other points. [sent-30, score-0.148]

15 Our methods allow us to handle a broad range of typical kernels and continuous subspaces, provided certain simple integrals of the kernel function can be computed efficiently. [sent-32, score-0.316]

16 Decomposing our kernel into quality and similarity terms as in [13], this includes, but is not limited to, all cases where the (i) spectral density of the quality and (ii) characteristic function of the similarity kernel can be computed efficiently. [sent-33, score-0.635]

17 2, we review sampling algorithms for discrete DPPs and the challenges associated with sampling from continuous DPPs. [sent-36, score-0.299]

18 We then propose continuous DPP sampling algorithms based on low-rank kernel approximations in Sec. [sent-37, score-0.398]

19 Finally, we apply our methods to repulsive mixture modeling and human pose synthesis in Sec. [sent-42, score-0.322]

20 2 Sampling from a DPP When Ω is discrete with cardinality N , an efficient algorithm for sampling from a DPP is given in [10]. [sent-44, score-0.165]

21 The algorithm, which is detailed in the supplement, uses an eigendecomposition of the kernel matrix L = Σ_{n=1}^N λ_n v_n v_nᵀ and recursively samples points x_i as follows, resulting in a set A ∼ DPP(L) with A = {x_i}: Phase 1 selects eigenvector v_n with probability λ_n/(λ_n + 1). [sent-45, score-0.461]

22 When Ω is discrete, both steps are straightforward since the first phase involves eigendecomposing a kernel matrix and the second phase involves sampling from discrete probability distributions based on inner products between points and eigenvectors. [sent-52, score-0.492]
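
For reference, a rough numpy sketch of this discrete two-phase sampler (following the structure described in [10]; the numerical details here are simplified assumptions, not the authors' reference implementation):

    import numpy as np

    def sample_discrete_dpp(L, rng=np.random.default_rng()):
        # Phase 1: keep eigenvector v_n independently with probability lambda_n / (lambda_n + 1)
        lam, vecs = np.linalg.eigh(L)
        keep = rng.random(len(lam)) < lam / (lam + 1.0)
        V = vecs[:, keep]
        sample = []
        # Phase 2: recursively pick items from projections onto span(V)
        while V.shape[1] > 0:
            probs = np.sum(V ** 2, axis=1)
            probs /= probs.sum()
            i = rng.choice(len(probs), p=probs)
            sample.append(i)
            # eliminate the component along e_i, drop one column, re-orthonormalize
            j = int(np.argmax(np.abs(V[i, :])))
            V = V - np.outer(V[:, j], V[i, :] / V[i, j])
            V = np.delete(V, j, axis=1)
            if V.shape[1] > 0:
                V, _ = np.linalg.qr(V)
        return sorted(sample)

    L = np.exp(-0.5 * np.subtract.outer(np.arange(5.0), np.arange(5.0)) ** 2)
    print(sample_discrete_dpp(L))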

23 Extending this algorithm to a continuous space was considered by [14], but for a very limited set of kernels L and spaces Ω. [sent-53, score-0.178]

24 Extending Phase 1 to a continuous space requires knowledge of the eigendecomposition of the kernel function. [sent-55, score-0.361]

25 When Ω is a compact rectangle in Rd , [14] suggest approximating the eigendecomposition using an orthonormal Fourier basis. [sent-56, score-0.132]

26 Even if we are able to obtain the eigendecomposition of the kernel function (either directly or via approximations as considered in [14] and Sec. [sent-57, score-0.344]

27 Whereas the discrete case only requires sampling from a discrete probability function, here we have to sample from a probability density. [sent-59, score-0.192]

28 When Ω is compact, [14] suggest using a rejection sampler with a uniform proposal on Ω. [sent-60, score-0.139]

29 The authors note that the acceptance rate of this rejection sampler decreases with the number of points sampled, making the method inefficient in sampling large sets from a DPP. [sent-61, score-0.247]

30 In most other cases, implementing Phase 2 even via rejection sampling is infeasible since the target density is in general non-standard with unknown normalization. [sent-62, score-0.192]

31 In summary, current algorithms can sample approximately from a continuous DPP only for translation-invariant kernels defined on a compact space. [sent-64, score-0.151]

32 3, we propose a sampling algorithm that allows us to sample approximately from DPPs for a wide range of kernels L and spaces Ω. [sent-66, score-0.261]

33 3 Sampling from a low-rank continuous DPP: Again considering Ω discrete with cardinality N, the sampling algorithm of Sec. [sent-67, score-0.229]

34 The basic idea is to exploit the fact that L and the dual kernel matrix C = BBᵀ, which is D × D, share the same nonzero eigenvalues, and for each eigenvector v_k of L, Bv_k is the corresponding eigenvector of C. [sent-72, score-0.329]
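
This duality is easy to verify numerically; the following sketch (an illustration, not code from the paper) checks the shared nonzero spectrum and the eigenvector mapping v ↦ Bv:

    import numpy as np

    rng = np.random.default_rng(0)
    D, N = 3, 50                        # low rank D, many (here finite) items N
    B = rng.normal(size=(D, N))
    L = B.T @ B                         # N x N kernel, rank at most D
    C = B @ B.T                         # D x D dual kernel

    eigL = np.sort(np.linalg.eigvalsh(L))[-D:]   # top D (nonzero) eigenvalues of L
    eigC = np.sort(np.linalg.eigvalsh(C))
    print(np.allclose(eigL, eigC))               # True: shared nonzero spectrum

    # an eigenvector v of L maps to an eigenvector B v of C with the same eigenvalue
    lam, vecs = np.linalg.eigh(L)
    v = vecs[:, -1]
    print(np.allclose(C @ (B @ v), lam[-1] * (B @ v)))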

35 While the dependence on N in the dual is sharply reduced, in continuous spaces, N is infinite. [sent-74, score-0.114]

36 Generically, consider sampling from a DPP on a continuous space Ω with kernel L(x, y) = Σ_{n=1}^∞ λ_n φ_n(x) φ_n(y)*, where λ_n and φ_n(x) are eigenvalues and eigenfunctions, and φ_n(y)* is the complex conjugate of φ_n(y). [sent-76, score-0.417]

37 Using such a low-rank representation, we propose an analog of the dual sampling algorithm for continuous spaces, described in Algorithm 1. [sent-86, score-0.214]

38 Algorithm 1 (dual sampler for a low-rank continuous DPP) takes as input a rank-D DPP kernel L̃(x, y) = B(x)* B(y). Phase 1: compute the dual matrix C = ∫_Ω B(x) B(x)* dx and its eigendecomposition, selecting eigenvectors as in the discrete case. Phase 2: initialize X ← ∅; while |V| > 0, sample x̂ from f(x) = (1/|V|) Σ_{v∈V} |v* B(x)|², set X ← X ∪ {x̂}, and update V. [sent-90, score-0.367]

39 Here the inner product is ⟨v₁, v₂⟩ = v₁* C v₂, V is initialized as {v_k / √(v_k* C v_k)}_{k∈J}, and the output is X. In this dual view, we still have the same two-phase structure, and must address two key challenges: Phase 1 Assuming a low-rank kernel function decomposition as in Eq. [sent-97, score-0.329]
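
A minimal one-dimensional sketch of this dual sampler is given below, assuming a hand-picked finite-rank feature map B(x) on Ω = [0, 1]; the integrals for C and the Phase-2 CDF are approximated on a grid here, which is an implementation shortcut rather than the closed forms derived in the paper:

    import numpy as np

    def sample_lowrank_dpp_1d(B, grid, rng=np.random.default_rng()):
        # B(x) returns a length-D feature vector; Omega = [grid[0], grid[-1]]
        Bg = np.array([B(x) for x in grid])          # grid evaluations of B, shape (G, D)
        w = np.gradient(grid)                        # quadrature weights for the integrals
        # Phase 1: dual matrix C = int_Omega B(x) B(x)^* dx and its eigendecomposition
        C = (Bg * w[:, None]).T @ Bg
        lam, U = np.linalg.eigh(C)
        keep = rng.random(len(lam)) < lam / (lam + 1.0)
        V = [U[:, k] / np.sqrt(U[:, k] @ C @ U[:, k]) for k in np.where(keep)[0]]
        X = []
        # Phase 2: sample from f(x) = (1/|V|) sum_v |v^* B(x)|^2 via the inverse CDF
        while V:
            f = np.mean([np.abs(Bg @ v) ** 2 for v in V], axis=0)
            cdf = np.cumsum(f * w)
            cdf /= cdf[-1]
            x_hat = float(np.interp(rng.random(), cdf, grid))
            X.append(x_hat)
            b = B(x_hat)
            # drop the direction that explains x_hat, then re-orthonormalize under <u, v> = u^T C v
            j = int(np.argmax([abs(v @ b) for v in V]))
            vj = V.pop(j)
            V = [v - (v @ b) / (vj @ b) * vj for v in V]
            newV = []
            for v in V:
                for u in newV:
                    v = v - (u @ C @ v) * u
                n2 = v @ C @ v
                if n2 > 1e-12:
                    newV.append(v / np.sqrt(n2))
            V = newV
        return X

    grid = np.linspace(0.0, 1.0, 2000)
    B = lambda x: np.array([1.0, np.sqrt(2) * np.cos(np.pi * x), np.sqrt(2) * np.cos(2 * np.pi * x)])
    print(sample_lowrank_dpp_1d(B, grid))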

40 (2), we need to be able to compute the dual kernel matrix, given by an integral: C = ∫_Ω B(x) B(x)* dx. (3) [sent-98, score-0.265]

41 Phase 2 In general, sampling directly from the density f(x) is difficult; instead, we can compute the cumulative distribution function (CDF) and sample x̂ using the inverse CDF method [21]: for x̂ = (x̂₁, …, x̂_d), F(x̂) is obtained by integrating f over ∏_{l=1}^d (−∞, x̂_l]. (4) [sent-99, score-0.297]

42 Assuming (i) the kernel function L̃ is finite-rank and (ii) the terms C and f(x) are computable, Algorithm 1 provides exact samples from a DPP with kernel L̃. [sent-103, score-0.374]

43 In what follows, approximations only arise from approximating general kernels L with low-rank kernels L̃. [sent-104, score-0.199]

44 If given a finite-rank kernel L to begin with, the sampling procedure is exact. [sent-105, score-0.287]

45 For our approximation algorithm to work, we need methods that not only approximate the kernel function well, but also enable us to solve Eq. [sent-110, score-0.212]

46 (3) and (4) directly for many different kernel functions. [sent-111, score-0.187]

47 The frequencies are sampled independently from the Fourier transform of the kernel function, ω_j ∼ F(k(x − y)), letting: k̃(x − y) = (1/D) Σ_{j=1}^D exp{i ω_jᵀ(x − y)}, x, y ∈ Ω. (5) [sent-115, score-0.187]

48 To apply RFFs, we factor L into a quality function q and similarity kernel k (i. [sent-116, score-0.275]

49 (5), we can approximate the similarity kernel function to obtain a low-rank kernel and dual matrix: L̃_RFF(x, y) = (1/D) Σ_{j=1}^D q(x) exp{i ω_jᵀ(x − y)} q(y), with C_jk^RFF = (1/D) ∫_Ω q²(x) exp{i (ω_j − ω_k)ᵀ x} dx. [sent-123, score-0.187]

50 The CDF of the sampling distribution f(x) in Algorithm 1 is given by: F_RFF(x̂) = (1/|V|) Σ_{v∈V} Σ_{j=1}^D Σ_{k=1}^D v_j* v_k ∏_{l=1}^d ∫_{−∞}^{x̂_l} q²(x) exp{i (ω_j − ω_k)ᵀ x} 1{x_l ∈ Ω} dx_l. [sent-124, score-0.326]
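
A small sketch of the RFF construction for a Gaussian similarity kernel follows (the Gaussian choice, the bandwidths, and the test points are assumptions made for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    d, D = 2, 500                                  # input dimension, number of random features
    sigma2 = 1.0                                   # similarity bandwidth (Gaussian kernel assumed)
    rho2 = 4.0                                     # quality bandwidth

    # frequencies drawn from the Fourier transform of k(x - y) = exp(-|x - y|^2 / (2 sigma2))
    omega = rng.normal(scale=1.0 / np.sqrt(sigma2), size=(D, d))

    q = lambda x: np.exp(-np.sum(x ** 2) / (2.0 * rho2))

    def B(x):
        # complex feature map: B_j(x) = q(x) exp(i omega_j . x) / sqrt(D)
        return q(x) * np.exp(1j * omega @ x) / np.sqrt(D)

    def L_tilde(x, y):
        return np.real(np.vdot(B(x), B(y)))        # approximates q(x) k(x, y) q(y)

    x, y = np.zeros(d), np.array([0.5, -0.3])
    exact = q(x) * np.exp(-np.sum((x - y) ** 2) / (2 * sigma2)) * q(y)
    print(exact, L_tilde(x, y))                    # the two should be close for large D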

51 In fact, this method works for any combination of (i) translation-invariant similarity kernel k with known characteristic function and (ii) quality function q with known spectral density. [sent-127, score-0.297]

52 The resulting kernel L need not be translation invariant. [sent-128, score-0.187]

53 In the supplement, we illustrate this method by considering a common and important example where Ω = Rd , q(x) is Gaussian, and k(x, y) is any kernel with known Fourier transform. [sent-129, score-0.187]

54 Sampling from a Nyström-approximated DPP: Another approach to kernel approximation is the Nyström method [27]. [sent-131, score-0.242]

55 Given landmarks z₁, …, z_D sampled from Ω, we can approximate the kernel function and dual matrix as L̃_Nys(x, y) = Σ_{j=1}^D Σ_{k=1}^D W_jk L(x, z_j) L(z_k, y), with dual-matrix entries built from the integrals ∫_Ω L(z_n, x) L(x, z_m) dx, where W_jk = L(z_j, z_k)^{−1/2}. [sent-135, score-0.324]

56 The corresponding CDF in Algorithm 1 is: F_Nys(x̂) = Σ_{v∈V} Σ_{j=1}^D Σ_{k=1}^D w_j(v) w_k(v) ∏_{l=1}^d ∫_{−∞}^{x̂_l} L(x, z_j) L(z_k, x) 1{x_l ∈ Ω} dx_l. [sent-137, score-0.203]

57 Here, there are no translation-invariant requirements, even for the similarity kernel k. [sent-139, score-0.242]
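
The sketch below illustrates a Nyström-style low-rank feature map built from landmarks; it follows the usual Nyström construction with W = L_Z^{−1/2}, and the exact bookkeeping in the paper may differ in detail:

    import numpy as np

    rng = np.random.default_rng(2)

    def gauss_L(x, y, rho2=4.0, sigma2=1.0):
        q = lambda z: np.exp(-np.sum(z ** 2) / (2 * rho2))
        k = np.exp(-np.sum((x - y) ** 2) / (2 * sigma2))
        return q(x) * k * q(y)

    # landmarks z_1..z_D sampled from Omega (here Omega = [-3, 3]^2 is assumed)
    D, d = 30, 2
    Z = rng.uniform(-3, 3, size=(D, d))
    LZ = np.array([[gauss_L(zi, zj) for zj in Z] for zi in Z])

    # W = LZ^{-1/2}; feature map B(x) = W [L(z_1, x), ..., L(z_D, x)]^T
    evals, evecs = np.linalg.eigh(LZ)
    evals = np.clip(evals, 1e-10, None)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T

    def B(x):
        return W @ np.array([gauss_L(z, x) for z in Z])

    def L_nys(x, y):
        return B(x) @ B(y)               # = sum_{j,k} [LZ^{-1}]_{jk} L(x, z_j) L(z_k, y)

    x, y = np.array([0.2, 0.1]), np.array([0.5, -0.4])
    print(gauss_L(x, y), L_nys(x, y))    # the Nystrom value should track the exact kernel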

58 4 Gibbs sampling: For k-DPPs, we can consider a Gibbs sampling scheme. [sent-141, score-0.2]

59 Let the kernel function be represented as before: L(x, y) = q(x)k(x, y)q(y). [sent-143, score-0.187]

60 Denoting J^{\k} = {x_j}_{j≠k} and M^{\k} = L_{J^{\k}}^{−1}, the full conditional can be simplified using Schur’s determinantal equality [22]: p(x_k | {x_j}_{j≠k}) ∝ L(x_k, x_k) − Σ_{i,j≠k} M^{\k}_{ij} L(x_i, x_k) L(x_j, x_k). [sent-144, score-0.265]
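
As an illustration of this update, the sketch below evaluates the unnormalized full conditional on a grid and resamples one point of a k-DPP from it (a 1-D toy version; the paper instead exploits kernels whose conditional integrals are available in closed form):

    import numpy as np

    def gibbs_step(points, k, L, grid, rng=np.random.default_rng()):
        # resample points[k] from p(x_k | rest) ∝ L(x, x) - l(x)^T M l(x), M = L_rest^{-1}
        rest = [p for i, p in enumerate(points) if i != k]
        K_rest = np.array([[L(a, b) for b in rest] for a in rest])
        M = np.linalg.inv(K_rest + 1e-9 * np.eye(len(rest)))   # small ridge for stability

        def cond(x):
            l = np.array([L(p, x) for p in rest])
            return L(x, x) - l @ M @ l       # Schur complement of L_rest in L_{rest ∪ {x}}

        p = np.array([max(cond(x), 0.0) for x in grid])
        p /= p.sum()
        points[k] = rng.choice(grid, p=p)
        return points

    L = lambda x, y: np.exp(-(x - y) ** 2 / 0.1)   # assumed similarity-only kernel
    grid = np.linspace(0, 1, 500)
    pts = list(np.random.default_rng(3).uniform(0, 1, size=4))
    for sweep in range(20):
        for k in range(len(pts)):
            pts = gibbs_step(pts, k, L, grid)
    print(sorted(pts))                              # points tend to spread out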

61 Figure 1: Estimates of total variational distance for Nyström and RFF approximation methods to a DPP with Gaussian quality and similarity with covariances Γ = diag(ρ², . [sent-161, score-0.19]

62 However, for a wide range of kernel functions, including those which can be handled by the Nyström approximation in Sec. [sent-171, score-0.237]

63 We use this same Schur complement scheme for sampling from the full conditionals in the mixture model application of Sec. [sent-174, score-0.201]

64 A key advantage of this scheme for several types of kernels is that the complexity of sampling scales linearly with the number of dimensions d, making it suitable for handling high-dimensional spaces. [sent-176, score-0.165]

65 In cases where the kernel introduces low repulsion we expect the Gibbs sampler to mix well, while in a high-repulsion setting the sampler can mix slowly due to the strong dependencies between points and the fact that we are only doing one-point-at-a-time moves. [sent-178, score-0.531]

66 5 Empirical analysis: To evaluate the performance of the RFF and Nyström approximations, we compute the total variational distance ‖P_L − P_L̃‖₁ = ½ Σ_X |P_L(X) − P_L̃(X)|, where P_L(X) denotes the probability of set X under a DPP with kernel L, as given by Eq. [sent-185, score-0.239]
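
On a small discrete ground set this distance can be computed exactly by enumerating subsets, which makes for a useful sanity check of any approximation (an illustrative sketch, not the continuous estimator used in the paper):

    import numpy as np
    from itertools import combinations

    def dpp_probs(L):
        # exact DPP probabilities P_L(X) = det(L_X) / det(L + I) over all subsets X
        N = L.shape[0]
        Z = np.linalg.det(L + np.eye(N))
        probs = {}
        for r in range(N + 1):
            for X in combinations(range(N), r):
                sub_det = np.linalg.det(L[np.ix_(X, X)]) if r > 0 else 1.0
                probs[X] = sub_det / Z
        return probs

    def tv_distance(L, L_approx):
        p, q = dpp_probs(L), dpp_probs(L_approx)
        return 0.5 * sum(abs(p[X] - q[X]) for X in p)

    rng = np.random.default_rng(4)
    A = rng.normal(size=(6, 6)); L = A @ A.T        # a random PSD kernel on 6 items
    B = rng.normal(size=(2, 6)); L2 = B.T @ B       # a crude rank-2 "approximation"
    print(tv_distance(L, L2))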

67 We restrict our analysis to the case where the quality function and similarity kernel are Gaussians with isotropic covariances Γ = diag(ρ2 , . [sent-187, score-0.3]

68 1 displays estimates of the total variational distance for the RFF and Nyström approximations when ρ² = 1, varying σ² (the repulsion strength) and the dimension d. [sent-197, score-0.168]

69 While this phenomenon seems perplexing at first, a study of the eigenvalues of the Gaussian kernel across dimensions sheds light on the rationale (see Fig. [sent-199, score-0.274]

70 It has been previously demonstrated that the Nyström method performs favorably in kernel learning tasks compared to RFF in cases where there is a large eigengap in the kernel matrix [28]. [sent-202, score-0.374]

71 For the RFF, while the kernel approximation is guaranteed to be an unbiased estimate of the true kernel element-wise, the variance is fairly high [19]. [sent-206, score-0.399]

72 For the Nyström method, on the other hand, the quality of the approximation depends on how well the landmarks cover Ω. [sent-208, score-0.117]

73 We believe the same result holds for continuous Ω by extending the eigenvalues and spectral norm of the kernel matrix to operator eigenvalues and operator norms, respectively. [sent-215, score-0.383]

74 6 Repulsive priors for mixture models Mixture models are used in a wide range of applications from clustering to density estimation. [sent-217, score-0.193]

75 Instead, [18] show that sampling the location parameters using repulsive priors leads to better separated clusters while maintaining the accuracy of the density estimate. [sent-222, score-0.438]

76 They propose a class of repulsive priors that rely on explicitly defining a distance metric and the manner in which small distances are penalized. [sent-223, score-0.243]

77 The theoretical properties of DPPs make them an appealing choice as a repulsive prior. [sent-225, score-0.194]

78 In fact, [29] considered using DPPs as repulsive priors in latent variable models. [sent-226, score-0.219]

79 However, in the absence of a feasible continuous DPP sampling algorithm, their method was restricted to performing MAP inference. [sent-227, score-0.164]

80 In the common case of mixtures of Gaussians (MoG), our posterior computations can be performed using Gibbs sampling with nearly the same simplicity as the standard case where the location parameters µ_k are assumed to be i.i.d. [sent-229, score-0.154]

81 For the location parameters, instead of sampling each µk independently from its conditional posterior, our full conditional depends upon the other locations µ\k as well. [sent-237, score-0.125]
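
One way to picture this dependence (an illustrative Metropolis-within-Gibbs sketch under a Gaussian likelihood, not the exact conditional scheme of the paper) is that the usual update for µ_k is reweighted by the DPP repulsion term, i.e. the determinant of the location kernel:

    import numpy as np

    def mu_k_update(mu, k, data_k, sigma2, L, rng=np.random.default_rng()):
        # Metropolis step for location mu[k] given its assigned data and the other locations
        def log_target(m):
            locs = mu.copy(); locs[k] = m
            K = np.array([[L(a, b) for b in locs] for a in locs])
            log_prior = np.linalg.slogdet(K)[1]                   # DPP repulsive prior term
            log_lik = -0.5 * np.sum((data_k - m) ** 2) / sigma2   # assumed Gaussian likelihood
            return log_prior + log_lik
        prop = mu[k] + 0.3 * rng.normal()
        if np.log(rng.random()) < log_target(prop) - log_target(mu[k]):
            mu[k] = prop
        return mu

    L = lambda a, b: np.exp(-(a - b) ** 2 / 0.5)    # assumed location kernel
    mu = np.array([0.0, 0.2, 3.0])
    data_k = np.array([0.1, -0.2, 0.05])
    for _ in range(100):
        mu = mu_k_update(mu, 0, data_k, sigma2=1.0, L=L)
    print(mu)                                        # mu[0] is pushed away from mu[1]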

82 We assess the clustering and density estimation performance of the DPP-based model on both synthetic and real datasets. [sent-241, score-0.117]

83 Synthetic data To assess the role of the prior in a density estimation task, we generated a small sample of 100 observations from a mixture of two Gaussians. [sent-246, score-0.195]

84 As a measure of simplicity of the resulting density description, we compute the average entropy of the posterior mixture membership distribution, which is a reasonable metric given the similarity of the overall densities. [sent-256, score-0.307]

85 We also assess the accuracy of the density estimate by computing both (i) Hamming distance error relative to true cluster labels and (ii) held-out log-likelihood on 100 observations. [sent-258, score-0.117]

86 Figure 2 (panels: Well-Sep, Poor-Sep, Galaxy, Enzyme, Acidity). For each synthetic and real dataset: (top) histogram of data overlaid with actual Gaussian mixture generating the synthetic data, and posterior mean mixture model for (middle) IID and (bottom) DPP. [sent-355, score-0.237]

87 Table 1: For IID and DPP on synthetic datasets: mean (stdev) for mixture membership entropy, cluster assignment error rate and held-out log-likelihood of 100 observations under the posterior mean density estimate. [sent-357, score-0.23]

88 We once again judge the complexity of the density estimates using the posterior mixture membership entropy as a proxy. [sent-376, score-0.252]

89 Finally, we consider a classification task based on the iris dataset: 150 observations from three iris species with four length measurements. [sent-381, score-0.124]

90 A typical approximation is to use a set of perturbed expert trajectories as a comparison set, where a good set of trajectories should cover as large a part of the space as possible. [sent-391, score-0.119]

91 7 Table 2: For IID and DPP, mean (stdev) of (left) mixture membership entropy and held-out log-likelihood for three density estimation tasks and (right) classification error under 2 vs. [sent-392, score-0.223]

92 Figure 3 (legend: Original, DPP, IID). Left: Diverse set of human poses relative to an original pose, obtained by sampling from RFF (top) and Nyström (bottom) approximations with a kernel based on MoCap of the activity dance. [sent-419, score-0.474]

93 To achieve this, we build a kernel with Gaussian quality and similarity using covariances estimated from the training data associated with the activity. [sent-427, score-0.3]

94 The Gaussian quality is centered about the selected reference pose and we synthesize new poses by sampling from our continuous DPP using the low-rank approximation scheme. [sent-428, score-0.329]

95 For the activity dance, to quantitatively assess our performance in covering the activity space, we compute a coverage rate metric based on a random sample of 50 poses from a DPP. [sent-431, score-0.223]

96 case by inflating the variance to match the diverse DPP sample, the DPP poses still provide better average coverage over 100 runs. [sent-440, score-0.157]

97 Our low-rank approach harnessed approximations provided by Nyström and random Fourier feature methods and then utilized a continuous dual DPP representation. [sent-454, score-0.161]

98 The resulting approximate sampler garners the same efficiencies that led to the success of the DPP in the discrete case. [sent-455, score-0.123]

99 Finally, we demonstrated that continuous-DPP sampling is useful both for repulsive mixture modeling (which utilizes the Gibbs sampling scheme) and in synthesizing diverse human poses (which we demonstrated with the low-rank approximation method). [sent-458, score-0.663]

100 Using the Nyström method to speed up kernel machines. [sent-637, score-0.187]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dpp', 0.674), ('rff', 0.288), ('nystr', 0.236), ('dpps', 0.221), ('repulsive', 0.194), ('kernel', 0.187), ('determinantal', 0.13), ('xl', 0.112), ('eigendecomposition', 0.11), ('sampling', 0.1), ('iid', 0.096), ('sampler', 0.088), ('acidity', 0.085), ('mixture', 0.08), ('supplement', 0.07), ('phase', 0.07), ('fourier', 0.069), ('repulsion', 0.069), ('kulesza', 0.068), ('dxl', 0.068), ('mocap', 0.068), ('eigenvalues', 0.066), ('gibbs', 0.066), ('kernels', 0.065), ('continuous', 0.064), ('density', 0.063), ('iris', 0.062), ('diverse', 0.058), ('schur', 0.057), ('poses', 0.056), ('pl', 0.056), ('galaxy', 0.055), ('cdf', 0.055), ('similarity', 0.055), ('eigenfunctions', 0.052), ('dual', 0.05), ('spaces', 0.049), ('processes', 0.047), ('approximations', 0.047), ('entropy', 0.046), ('vk', 0.046), ('xk', 0.045), ('coverage', 0.043), ('enzyme', 0.041), ('nystrom', 0.041), ('spanned', 0.038), ('landmarks', 0.037), ('vn', 0.037), ('activity', 0.036), ('trajectories', 0.036), ('discrete', 0.035), ('cls', 0.034), ('frf', 0.034), ('jrss', 0.034), ('mog', 0.034), ('wjn', 0.034), ('membership', 0.034), ('quality', 0.033), ('clusters', 0.031), ('points', 0.03), ('assess', 0.03), ('affandi', 0.03), ('wjk', 0.03), ('cardinality', 0.03), ('posterior', 0.029), ('rejection', 0.029), ('dx', 0.028), ('mij', 0.028), ('die', 0.028), ('stdev', 0.028), ('synthesizing', 0.028), ('variational', 0.028), ('zk', 0.027), ('pose', 0.026), ('hough', 0.026), ('nerve', 0.026), ('diversity', 0.026), ('approximation', 0.025), ('xj', 0.025), ('covariances', 0.025), ('spatial', 0.025), ('location', 0.025), ('wide', 0.025), ('synthesize', 0.025), ('priors', 0.025), ('synthetic', 0.024), ('distance', 0.024), ('zj', 0.023), ('eigenvector', 0.023), ('heldout', 0.023), ('proposal', 0.022), ('cover', 0.022), ('sample', 0.022), ('approximating', 0.022), ('human', 0.022), ('characteristic', 0.022), ('phenomenon', 0.021), ('diag', 0.021), ('complement', 0.021), ('growing', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 40 nips-2013-Approximate Inference in Continuous Determinantal Processes

Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar

Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nyström and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1

2 0.4932549 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection

Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye

Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1

3 0.39629638 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering

Author: Byungkon Kang

Abstract: Determinantal Point Process (DPP) has gained much popularity for modeling sets of diverse items. The gist of DPP is that the probability of choosing a particular set of items is proportional to the determinant of a positive definite matrix that defines the similarity of those items. However, computing the determinant requires time cubic in the number of items, and is hence impractical for large sets. In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. In addition, we show that this framework can be extended to sampling from cardinalityconstrained DPPs. As an application, we show how our sampling algorithm can be used to provide a fast heuristic for determining the number of clusters, resulting in better clustering.

4 0.22694342 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

Author: Jasper Snoek, Richard Zemel, Ryan P. Adams

Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1

5 0.13474727 76 nips-2013-Correlated random features for fast semi-supervised learning

Author: Brian McWilliams, David Balduzzi, Joachim Buhmann

Abstract: This paper presents Correlated Nyström Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, multiview regression, using Canonical Correlation Analysis (CCA) on unlabeled data, biases the regression towards useful features. It has been shown that CCA regression can substantially reduce variance with a minimal increase in bias if the views contains accurate estimators. Recent theoretical and empirical work shows that regression with random features closely approximates kernel regression, implying that the accuracy requirement holds for random views. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. 1

6 0.1016748 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

7 0.090936139 156 nips-2013-Learning Kernels Using Local Rademacher Complexity

8 0.08871223 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach

9 0.088117801 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

10 0.082747661 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations

11 0.081785195 289 nips-2013-Scalable kernels for graphs with continuous attributes

12 0.079868078 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test

13 0.068249144 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

14 0.062371727 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling

15 0.059403572 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

16 0.059191044 161 nips-2013-Learning Stochastic Inverses

17 0.058947958 252 nips-2013-Predictive PAC Learning and Process Decompositions

18 0.05801563 173 nips-2013-Least Informative Dimensions

19 0.057438817 137 nips-2013-High-Dimensional Gaussian Process Bandits

20 0.055903137 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.073), (2, -0.014), (3, 0.016), (4, -0.052), (5, 0.119), (6, 0.094), (7, -0.009), (8, -0.048), (9, 0.017), (10, -0.061), (11, -0.046), (12, -0.113), (13, -0.223), (14, 0.296), (15, 0.136), (16, -0.052), (17, -0.296), (18, 0.488), (19, 0.049), (20, 0.081), (21, -0.225), (22, -0.023), (23, -0.043), (24, 0.022), (25, -0.022), (26, 0.062), (27, 0.028), (28, 0.001), (29, -0.02), (30, 0.026), (31, 0.026), (32, -0.082), (33, 0.041), (34, 0.031), (35, 0.037), (36, -0.003), (37, -0.007), (38, -0.001), (39, 0.023), (40, 0.003), (41, -0.027), (42, 0.007), (43, -0.031), (44, 0.035), (45, 0.051), (46, 0.025), (47, -0.021), (48, 0.029), (49, -0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90224719 40 nips-2013-Approximate Inference in Continuous Determinantal Processes

Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar

Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nyström and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1

2 0.83978105 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering

Author: Byungkon Kang

Abstract: Determinantal Point Process (DPP) has gained much popularity for modeling sets of diverse items. The gist of DPP is that the probability of choosing a particular set of items is proportional to the determinant of a positive definite matrix that defines the similarity of those items. However, computing the determinant requires time cubic in the number of items, and is hence impractical for large sets. In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. In addition, we show that this framework can be extended to sampling from cardinalityconstrained DPPs. As an application, we show how our sampling algorithm can be used to provide a fast heuristic for determining the number of clusters, resulting in better clustering.

3 0.80500883 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection

Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye

Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1

4 0.3990013 76 nips-2013-Correlated random features for fast semi-supervised learning

Author: Brian McWilliams, David Balduzzi, Joachim Buhmann

Abstract: This paper presents Correlated Nyström Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, multiview regression, using Canonical Correlation Analysis (CCA) on unlabeled data, biases the regression towards useful features. It has been shown that CCA regression can substantially reduce variance with a minimal increase in bias if the views contains accurate estimators. Recent theoretical and empirical work shows that regression with random features closely approximates kernel regression, implying that the accuracy requirement holds for random views. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude. 1

5 0.39427638 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

Author: Jasper Snoek, Richard Zemel, Ryan P. Adams

Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1

6 0.3344011 156 nips-2013-Learning Kernels Using Local Rademacher Complexity

7 0.3075836 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach

8 0.29297307 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

9 0.27280724 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

10 0.26727363 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test

11 0.25499058 252 nips-2013-Predictive PAC Learning and Process Decompositions

12 0.25164786 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling

13 0.24916406 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

14 0.24752928 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

15 0.24632169 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space

16 0.24443455 289 nips-2013-Scalable kernels for graphs with continuous attributes

17 0.23406123 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression

18 0.23341954 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

19 0.22468291 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations

20 0.21927695 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.027), (16, 0.06), (33, 0.145), (34, 0.108), (39, 0.019), (41, 0.035), (49, 0.049), (56, 0.094), (59, 0.232), (70, 0.044), (85, 0.025), (89, 0.038), (93, 0.025), (95, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85497355 134 nips-2013-Graphical Models for Inference with Missing Data

Author: Karthika Mohan, Judea Pearl, Jin Tian

Abstract: We address the problem of recoverability i.e. deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called ‘Missingness Graphs’ to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we derive conditions that the graph should satisfy to ensure recoverability and devise algorithms to detect the presence of these conditions in the graph. 1

2 0.83098245 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unkown-abstract

same-paper 3 0.78828716 40 nips-2013-Approximate Inference in Continuous Determinantal Processes

Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar

Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nystr¨ m o and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1

4 0.69478101 330 nips-2013-Thompson Sampling for 1-Dimensional Exponential Family Bandits

Author: Nathaniel Korda, Emilie Kaufmann, Remi Munos

Abstract: Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (through the Jeffreys prior) available in an exponential family. This allow us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families. 1

5 0.68553513 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin

Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1

6 0.68464249 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

7 0.68071592 173 nips-2013-Least Informative Dimensions

8 0.68000865 350 nips-2013-Wavelets on Graphs via Deep Learning

9 0.6799069 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

10 0.67842954 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

11 0.6777072 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

12 0.6775344 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit

13 0.67566359 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions

14 0.67545784 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

15 0.67498803 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes

16 0.67467481 201 nips-2013-Multi-Task Bayesian Optimization

17 0.67430657 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval

18 0.67402995 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering

19 0.67347199 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models

20 0.67341286 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections