nips nips2013 nips2013-277 knowledge-graph by maker-knowledge-mining

277 nips-2013-Restricting exchangeable nonparametric distributions


Source: pdf

Author: Sinead A. Williamson, Steve N. MacEachern, Eric Xing

Abstract: Distributions over matrices with exchangeable rows and infinitely many columns are useful in constructing nonparametric latent variable models. However, the distribution implied by such models over the number of features exhibited by each data point may be poorly-suited for many modeling tasks. In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Distributions over matrices with exchangeable rows and infinitely many columns are useful in constructing nonparametric latent variable models. [sent-4, score-0.647]

2 However, the distribution implied by such models over the number of features exhibited by each data point may be poorly-suited for many modeling tasks. [sent-5, score-0.254]

3 In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. [sent-6, score-0.557]

4 Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution. [sent-7, score-0.393]

5 1 Introduction. The Indian buffet process [IBP, 11] is one of several distributions over matrices with exchangeable rows and infinitely many columns, only a finite (but random) number of which contain any non-zero entries. [sent-8, score-0.705]

6 Such distributions have proved useful for constructing flexible latent feature models that do not require us to specify the number of latent features a priori. [sent-9, score-0.403]

7 In such models, each column of the random matrix corresponds to a latent feature, and each row to a data point. [sent-10, score-0.248]

8 The non-zero elements of a row select the subset of features that contribute to the corresponding data point. [sent-11, score-0.25]

9 Specifically, such priors impose distributions on the number of data points that exhibit a given feature, and on the number of features exhibited by a given data point. [sent-13, score-0.29]

10 For example, in the IBP, the number of features exhibited by a data point is marginally Poisson-distributed, and the probability of a data point exhibiting a previously-observed feature is proportional to the number of times that feature has been seen so far. [sent-14, score-0.378]
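As a concrete illustration of this generative scheme, here is a minimal sketch (my own, not code from the paper) of sequential IBP sampling; the function name, the fixed concentration value, and the seed are illustrative assumptions.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=None):
    """Sequential ('Indian buffet') sampling of a binary feature matrix.

    Row i takes an existing column k with probability m_k / i (rich-get-richer),
    then opens Poisson(alpha / i) new columns; row sums are marginally Poisson(alpha).
    """
    rng = np.random.default_rng(seed)
    counts = []                      # m_k for each column seen so far
    rows = []
    for i in range(1, n_rows + 1):
        row = [int(rng.random() < m / i) for m in counts]   # previously seen columns
        n_new = rng.poisson(alpha / i)                       # brand-new columns
        row += [1] * n_new
        counts = [m + z for m, z in zip(counts, row)] + [1] * n_new
        rows.append(row)
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

print(sample_ibp(10, alpha=2.0, seed=0).sum(axis=1))   # feature counts per row
```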

11 In certain cases we may instead wish to add constraints on the number of latent features exhibited per data point, particularly in cases where we expect, or desire, the latent features to correspond to interpretable features, or causes, of the data [20]. [sent-18, score-0.668]

12 For example, we might believe that each data point exhibits exactly S features – corresponding perhaps to speakers in a dialog, members of a team, or alleles in a genotype – but be agnostic about the total number of features in our data set. [sent-19, score-0.355]

13 A model that explicitly encodes this prior expectation about the number of features per data point will tend to lead to more interpretable and parsimonious results. [sent-20, score-0.291]

14 In this paper, we propose a method for modifying the distribution over the number of non-zero elements per row in arbitrary exchangeable matrices, allowing us to control the number of features per data point in a corresponding latent feature model. [sent-25, score-0.977]

15 We show that our construction yields exchangeable distributions, and present Monte Carlo methods for posterior inference. [sent-26, score-0.493]

16 Our experimental evaluation shows that this approach allows us to incorporate prior beliefs about the number of features per data point into our model, yielding superior modeling performance. [sent-27, score-0.212]

17 A finite sequence of random variables (X1 , . . . , XN ) is exchangeable [see, for example, 1] if its distribution is unchanged under any permutation σ of {1, . . . , N }. [sent-31, score-0.467]

18 An infinite sequence X1 , X2 , . . . is infinitely exchangeable if all of its finite subsequences are exchangeable. [sent-38, score-0.428]

19 In addition, exchangeable models often yield efficient Gibbs samplers. [sent-41, score-0.428]

20 De Finetti’s law tells us that a sequence is exchangeable iff the observations are i.i.d. conditioned on some latent parameter. [sent-42, score-0.502]

21 This means that we can write the probability of any exchangeable sequence as [sent-46, score-0.462]

22 $P(X_1 = x_1 , X_2 = x_2 , \dots) = \int_\Theta \prod_i \mu_\theta(X_i = x_i)\, \nu(d\theta)$ (1) for some probability distribution ν over parameter space Θ, and some parametrized family $\{\mu_\theta\}_{\theta \in \Theta}$ of conditional probability distributions. [sent-49, score-0.189]

23 We use $p(x_1 , x_2 , \dots)$ to represent the joint distribution over an exchangeable sequence $x_1 , x_2 , \dots$; [sent-56, score-0.501]

24 $p(x_{n+1} \mid x_1 , \dots , x_n)$ to represent the associated predictive distribution; [sent-62, score-0.196]

25 and $p(x_1 , \dots , x_n , \theta) := \prod_{i=1}^{n} \mu_\theta(X_i = x_i)\, \nu(\theta)$ to represent the joint distribution over the observations and the parameter θ. [sent-65, score-0.194]

26 2.1 Distributions over exchangeable matrices. The Indian buffet process [IBP, 11] is a distribution over binary matrices with exchangeable rows and infinitely many columns. [sent-67, score-1.161]

27 In the de Finetti representation, the mixing distribution ν is a beta process, the parameter θ is a countably infinite measure with atom sizes πk ∈ (0, 1], and the conditional distribution µθ is a Bernoulli process [17]. [sent-68, score-0.381]

28 The beta process and the Bernoulli process are both completely random measures [CRM, 12] – distributions over random measures on some space Ω that assign independent masses to disjoint subsets of Ω, and that can be written in the form $\Gamma = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$. [sent-69, score-0.264]

29 We can think of each atom of θ as determining the latent probability for a column of a matrix with infinitely many columns, and the Bernoulli process as sampling binary values for the entries of that column of the matrix. [sent-70, score-0.296]

30 The resulting matrix has a finite number of non-zero entries, with the number of non-zero entries in each row distributed as Poisson(α) and the total number of non-zero columns in N rows distributed as Poisson(αHN ), where HN is the N th harmonic number. [sent-71, score-0.229]

31 The number of rows with a non-zero entry for a given column exhibits a “rich gets richer” property – a new row has a one in a given column with probability proportional to the number of times a one has appeared in that column in the preceding rows. [sent-72, score-0.208]

32 A three-parameter extension to the IBP [15] replaces the beta process with a completely random measure called the stable-beta process, which includes the beta process as a special case. [sent-74, score-0.337]

33 The resulting random matrix exhibits power law behavior: the total number of features exhibited in a data set of size N grows as O(N s ) for some s > 0, and the number of data points exhibiting each feature also follows a power law. [sent-75, score-0.367]

34 The number of features per data point, however, remains Poisson-distributed. [sent-76, score-0.196]

35 The infinite gamma-Poisson process [iGaP, 18] replaces the beta process with a gamma process, and the Bernoulli process with a Poisson process, to give a distribution over non-negative integer-valued matrices with infinitely many columns and exchangeable rows. [sent-77, score-0.798]

36 In this model, the sum of each row is distributed according to a negative binomial distribution, and the number of non-zero entries in each row is Poisson-distributed. [sent-78, score-0.297]

37 The beta-negative binomial process [21] replaces the Bernoulli process with a negative binomial process to get an alternative distribution over non-negative integer-valued matrices. [sent-79, score-0.33]

38 To elaborate on this, note that, marginally, the distribution over the value of each element zik of a row zi of the IBP (or a three-parameter IBP) is given by a Bernoulli distribution. [sent-83, score-0.55]

39 Therefore, by the law of rare events, the sum $\sum_k z_{ik}$ is distributed according to a Poisson distribution. [sent-84, score-0.371]

40 Marginally, the distribution over whether an element zik is greater than zero is given by a Bernoulli distribution, hence the number of non-zero elements, $\sum_k z_{ik} \wedge 1$, is Poisson-distributed. [sent-86, score-0.701]

41 The distribution over the row sum, $\sum_k z_{ik}$, will depend on the choice of CRMs. [sent-87, score-0.466]

42 It follows that, if we wish to circumvent the requirement of a Poisson (or mixture of Poisson) number of features per data point in an IBP-like model, we must remove the completely random assumption on either the de Finetti mixing distribution or the family of conditional distributions. [sent-88, score-0.439]

43 The remainder of this section discusses how we can obtain arbitrary marginal distributions over the number of features per row by using conditional distributions that are not completely random. [sent-89, score-0.452]

44 3.1 Restricting the family of conditional distributions in the de Finetti representation. Recall from Section 2 that any exchangeable sequence can be represented as a mixture over some family of conditional distributions. [sent-91, score-0.715]

45 The support of this family determines the support of the exchangeable sequence. [sent-92, score-0.513]

46 We are familiar with the idea of restricting the support of a distribution to a measurable subset. [sent-95, score-0.187]

47 We can restrict the support of an exchangeable distribution by restricting the family of conditional distributions {µθ }θ∈Θ introduced in Equation 1, to obtain an exchangeable distribution on the restricted space. [sent-99, score-1.305]

48 Consider an unrestricted exchangeable model with de Finetti representation $p(x_1 , \dots , x_N) = \int_\Theta \prod_{i=1}^{N} \mu_\theta(X_i = x_i)\, \nu(d\theta)$. [sent-101, score-0.513]

49 Let $p^{|A}$ denote the distribution obtained by restricting the family of conditional distributions $\{\mu_\theta\}$ to $\{\mu_\theta^{|A}\}$ as described above. [sent-108, score-0.242]

50 Then the sequence with predictive distribution $p^{|A}(x_{N+1} \mid x_1 , \dots , x_N) \propto I(x_{N+1} \in A) \int_\Theta \frac{\prod_{i=1}^{N+1} \mu_\theta(X_i = x_i)}{\prod_{i=1}^{N+1} \mu_\theta(X_i \in A)}\, \nu(d\theta)$ (2) is exchangeable by construction, according to de Finetti’s law. [sent-115, score-0.462]

51 We give three examples of exchangeable matrices where the number of non-zero entries per row is restricted to follow a given distribution. [sent-116, score-0.747]

52 While our focus is on exchangeability of the rows, we note that the following distributions (like their unrestricted counterparts) are invariant under reordering of the columns, and that the resulting matrices are separately exchangeable [2]. [sent-117, score-0.69]

53 The probability of Z given in Equation 3 is the infinite limit of the conditional Bernoulli distribution [6], which describes the distribution of the locations of the successes in such a trial, conditioned on their sum. [sent-125, score-0.174]

54 The probability of a restricted matrix $Z = \{z_1 , \dots , z_N\}$ under such a distribution is given by $\mu_G^{|S}(Z) = \prod_{i=1}^{N} \mu_G(Z_i = z_i)\, I\big(\sum_{k=1}^{\infty} z_{ik} \wedge 1 = S\big) \,\big/\, \big(\mu_G(\sum_{k=1}^{\infty} Z_{ik} \wedge 1 = S)\big)^{N}$, (4) [sent-134, score-0.785]

55 where each factor $\mu_G(Z_i = z_i)$ is a product of Poisson terms $e^{-\lambda_k} \lambda_k^{z_{ik}} / z_{ik}!$ and the denominator is the Poisson-binomial term $\mathrm{PoiBin}(S \mid \{e^{-\lambda_k}\}_{k=1}^{\infty})^{N}$ .

56 Rather than specify the number of non-zero entries in each row a priori, we can allow it to be random, with some arbitrary distribution f (·) over the non-negative integers. [sent-137, score-0.203]

57 A Bernoulli process restricted to have f -marginals can be described as $\mu_B^{|f}(Z) = \prod_{i=1}^{N} \mu_B^{|S_i}(Z_i = z_i)\, f(S_i) = \prod_{i=1}^{N} \frac{f(S_i)\, I\big(\sum_{k=1}^{\infty} z_{ik} = S_i\big)}{\mathrm{PoiBin}(S_i \mid \{\pi_k\}_{k=1}^{\infty})} \prod_{k=1}^{\infty} \pi_k^{m_k} (1 - \pi_k)^{N - m_k}$, (5) where $S_n = \sum_{k=1}^{\infty} z_{nk}$. [sent-138, score-0.538]
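To make this construction concrete, the following sketch (my own, using a finite truncation of the beta process atoms and a simple rejection step rather than an exact conditional-Bernoulli sampler) draws one row whose number of non-zero entries is forced to follow f; the `f_sample` callable and the Beta-distributed atom weights are illustrative assumptions.

```python
import numpy as np

def sample_restricted_row(pi, f_sample, seed=None, max_tries=100_000):
    """Draw one row of a truncated Bernoulli process, restricted so that the
    number of non-zero entries equals a draw S_i ~ f.  Rejection sketch only;
    an exact conditional-Bernoulli sampler would be more efficient."""
    rng = np.random.default_rng(seed)
    s_i = f_sample(rng)
    for _ in range(max_tries):
        z = (rng.random(len(pi)) < pi).astype(int)   # unrestricted Bernoulli draw
        if z.sum() == s_i:                           # keep only rows with the right count
            return z
    raise RuntimeError("rejection failed; consider an exact conditional sampler")

pi = np.random.default_rng(1).beta(0.5, 2.0, size=50)     # truncated atom weights
row = sample_restricted_row(pi, lambda rng: 3, seed=2)     # exactly 3 features per row
print(row.sum())
```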

58 The IBP has Poisson(α) marginals over the number of non-zero elements per row, but the conditional distribution is described by a Poisson-binomial distribution. [sent-146, score-0.174]

59 [Figure legend: IBP; 1 per row; 5 per row; 10 per row; Uniform{1, . . . }] [sent-149, score-0.471]

60 3.2 Direct restriction of the predictive distributions. The construction in Section 3.1 requires us to explicitly represent the de Finetti mixing measure B. [sent-154, score-0.226]

61 Since it might be cumbersome to explicitly represent the infinite-dimensional object B, it is tempting to consider constructions that directly restrict the predictive distribution $p(X_{N+1} \mid X_1 , \dots , X_N)$. [sent-156, score-0.187]

62 Unfortunately, the distribution over matrices obtained by this approach does not (in general – see the appendix for a counter-example) correspond to the distribution over matrices obtained by restricting the family of conditional distributions. [sent-160, score-0.35]

63 This means it is not appropriate for data sets where we have no explicit ordering of the data, and also means we cannot directly use the predictive distribution to define a Gibbs sampler (as is possible in exchangeable models). [sent-162, score-0.629]

64 Theorem 2 (Sequences obtained by directly restricting the predictive distribution of an exchangeable sequence are not, in general, exchangeable). [sent-163, score-0.718]

65 Let p be the distribution of the unrestricted exchangeable model introduced in the proof of Theorem 1. [sent-164, score-0.552]

66 Let p∗|A be the distribution obtained by directly restricting this unrestricted exchangeable model such that Xi ∈ A, i.e. such that $p^{*|A}(x_{n+1} \mid x_1 , \dots , x_n) \propto p(x_{n+1} \mid x_1 , \dots , x_n)\, I(x_{n+1} \in A)$. [sent-165, score-0.67]

67 We can represent the predictive distribution of such a model using three indexed urns, each containing one red ball (representing a one in the resulting matrix) and one blue ball (representing a zero in the resulting matrix). [sent-179, score-0.268]

68 We generate a sequence of ball sequences by repeatedly picking a ball from each urn, noting the ordered sequence of colors, and returning the balls to their urns, plus one ball of each sampled color. [sent-180, score-0.212]

69 The directly restricted three-urn scheme (and, by extension, the directly restricted IBP) is not exchangeable. [sent-186, score-0.192]

70 Consider the same scheme, but where the outcome is restricted such that there is one, and only one, red ball per sample. [sent-188, score-0.185]

71 By extension, a model obtained by directly restricting the predictive distribution of the IBP is not exchangeable. [sent-191, score-0.256]
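The order-dependence can be checked by exact enumeration. The sketch below is my own construction, mirroring the three-urn description above: it computes the probability of two permutations of the same multiset of outcomes under the directly restricted scheme, and the unequal values illustrate the claimed non-exchangeability.

```python
from itertools import product
from fractions import Fraction

ONE_RED = [c for c in product([1, 0], repeat=3) if sum(c) == 1]   # 1 = red ball

def step_prob(counts, outcome):
    """Directly restricted predictive: draw one ball from each urn independently,
    then renormalise over outcomes containing exactly one red ball."""
    def raw(o):
        p = Fraction(1)
        for (red, total), c in zip(counts, o):
            p *= Fraction(red, total) if c else Fraction(total - red, total)
        return p
    return raw(outcome) / sum(raw(o) for o in ONE_RED)

def seq_prob(sequence):
    counts = [(1, 2)] * 3          # each urn starts with one red and one blue ball
    p = Fraction(1)
    for outcome in sequence:
        p *= step_prob(counts, outcome)
        # return the balls plus one extra ball of each sampled colour
        counts = [(r + c, t + 1) for (r, t), c in zip(counts, outcome)]
    return p

a, b = (1, 0, 0), (0, 1, 0)
print(seq_prob([a, a, b]), seq_prob([a, b, a]))   # 2/99 vs 1/42: not exchangeable
```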

72 It is not, however, an appropriate model if we believe our data to be exchangeable, or if we are interested in finding a single, stationary latent distribution describing our data. [sent-193, score-0.194]

73 This exchangeable setting is the focus of this paper, so we defer exploration of the distributions over non-exchangeable matrices obtained by restricting the predictive distribution to future work. [sent-194, score-0.692]

74 4 Inference We focus on models obtained by restricting the IBP to have f -marginals over the number of nonzero elements per row, as described in Example 3. [sent-195, score-0.178]

75 Where f = δS , this approach will fail, since any move that changes $z_{ik}$ must change $\sum_k z_{ik}$. [sent-206, score-0.662]

76 Instead, we resample the location of each non-zero entry $j = 1 , \dots , S$ of $z_i$ in turn: $p(z_i^{(j)} = k \mid x_i , \pi , z_i^{(\neg j)}) \propto \pi_k (1 - \pi_k)^{-1}\, g(x_i \mid z_i^{(j)} = k , z_i^{(\neg j)})$. [sent-210, score-0.252]
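A sketch of this relocation move (my own, with a generic user-supplied log-likelihood `log_g` standing in for g) is given below; it moves one of the S ones in z_i to a new column while keeping the row sum fixed.

```python
import numpy as np

def relocate_one_entry(z_i, x_i, pi, log_g, seed=None):
    """Gibbs-style move for fixed row sum S: pick one non-zero entry of z_i and
    resample its column with probability proportional to
    pi_k / (1 - pi_k) * g(x_i | z_i with the entry placed at column k)."""
    rng = np.random.default_rng(seed)
    z_i = z_i.copy()
    j = rng.choice(np.flatnonzero(z_i))      # entry whose location is resampled
    z_i[j] = 0
    candidates = np.flatnonzero(z_i == 0)    # includes the old location j
    log_w = np.empty(len(candidates))
    for idx, k in enumerate(candidates):
        z_i[k] = 1
        log_w[idx] = np.log(pi[k]) - np.log(1 - pi[k]) + log_g(x_i, z_i)
        z_i[k] = 0
    w = np.exp(log_w - log_w.max())          # normalise in log space for stability
    z_i[candidates[rng.choice(len(candidates), p=w / w.sum())]] = 1
    return z_i
```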

77 4.2 Sampling the beta process atoms π. Conditioned on Z, the distribution of π is $\nu(\{\pi_k\}_{k=1}^{\infty} \mid Z) \propto \mu_{\{\pi_k\}}^{|f}(Z)\, \nu(\{\pi_k\}_{k=1}^{\infty}) \propto \prod_{k=1}^{K} \pi_k^{(m_k + \alpha/K - 1)} (1 - \pi_k)^{N - m_k} \big/ \prod_{i=1}^{N} \mathrm{PoiBin}(S_i \mid \pi)$. (11) [sent-214, score-0.204]

78 The Poisson-binomial term can be calculated exactly in $O(K \sum_k z_{ik})$ using either a recursive algorithm [3, 5] or an algorithm based on the characteristic function that uses the Discrete Fourier Transform [8]. [sent-215, score-0.331]
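For reference, here is a short sketch of the recursive (dynamic-programming) computation of the Poisson-binomial pmf; it is my own illustration rather than the authors' code, and it builds the full pmf, which can be truncated at the largest count needed.

```python
import numpy as np

def poisson_binomial_pmf(p):
    """pmf of the number of successes among independent Bernoulli(p_k) trials,
    built up one trial at a time."""
    pmf = np.array([1.0])                    # point mass at 0 successes
    for pk in p:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1.0 - pk)         # trial k fails
        new[1:] += pmf * pk                  # trial k succeeds
        pmf = new
    return pmf

print(poisson_binomial_pmf([0.9, 0.2, 0.5, 0.05]))   # P(0), ..., P(4)
```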

79 Since we believe our posterior will be close to the posterior for the unrestricted model, we use the proposal distribution q(πk |Z) = Beta(α/K + mk , N + 1 − mk ) to propose new values of πk . [sent-218, score-0.523]
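A sketch of the corresponding Metropolis-Hastings step is shown below (my own; `log_poibin_term(pi)` is assumed to return the sum over i of log PoiBin(S_i | π), e.g. via the recursion above). Because the Beta proposal matches the unrestricted conditional, the beta and Bernoulli factors cancel and the acceptance ratio reduces to a ratio of Poisson-binomial normalisers.

```python
import numpy as np

def mh_update_atom(k, pi, Z, alpha, log_poibin_term, seed=None):
    """One MH update for pi[k] in the finite-K approximation, proposing from
    Beta(alpha/K + m_k, N + 1 - m_k); only the Poisson-binomial normalisers
    survive in the acceptance ratio."""
    rng = np.random.default_rng(seed)
    N, K = Z.shape
    m_k = Z[:, k].sum()
    pi_new = pi.copy()
    pi_new[k] = rng.beta(alpha / K + m_k, N + 1 - m_k)
    log_accept = log_poibin_term(pi) - log_poibin_term(pi_new)
    if np.log(rng.random()) < log_accept:
        return pi_new
    return pi
```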

80 4.3 Evaluating the predictive distribution. In certain cases, we may wish to directly evaluate the predictive distribution $p^{|f}(z_{N+1} \mid z_1 , \dots , z_N)$. [sent-220, score-0.327]

81 We sample T measures $\pi^{(t)} \sim \nu(\pi \mid Z)$, where $\nu(\pi \mid Z)$ is the posterior distribution over π in the finite approximation to the IBP, and then weight them to obtain the restricted predictive distribution $p^{|f}(z_{N+1} \mid z_1 , \dots , z_N)$. [sent-226, score-0.289]
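Below is a sketch of this importance-weighting step (my own reading of the construction; the helper `poibin_pmf`, the finite truncation K, and the argument names are assumptions, and a fully log-space implementation would be more numerically stable). Draws from the unrestricted finite-IBP posterior are reweighted by the inverse product of Poisson-binomial terms, and the restricted row likelihood is averaged under those weights.

```python
import numpy as np

def poibin_pmf(p):
    pmf = np.array([1.0])
    for pk in p:
        pmf = np.convolve(pmf, [1.0 - pk, pk])   # add one Bernoulli(pk) trial
    return pmf

def restricted_predictive(Z, alpha, f_logpmf, z_new, n_samples=1000, seed=None):
    """Importance-sampling estimate of p^{|f}(z_new | Z) in a finite-K truncation."""
    rng = np.random.default_rng(seed)
    N, K = Z.shape
    m, S, S_new = Z.sum(axis=0), Z.sum(axis=1), int(z_new.sum())
    num = den = 0.0
    for _ in range(n_samples):
        pi = rng.beta(alpha / K + m, N + 1 - m)     # unrestricted posterior draw
        pmf = poibin_pmf(pi)
        w = np.exp(-np.sum(np.log(pmf[S])))         # weight: 1 / prod_i PoiBin(S_i | pi)
        loglik = (np.sum(z_new * np.log(pi) + (1 - z_new) * np.log(1 - pi))
                  + f_logpmf(S_new) - np.log(pmf[S_new]))
        num += w * np.exp(loglik)
        den += w
    return num / den
```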

82 Table 1: Structure error on synthetic data with 100 data points and S features per data point. [sent-248, score-0.262]

83 where $Z = (z_1 , \dots , z_N)$, and $\mu^{|f}(Z) \propto \prod_{i=1}^{N} \frac{f(S_i)\, I\big(\sum_{k=1}^{K} z_{ik} = S_i\big)}{\mathrm{PoiBin}(S_i \mid \pi)} \prod_{k=1}^{K} \pi_k^{m_k} (1 - \pi_k)^{N - m_k}$. [sent-255, score-0.331]

84 5 Experimental evaluation. In this paper, we have described how distributions over exchangeable matrices, such as the IBP, can be modified to allow more flexible control over the distributions over the number of latent features. [sent-256, score-0.625]

85 The experiments on real data are designed to show that careful choice of the distribution over the number of latent features in our models can lead to improved predictive performance. [sent-259, score-0.366]

86 5.1 Synthetic data. The IBP has been used to discover latent features that correspond to interpretable phenomena, such as latent causes behind patient symptoms [20]. [sent-261, score-0.377]

87 If we have prior knowledge about the number of latent features per data point – for example, the number of players in a team, or the number of speakers in a conversation – we may expect both better predictive performance, and more interpretable latent features. [sent-262, score-0.581]

88 In this experiment, we evaluate this hypothesis on synthetic data, where the true latent features are known. [sent-263, score-0.228]

89 We modeled the resulting data using an uncollapsed linear Gaussian model, as described in [7], using both the IBP, and the IBP restricted to have S features per row. [sent-266, score-0.289]

90 Table 1 shows the structure error obtained using both a standard IBP model (IBP) and an IBP restricted to have the correct number of latent features (rIBP), for varying numbers of features S. [sent-269, score-0.395]

91 Where the number of features per data point is small relative to the total number of features, the restricted model does a much better job at recovering the “correct” latent structure. [sent-273, score-0.381]

92 While the IBP may be able to explain the training data set as well as the restricted model, it will not in general recover the desired latent structure – which is important if we wish to interpret the latent structure. [sent-274, score-0.315]

93 Once the number of features per data point increases beyond half the total number of features, the model is ill-specified – it is more parsimonious to represent features via the absence of a bar. [sent-275, score-0.348]

94 The restricted model – and indeed the IBP – should only be expected to recover easily interpretable features where the number of such features per data point is small relative to the total number of features. [sent-277, score-0.457]

95 In such settings, the IBP is used to directly model the presence or absence of words, and so the matrix Z is observed rather than latent, and the total number of features is given by the vocabulary size. [sent-321, score-0.174]

96 Due to the very large state space, we restricted our samples such that, in a single sample, atoms with the same posterior distribution were assigned the same value. [sent-331, score-0.191]

97 For both models, we estimated the predictive distribution by generating 1000 samples from the posterior of the beta process in the IBP model. [sent-335, score-0.333]

98 For the IBP, we used these samples directly to estimate the predictive distribution; for the restricted model, we used the importance-weighted samples obtained using Equation 12. [sent-336, score-0.229]

99 6 Discussion and future work The framework explored in this paper allows us to relax the distributional assumptions made by existing exchangeable nonparametric processes. [sent-342, score-0.477]

100 Infinite latent feature models and the Indian buffet process. [sent-420, score-0.229]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ibp', 0.618), ('exchangeable', 0.428), ('zik', 0.331), ('mk', 0.154), ('finetti', 0.12), ('poisson', 0.114), ('features', 0.113), ('poibin', 0.111), ('buffet', 0.107), ('predictive', 0.099), ('restricting', 0.098), ('xn', 0.097), ('row', 0.096), ('beta', 0.095), ('latent', 0.093), ('ribp', 0.093), ('bernoulli', 0.09), ('indian', 0.088), ('unrestricted', 0.085), ('zi', 0.084), ('restricted', 0.076), ('igap', 0.074), ('zn', 0.073), ('exchangeability', 0.067), ('exhibited', 0.064), ('per', 0.061), ('binomial', 0.06), ('xi', 0.058), ('interpretable', 0.056), ('conditional', 0.055), ('atom', 0.054), ('distributions', 0.052), ('ball', 0.048), ('process', 0.047), ('marginally', 0.046), ('restriction', 0.046), ('nitely', 0.046), ('entries', 0.045), ('si', 0.044), ('documents', 0.043), ('matrices', 0.041), ('law', 0.04), ('zk', 0.039), ('distribution', 0.039), ('family', 0.037), ('urns', 0.037), ('posterior', 0.036), ('sequence', 0.034), ('ik', 0.032), ('nonparametric', 0.031), ('wish', 0.031), ('ths', 0.03), ('replaces', 0.03), ('rows', 0.03), ('corpora', 0.029), ('truncated', 0.029), ('nite', 0.029), ('construction', 0.029), ('restrict', 0.029), ('feature', 0.029), ('speakers', 0.028), ('countably', 0.027), ('grif', 0.027), ('measurable', 0.026), ('contiguous', 0.025), ('intend', 0.025), ('mixing', 0.025), ('text', 0.024), ('vocabulary', 0.024), ('columns', 0.024), ('support', 0.024), ('atoms', 0.023), ('completely', 0.023), ('parsimonious', 0.023), ('zij', 0.023), ('specify', 0.023), ('gibbs', 0.022), ('data', 0.022), ('synthetic', 0.022), ('exhibits', 0.022), ('exhibiting', 0.021), ('team', 0.021), ('conditioned', 0.021), ('hn', 0.021), ('appropriate', 0.021), ('directly', 0.02), ('column', 0.02), ('successes', 0.02), ('wt', 0.019), ('believe', 0.019), ('elements', 0.019), ('truncation', 0.019), ('distributional', 0.018), ('slice', 0.018), ('matrix', 0.017), ('mixture', 0.017), ('exhibit', 0.017), ('samples', 0.017), ('resulting', 0.017), ('point', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 277 nips-2013-Restricting exchangeable nonparametric distributions

Author: Sinead A. Williamson, Steve N. MacEachern, Eric Xing

Abstract: Distributions over matrices with exchangeable rows and infinitely many columns are useful in constructing nonparametric latent variable models. However, the distribution implied by such models over the number of features exhibited by each data point may be poorly-suited for many modeling tasks. In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution. 1

2 0.14556846 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unkown-abstract

3 0.14092654 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

Author: Kohei Hayashi, Ryohei Fujimaki

Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1

4 0.11314946 252 nips-2013-Predictive PAC Learning and Process Decompositions

Author: Cosma Shalizi, Aryeh Kontorovitch

Abstract: We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path. This definition not only matches what a realistic learner might demand, but also allows us to sidestep several otherwise grave problems in learning from dependent data. In particular, we give a novel PAC generalization bound for mixtures of learnable processes with a generalization error that is not worse than that of each mixture component. We also provide a characterization of mixtures of absolutely regular (β-mixing) processes, of independent probability-theoretic interest. 1

5 0.083835088 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Author: Dahua Lin

Abstract: Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm – random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency – orders of magnitude speed-up compared to the state-of-the-art. 1

6 0.07342124 217 nips-2013-On Poisson Graphical Models

7 0.070491195 75 nips-2013-Convex Two-Layer Modeling

8 0.065260969 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

9 0.062578417 1 nips-2013-(More) Efficient Reinforcement Learning via Posterior Sampling

10 0.062532172 88 nips-2013-Designed Measurements for Vector Count Data

11 0.061966557 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

12 0.061211135 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

13 0.055979297 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices

14 0.052080162 158 nips-2013-Learning Multiple Models via Regularized Weighting

15 0.051194191 294 nips-2013-Similarity Component Analysis

16 0.050843894 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

17 0.0506991 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

18 0.049875267 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models

19 0.048951212 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

20 0.048189498 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.145), (1, 0.054), (2, -0.017), (3, 0.004), (4, 0.008), (5, 0.05), (6, 0.106), (7, 0.014), (8, 0.057), (9, -0.012), (10, 0.001), (11, 0.048), (12, -0.011), (13, -0.003), (14, -0.019), (15, 0.006), (16, 0.042), (17, -0.013), (18, 0.007), (19, -0.029), (20, -0.015), (21, 0.037), (22, -0.025), (23, -0.046), (24, 0.013), (25, 0.01), (26, -0.025), (27, -0.045), (28, -0.059), (29, -0.046), (30, -0.04), (31, -0.035), (32, 0.063), (33, -0.035), (34, -0.088), (35, 0.065), (36, -0.101), (37, -0.014), (38, 0.039), (39, 0.157), (40, -0.04), (41, -0.049), (42, 0.055), (43, -0.068), (44, 0.028), (45, -0.075), (46, 0.112), (47, -0.156), (48, 0.006), (49, -0.171)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91465306 277 nips-2013-Restricting exchangeable nonparametric distributions

Author: Sinead A. Williamson, Steve N. MacEachern, Eric Xing

Abstract: Distributions over matrices with exchangeable rows and infinitely many columns are useful in constructing nonparametric latent variable models. However, the distribution implied by such models over the number of features exhibited by each data point may be poorly-suited for many modeling tasks. In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution. 1

2 0.61330193 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

Author: Kohei Hayashi, Ryohei Fujimaki

Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1

3 0.60039127 252 nips-2013-Predictive PAC Learning and Process Decompositions

Author: Cosma Shalizi, Aryeh Kontorovitch

Abstract: We informally call a stochastic process learnable if it admits a generalization error approaching zero in probability for any concept class with finite VC-dimension (IID processes are the simplest example). A mixture of learnable processes need not be learnable itself, and certainly its generalization error need not decay at the same rate. In this paper, we argue that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path. This definition not only matches what a realistic learner might demand, but also allows us to sidestep several otherwise grave problems in learning from dependent data. In particular, we give a novel PAC generalization bound for mixtures of learnable processes with a generalization error that is not worse than that of each mixture component. We also provide a characterization of mixtures of absolutely regular (β-mixing) processes, of independent probability-theoretic interest. 1

4 0.56326044 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

Author: Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael Jordan

Abstract: Research on distributed machine learning algorithms has focused primarily on one of two extremes—algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this “optimistic concurrency control” paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment. 1

5 0.5404377 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unkown-abstract

6 0.52576613 294 nips-2013-Similarity Component Analysis

7 0.52182156 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components

8 0.51990306 217 nips-2013-On Poisson Graphical Models

9 0.51126194 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

10 0.47263956 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

11 0.47213778 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

12 0.47089198 320 nips-2013-Summary Statistics for Partitionings and Feature Allocations

13 0.46144375 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

14 0.4561781 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator

15 0.45389742 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

16 0.44340581 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

17 0.44232374 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits

18 0.43894169 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

19 0.424458 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

20 0.41298732 67 nips-2013-Conditional Random Fields via Univariate Exponential Families


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.024), (3, 0.25), (16, 0.061), (33, 0.166), (34, 0.132), (41, 0.018), (49, 0.033), (56, 0.07), (70, 0.02), (85, 0.054), (89, 0.023), (93, 0.047), (95, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84016412 320 nips-2013-Summary Statistics for Partitionings and Feature Allocations

Author: Isik B. Fidaner, Taylan Cemgil

Abstract: Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.

2 0.83251917 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles

Author: Jinwoo Shin, Andrew E. Gelfand, Misha Chertkov

Abstract: Max-product ‘belief propagation’ (BP) is a popular distributed heuristic for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM). It was recently shown that BP converges to the correct MAP assignment for a class of loopy GMs with the following common feature: the Linear Programming (LP) relaxation to the MAP problem is tight (has no integrality gap). Unfortunately, tightness of the LP relaxation does not, in general, guarantee convergence and correctness of the BP algorithm. The failure of BP in such cases motivates reverse engineering a solution – namely, given a tight LP, can we design a ‘good’ BP algorithm. In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. We prove that the algorithm converges to the correct optimum if the respective LP relaxation, which may include inequalities associated with non-intersecting odd-sized cycles, is tight. The most significant part of our approach is the introduction of a novel graph transformation designed to force convergence of BP. Our theoretical result suggests an efficient BP-based heuristic for the MWM problem, which consists of making sequential, “cutting plane”, modifications to the underlying GM. Our experiments show that this heuristic performs as well as traditional cutting-plane algorithms using LP solvers on MWM problems. 1

same-paper 3 0.80982006 277 nips-2013-Restricting exchangeable nonparametric distributions

Author: Sinead A. Williamson, Steve N. MacEachern, Eric Xing

Abstract: Distributions over matrices with exchangeable rows and infinitely many columns are useful in constructing nonparametric latent variable models. However, the distribution implied by such models over the number of features exhibited by each data point may be poorly-suited for many modeling tasks. In this paper, we propose a class of exchangeable nonparametric priors obtained by restricting the domain of existing models. Such models allow us to specify the distribution over the number of features per data point, and can achieve better performance on data sets where the number of features is not well-modeled by the original distribution. 1

4 0.72817636 1 nips-2013-(More) Efficient Reinforcement Learning via Posterior Sampling

Author: Ian Osband, Dan Russo, Benjamin Van Roy

Abstract: Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration: posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision processes and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an √ agent to encode prior knowledge ˜ in a natural way. We establish an O(τ S AT ) bound on expected regret, where T is time, τ is the episode length and S and A are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds. 1

5 0.68552291 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

Author: Kohei Hayashi, Ryohei Fujimaki

Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1

6 0.68151993 201 nips-2013-Multi-Task Bayesian Optimization

7 0.68054962 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

8 0.68039709 173 nips-2013-Least Informative Dimensions

9 0.6786809 350 nips-2013-Wavelets on Graphs via Deep Learning

10 0.67789078 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes

11 0.67637551 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

12 0.67409486 294 nips-2013-Similarity Component Analysis

13 0.67350215 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

14 0.67348498 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions

15 0.673356 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

16 0.67321008 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

17 0.67126566 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

18 0.67125642 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

19 0.67012227 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

20 0.66986775 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits