jmlr jmlr2011 jmlr2011-26 knowledge-graph by maker-knowledge-mining

26 jmlr-2011-Distance Dependent Chinese Restaurant Processes


Source: pdf

Author: David M. Blei, Peter I. Frazier

Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. [sent-8, score-0.567]

2 Each customer sits at a previously occupied table with probability proportional to the number of customers already sitting there, and at a new table with probability proportional to a concentration parameter. [sent-25, score-0.933]

3 In this paper, we develop the distance dependent Chinese restaurant process, a new CRP in which the random seating assignment of the customers depends on the distances between them. [sent-42, score-0.793]

4 The key to the distance dependent CRP is that it represents the partition with customer assignments, rather than table assignments. [sent-48, score-0.793]

5 While the traditional CRP connects customers to tables, the distance dependent CRP connects customers to other customers. [sent-49, score-0.935]

6 The partition of the data, that is, the table assignment representation, arises from these customer connections. [sent-50, score-0.55]

7 When used in a Bayesian model, the customer assignment representation allows for a straightforward Gibbs sampling algorithm for approximate posterior inference (see Section 3). [sent-51, score-0.565]

8 Like the distance dependent CRP, this distribution may be constructed through a collection of independent priors on customer assignments to other customers, which then implies a prior on partitions. [sent-57, score-0.834]

9 The prior presented in Dahl (2008) requires normalization of these customer assignment probabilities. [sent-68, score-0.482]

10 We note that Dahl (2008) does not present an algorithm for sampling from the posterior, but the Gibbs sampler presented here for the distance dependent CRP can also be employed for posterior inference in that model. [sent-70, score-0.494]

11 In general, dependent DPs exhibit marginal invariance while distance dependent CRPs do not. [sent-81, score-0.547]

12 In Section 3 we show how the customer assignment representation allows for an efficient Gibbs sampling algorithm. [sent-96, score-0.509]

13 The process operates at the level of customer assignments, where each customer chooses either another customer or no customer according to Equation (2). [sent-102, score-1.748]

14 Customers that chose not to connect to another are indicated with a self link. The table assignments, a representation of the partition that is familiar to the CRP, are derived from the customer assignments. [sent-103, score-0.54]

15 It is described by considering a Chinese restaurant with an infinite number of tables and a sequential process by which customers enter the restaurant and each sit down at a randomly chosen table. [sent-106, score-0.58]

16 In the traditional CRP, the probability of a customer sitting at a table is computed from the number of other customers already sitting at that table. [sent-109, score-0.986]

17 Let zi denote the table assignment of the ith customer, assume that the customers z1:(i−1) occupy K tables, and let nk denote the number of customers sitting at table k. [sent-110, score-0.732]
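
To make the traditional CRP concrete, here is a minimal Python sketch (ours, not from the paper) that draws table assignments sequentially; `alpha` is the concentration parameter and all names are our own.

```python
import numpy as np

def sample_traditional_crp(n_customers, alpha, rng=None):
    """Draw table assignments z_1..z_N from a traditional CRP(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.empty(n_customers, dtype=int)
    counts = []                      # counts[k] = number of customers at table k
    for i in range(n_customers):
        # An occupied table k has probability proportional to counts[k];
        # a new table has probability proportional to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)         # open a new table
        else:
            counts[k] += 1
        z[i] = k
    return z
```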

18 In this distribution, the seating plan probability is described in terms of the probability of a customer sitting with each of the other customers. [sent-116, score-0.58]

19 Illustrated are draws for different decay functions, which are inset: (1) The traditional CRP; (2) The window decay function; (3) The exponential decay function; (4) The logistic decay function. [sent-119, score-0.922]

20 The table assignments are illustrated, which are derived from the customer assignments drawn from the distance dependent CRP. [sent-120, score-0.991]

21 The decay functions (inset) are functions of the distance between the current customer and each previous customer. [sent-121, score-0.75]

22 Let ci denote the ith customer assignment, the index of the customer with whom the ith customer is sitting. [sent-124, score-1.364]

23 Let d_ij denote the distance measurement between customers i and j, let D denote the set of all distance measurements between customers, and let f be a decay function (described in more detail below). [sent-125, score-0.807]

24 The distance dependent CRP independently draws the customer assignments conditioned on the distance measurements: p(ci = j | D, α) ∝ f(d_ij) if j ≠ i, and ∝ α if j = i (Equation 2). [sent-126, score-0.981]

25 Notice the customer assignments do not depend on other customer assignments, only the distances between customers. [sent-127, score-1.072]

26 Also notice that j ranges over the entire set of customers, and so any customer may sit with any other. [sent-128, score-0.508]
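
As a hedged illustration of Equation (2), the sketch below (our code, not the authors') draws customer assignments given a distance matrix `D`, a decay function `f`, and concentration `alpha`; for sequential settings one encodes d_ij = ∞ for j > i so that f(d_ij) = 0 for later customers.

```python
import numpy as np

def sample_ddcrp_links(D, f, alpha, rng=None):
    """Draw customer assignments c_i from the distance dependent CRP prior.

    D[i, j] is the distance between customers i and j; f is a decay function
    assumed to return 0 at infinite distance (for sequential settings)."""
    rng = np.random.default_rng() if rng is None else rng
    n = D.shape[0]
    c = np.empty(n, dtype=int)
    for i in range(n):
        weights = np.array([f(D[i, j]) for j in range(n)], dtype=float)
        weights[i] = alpha            # self-link probability is proportional to alpha
        c[i] = rng.choice(n, p=weights / weights.sum())
    return c
```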

27 As we mentioned above, customers are assigned to tables by considering sets of customers that are reachable from each other through the customer assignments. [sent-131, score-1.089]

28 We denote the induced table assignments z(c), and notice that many configurations of customer assignments c might lead to the same table assignment. [sent-133, score-0.711]
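
The induced table assignments z(c) can be computed by grouping customers that are reachable from one another through the links; a minimal union-find sketch (ours, not the authors' code) is:

```python
def tables_from_links(c):
    """Map customer assignments c (c[i] = j means i sits with j) to table
    labels z(c) by grouping customers reachable from one another."""
    n = len(c)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, j in enumerate(c):
        parent[find(i)] = find(j)           # union the two components

    # Relabel connected components as consecutive table indices.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```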

29 Finally, customer assignments can produce a cycle, for example, customer 1 sits with 2 and customer 2 sits with 1. [sent-134, score-1.49]

30 By being defined over customer assignments, the distance dependent CRP provides a more expressive distribution over partitions than models based on table assignments. [sent-136, score-0.809]

31 For example, if each customer is time-stamped, then d_ij might be the time difference between customers i and j; the decay function can encourage customers to sit with those that are contemporaneous. [sent-138, score-1.34]

32 If each customer is associated with a location in space, then d_ij might be the Euclidean distance between them; the decay function can encourage customers to sit with those that are in proximity. [sent-139, score-1.179]

33 In general, the decay function mediates how distances between customers affect the resulting distribution over partitions. [sent-142, score-0.563]

34 The window decay f (d) = 1[d < a] only considers customers that are at most distance a from the current customer. [sent-145, score-0.632]

35 The exponential decay f(d) = e^(−d/a) decays the probability of linking to an earlier customer exponentially with the distance to the current customer. [sent-146, score-0.796]
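
For reference, the window and exponential decays quoted above, plus one common parameterization of the logistic decay (a smooth version of the window decay; the paper's exact logistic form may differ slightly), can be written as short Python functions (our sketch):

```python
import numpy as np

def window_decay(d, a):
    return float(d < a)                         # f(d) = 1[d < a]

def exponential_decay(d, a):
    return float(np.exp(-d / a))                # f(d) = exp(-d / a)

def logistic_decay(d, a):
    return float(1.0 / (1.0 + np.exp(d - a)))   # logistic(a - d), assumed form
```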

36 To see this, consider the marginal distribution of a customer sitting at a particular table, given the previous customers’ assignments. [sent-165, score-0.542]

37 Moreover, the probability of not being assigned to a previous customer is proportional to the scaling parameter α. [sent-168, score-0.472]

38 Figure 2 illustrates seating assignments (at the table level) derived from draws from sequential CRPs with each of the decay functions described above, including the original CRP. [sent-172, score-0.47]

39 Compared to the traditional CRP, customers tend to sit at the same table with other nearby customers. [sent-174, score-0.49]

40 The traditional CRP is marginally invariant: marginalizing over a particular customer gives the same probability distribution as if that customer were not included in the model at all. [sent-178, score-1.076]

41 The data are first placed at tables via customer assignments, and then assigned to the word associated with their tables. [sent-203, score-0.531]

42 The notation z(c)_i is the table assignment of the ith customer in the table assignments induced by the complete collection of customer assignments. [sent-225, score-1.084]

43 We have not described the process sequentially, as one would with a traditional CRP, in order to emphasize the three-stage process of the distance dependent CRP: first the customer assignments and table parameters are drawn, and then the observations are assigned to their corresponding parameter. [sent-235, score-0.981]
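
Putting these pieces together, a generative sketch of a distance dependent CRP mixture might look like the following; we reuse the helper functions sketched above, and the Gaussian likelihood is chosen purely for illustration (the paper's experiments use word data).

```python
import numpy as np

def sample_ddcrp_mixture(D, f, alpha, base_scale=5.0, noise=1.0, rng=None):
    """Generative sketch: draw links, derive tables, draw per-table means,
    then draw one observation per customer."""
    rng = np.random.default_rng() if rng is None else rng
    c = sample_ddcrp_links(D, f, alpha, rng)                   # customer assignments
    z = tables_from_links(c)                                   # induced table assignments
    n_tables = max(z) + 1
    table_means = rng.normal(0.0, base_scale, size=n_tables)   # table parameters
    x = rng.normal(table_means[np.array(z)], noise)            # observations
    return c, z, x
```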

44 Regardless of the likelihood model, the posterior will be intractable to compute because the distance dependent CRP places a prior over a combinatorial number of possible customer configurations. [sent-281, score-0.783]

45 For distance dependent CRP models, the state of the chain is defined by ci , the customer assignments for each data point. [sent-286, score-0.887]

46 We will also consider z(c), which are the table assignments that follow from the customer assignments (see Figure 1). [sent-287, score-0.683]

47 This can be thought of as removing the current link from the ith customer and then considering how each alternative new link affects the likelihood of the observations. [sent-296, score-0.545]

48 Before examining this likelihood, we describe how removing and then replacing a customer link affects the underlying partition (i.e., the table assignments). [sent-297, score-0.531]

49 Here we illustrate a scenario that highlights all the ways that the sampler can move: A table can be split when we remove the customer link before conditioning; and two tables can join when we resample that link. [sent-301, score-0.699]

50 Upon removing ci , the customers at its table are split in two: those customers pointing (directly or indirectly) to i are at one table; the other customers previously seated with i are at a different table. [sent-307, score-0.943]
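
A schematic Gibbs update for a single customer assignment, reflecting the split/join behavior described here, might look like the sketch below (ours, not the authors' implementation); `table_marginal` is a placeholder for a conjugate log marginal likelihood of the data at one table and is not a function from the paper.

```python
import numpy as np

def gibbs_step(i, c, x, D, f, alpha, table_marginal, rng):
    """Resample the customer assignment c[i] given everything else."""
    c[i] = i                               # remove the old link (may split a table)
    z = tables_from_links(c)               # tables with the link removed
    log_w = np.full(len(c), -np.inf)
    for j in range(len(c)):
        prior = alpha if j == i else f(D[i, j])
        if prior <= 0.0:
            continue
        log_w[j] = np.log(prior)
        if z[j] != z[i]:
            # Linking to j would join customer i's table with j's table:
            # weight by the marginal likelihood ratio of merged vs. separate.
            a_idx = [k for k in range(len(c)) if z[k] == z[i]]
            b_idx = [k for k in range(len(c)) if z[k] == z[j]]
            log_w[j] += (table_marginal(x[a_idx + b_idx])
                         - table_marginal(x[a_idx]) - table_marginal(x[b_idx]))
    w = np.exp(log_w - log_w.max())
    c[i] = rng.choice(len(c), p=w / w.sum())
    return c
```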

51 If the ith link is not the only connection between customer i and his table or if ci was a self-link (ci = i) then the tables remain the same. [sent-310, score-0.608]

52 The outer summation is over the customer assignment of the new data point; its prior probability only depends on the distance matrix D. [sent-352, score-0.636]

53 The inner summation is over the posterior customer assignments of the data set; it determines the probability of the new data point conditioned on the previous data and its partition. [sent-353, score-0.602]

54 In this case, the distribution of the data set customer assignments c does not depend on the new data point’s location in time. [sent-356, score-0.546]
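
For sequential distances, where the new data point does not change the prior on the existing links, this predictive probability can be approximated by averaging over Gibbs samples; a rough Monte Carlo sketch (ours, using the same placeholder `table_marginal` and the helpers sketched above) is:

```python
import numpy as np

def predictive_prob(x_new, x, dist_to_new, posterior_samples, f, alpha,
                    table_marginal):
    """Estimate p(x_new | data): the outer average is over posterior samples of
    the data set's customer assignments c, the inner sum is over the new
    customer's own assignment (link to customer j, or a self-link)."""
    total = 0.0
    n = len(x)
    for c in posterior_samples:
        z = tables_from_links(c)
        prior = np.array([f(dist_to_new[j]) for j in range(n)] + [alpha])
        prior = prior / prior.sum()
        lik = np.empty(n + 1)
        for j in range(n):
            idx = [k for k in range(n) if z[k] == z[j]]
            lik[j] = np.exp(table_marginal(np.append(x[idx], x_new))
                            - table_marginal(x[idx]))
        lik[n] = np.exp(table_marginal(np.array([x_new])))   # self-link: new table
        total += float(prior @ lik)
    return total / len(posterior_samples)
```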

55 Otherwise, the new data point (i.e., the information in D) changes the prior over customer assignments and thus changes the posterior as well. [sent-362, score-0.585]

56 In Section 2 we discussed the property of marginal invariance, where removing a customer leaves the partition distribution over the remaining customers unchanged. [sent-366, score-0.813]

57 We mentioned that the traditional CRP is marginally invariant, while the distance dependent CRP does not necessarily have this property. [sent-368, score-0.473]

58 In fact, the traditional CRP is the only distance dependent CRP that is marginally invariant. [sent-369, score-0.473]

59 One can also create a marginally invariant distance dependent CRP by combining several independent copies of the traditional CRP. [sent-372, score-0.513]

60 From this observation, and the lack of marginal invariance of the distance dependent CRP, it follows that the distributions on partitions induced by random-measure models are different from the distance dependent CRP. [sent-385, score-0.723]

61 Further, we compared the traditional Gibbs sampler for DP mixtures to the Gibbs sampler for the distance dependent CRP formulation of DP mixtures. [sent-393, score-0.649]

62 We found that the sampler based on customer assignments mixes faster than the traditional sampler. [sent-394, score-0.754]

63 The black line at 0 denotes an equal fit between the traditional CRP and distance dependent CRP, while positive values denote a better fit for the distance dependent CRP. [sent-402, score-0.661]

64 The logistic decay function always provides a better model than the traditional CRP; the exponential decay function provides a better model at certain settings of its parameter. [sent-412, score-0.471]

65 On the NIPS data, the distance dependent CRP outperforms the traditional CRP for the logistic decay with a decay parameter of 2 years. [sent-418, score-0.759]

66 On the NIPS corpus, the logistic decay function with a decay parameter of 2 years outperforms the traditional CRP. [sent-431, score-0.471]

67 We use the window decay function with parameter 1, enforcing that a customer can only link to itself or to another customer that refers to an immediately connected document. [sent-442, score-1.14]

68 The log Bayes factor is 13,062, strongly in favor of the distance dependent CRP, although we emphasize that much of this improvement may occur simply because the distance dependent CRP avoids clustering abstracts from unconnected components of the network. [sent-448, score-0.618]

69 The Gibbs sampler for the distance dependent CRP iteratively samples the customer assignment of each data point, while the collapsed Gibbs sampler iteratively samples the cluster assignment of each data point. [sent-460, score-1.061]

70 The practical difference between the two algorithms is that the distance dependent CRP based sampler can change several customers’ cluster assignments via a single customer assignment. [sent-461, score-0.957]

71 Both samplers have the same limiting distribution because the distance dependent CRP with identity decay is the traditional CRP. [sent-475, score-0.587]

72 These numbers are comparable because the models, and thus the normalizing constant, are the same for both the traditional representation and customer based CRP. [sent-485, score-0.522]

73 Discussion We have developed the distance dependent Chinese restaurant process, a distribution over partitions that accommodates a flexible and non-exchangeable seating assignment distribution. [sent-489, score-0.479]

74 The distance dependent CRP hinges on the customer assignment representation. [sent-490, score-0.77]

75 Then, the probability distribution on clusters induced by this construction is identical to the distance dependent CRP with decay function f (d) = a1[d ∈ A]. [sent-523, score-0.498]

76 Each customer i will be placed in the set containing customer J(i). [sent-533, score-0.874]

77 Given the sets B1, . . ., BK, the probability of linkage under the distance dependent CRP with decay function f(d) = a·1[d ∈ A] may be written p(ci = j) ∝ α if i = j; a if j < i and j ∈ B_k(i); 0 if j > i or j ∉ B_k(i). [sent-549, score-0.498]

78 Proposition 2 If the distance dependent CRP for a given decay function f is marginally invariant over all sets of sequential distances, then f is of the form f(d) = a·1[d ∈ A] for some a > 0 and A equal to either ∅, {0}, or R. [sent-569, score-0.783]

79 Proof Consider a setting with 3 customers, in which customer 2 may either be absent, or present with his seating assignment marginalized out. [sent-570, score-0.522]

80 Suppose that the distance dependent CRP resulting from this f and any collection of sequential distances is marginally invariant. [sent-572, score-0.55]

81 Then the probability that customers 1 and 3 share a table must be the same whether customer 2 is absent or present. [sent-573, score-0.792]

82 If customer 2 is absent, P{1 and 3 sit at same table | 2 absent} = f(d31) / (f(d31) + α) (Equation 4). [sent-574, score-0.536]

83 If customer 2 is present, customers 1 and 3 may sit at the same table in two different ways: 3 sits with 1 directly (c3 = 1); or 3 sits with 2, and 2 sits with 1 (c3 = 2 and c2 = 1). [sent-575, score-0.796]

84 The distance dependent CRP resulting from this decay function is marginally invariant over all sequential distances if and only if f is of the form f(d) = a·1[d ∈ A] for some a > 0 and some A ∈ {∅, {0}, R}. [sent-593, score-0.783]

85 Although Corollary 3 allows any choice of a > 0 in the decay function f (d) = a1[d ∈ A], the distribution of the distance dependent CRP with a particular f and α remains unchanged if both f and α are multiplied by a constant factor (see Equation (2)). [sent-596, score-0.481]

86 The class of distance dependent CRPs that are marginally invariant over this larger class of distances is even more restricted than in the sequential case. [sent-601, score-0.59]

87 Proposition 4 If the distance dependent CRP for a given decay function f is marginally invariant over all sets of distances, both sequential and non-sequential, then f is identically 0. [sent-603, score-0.694]

88 Proof From Proposition 2, we have that any decay function that is marginally invariant under all sequential distances must be of the form f(d) = a·1[d ∈ A], where a > 0 and A ∈ {∅, {0}, R}. [sent-604, score-0.495]

89 Under our assumption of marginal invariance, the probability that the first n customers sit at separate tables should be invariant to the absence or presence of customer n + 1. [sent-611, score-0.937]

90 When customer n + 1 is absent, the only way in which the first n customers may sit at separate tables is for each to link to himself. [sent-612, score-0.879]

91 Let pn = α/(α + (n − 1) f (0)) denote the probability of a given customer linking to himself when customer n + 1 is absent. [sent-613, score-0.957]

92 Let pn+1 = α/(α + n f (0)) be the probability of a given customer linking to himself, and let qn+1 = f (0)/(α + n f (0)) be the probability of a given customer linking to some other given customer. [sent-619, score-0.966]

93 Second, all but one of these first n customers may link to himself, with the remaining customer linking to customer n + 1, and customer n + 1 linking either to himself or to the customer that linked to him. [sent-622, score-2.122]

94 As described above, the resulting probability distribution is one in which each customer links to himself, and is thus clustered by himself. [sent-635, score-0.47]

95 Corollary 5 The decay function f = 0 is the only one for which the resulting distance dependent CRP is marginally invariant over all distances, both sequential and non-sequential. [sent-638, score-0.694]

96 Sufficiency follows from the fact that the probability distribution on partitions induced by f = 0 is the one under which each customer is clustered alone almost surely, which is marginally invariant. [sent-640, score-0.626]

97 To sample from the posterior of α given the customer assignments c and data, we begin by noting that α is conditionally independent of the observed data given the customer assignments. [sent-643, score-1.022]

98 Normalizing provides p(c | α) = ∏_{i=1}^{N} [1[ci = i] α + 1[ci ≠ i] f(d_{i,ci})] / [α + ∑_{j≠i} f(d_ij)] ∝ α^K ∏_{i=1}^{N} [α + ∑_{j≠i} f(d_ij)]^{−1}, where K is the number of self-links ci = i in the customer assignments c. [sent-646, score-0.599]
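
Given this unnormalized density in α, one simple way to resample the concentration parameter is a Metropolis step on log α under an assumed Gamma prior; the sketch below is ours and not from the paper.

```python
import numpy as np

def resample_alpha(alpha, c, D, f, rng, prior_shape=1.0, prior_rate=1.0, step=0.5):
    """One random-walk Metropolis update of alpha given customer assignments c."""
    n = len(c)
    K = sum(1 for i in range(n) if c[i] == i)          # number of self-links
    row_sums = [sum(f(D[i, j]) for j in range(n) if j != i) for i in range(n)]

    def log_post(a):
        # log p(c | a) + log Gamma(a; shape, rate), up to an additive constant
        return (K * np.log(a) - sum(np.log(a + s) for s in row_sums)
                + (prior_shape - 1.0) * np.log(a) - prior_rate * a)

    proposal = alpha * np.exp(step * rng.normal())     # random walk on log alpha
    log_accept = (log_post(proposal) - log_post(alpha)
                  + np.log(proposal) - np.log(alpha))  # Jacobian for log-scale walk
    if np.log(rng.uniform()) < log_accept:
        return proposal
    return alpha
```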

99 Equation (10) reduces further in the following special case: f is the window decay function, f(d) = 1[d < a]; d_ij = i − j for i > j; and distances are sequential so d_ij = ∞ for i < j. [sent-649, score-0.547]

100 In the case of the window decay function with sequential distances and d_ij = i − j for i > j, we can simplify this further as we did above with Equation (11). [sent-665, score-0.47]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('crp', 0.625), ('customer', 0.437), ('customers', 0.281), ('decay', 0.193), ('crps', 0.169), ('dependent', 0.168), ('sampler', 0.123), ('distance', 0.12), ('assignments', 0.109), ('gibbs', 0.105), ('marginally', 0.1), ('distances', 0.089), ('estaurant', 0.088), ('hinese', 0.088), ('istance', 0.088), ('traditional', 0.085), ('razier', 0.081), ('di', 0.077), ('rocesses', 0.075), ('sequential', 0.073), ('lei', 0.072), ('sit', 0.071), ('sitting', 0.069), ('mixture', 0.068), ('ependent', 0.067), ('dirichlet', 0.059), ('bk', 0.058), ('partitions', 0.056), ('tables', 0.055), ('invariance', 0.055), ('ddp', 0.054), ('ci', 0.053), ('restaurant', 0.05), ('assignment', 0.045), ('chinese', 0.043), ('dp', 0.041), ('partition', 0.04), ('invariant', 0.04), ('seating', 0.04), ('posterior', 0.039), ('blei', 0.039), ('window', 0.038), ('pn', 0.037), ('marginal', 0.036), ('link', 0.035), ('preferences', 0.035), ('sits', 0.035), ('xzk', 0.034), ('dahl', 0.031), ('documents', 0.031), ('mixtures', 0.03), ('linking', 0.029), ('absent', 0.029), ('table', 0.028), ('draws', 0.027), ('sampling', 0.027), ('networked', 0.027), ('xnew', 0.027), ('grif', 0.026), ('sudderth', 0.026), ('clustering', 0.026), ('nearby', 0.025), ('exchangeable', 0.025), ('qn', 0.024), ('document', 0.023), ('friend', 0.023), ('ahmed', 0.023), ('articles', 0.022), ('modeling', 0.021), ('word', 0.021), ('join', 0.021), ('samplers', 0.021), ('goldwater', 0.021), ('concentration', 0.021), ('teh', 0.021), ('bdr', 0.02), ('bnp', 0.02), ('ddcrp', 0.02), ('nyt', 0.02), ('drawn', 0.02), ('bayesian', 0.02), ('nonparametric', 0.02), ('likelihood', 0.019), ('removing', 0.019), ('covariates', 0.019), ('assigned', 0.018), ('collections', 0.018), ('dunson', 0.018), ('cnew', 0.017), ('ritter', 0.017), ('probability', 0.017), ('inference', 0.017), ('reachable', 0.017), ('exchangeability', 0.017), ('exible', 0.016), ('measurements', 0.016), ('spatial', 0.016), ('emphasize', 0.016), ('clustered', 0.016), ('ths', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes

Author: David M. Blei, Peter I. Frazier

Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics

2 0.25519136 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

Author: Thomas L. Griffiths, Zoubin Ghahramani

Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices

3 0.20171735 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

Author: Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes—the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process—that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology. Keywords: nonparametric Bayes, Pitman-Yor process, language model, unsupervised

4 0.086645268 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

Author: Lauren A. Hannah, David M. Blei, Warren B. Powell

Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency

5 0.054153483 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes

Author: Elias Zavitsanos, Georgios Paliouras, George A. Vouros

Abstract: This paper presents hHDP, a hierarchical algorithm for representing a document collection as a hierarchy of latent topics, based on Dirichlet process priors. The hierarchical nature of the algorithm refers to the Bayesian hierarchy that it comprises, as well as to the hierarchy of the latent topics. hHDP relies on nonparametric Bayesian priors and it is able to infer a hierarchy of topics, without making any assumption about the depth of the learned hierarchy and the branching factor at each level. We evaluate the proposed method on real-world data sets in document modeling, as well as in ontology learning, and provide qualitative and quantitative evaluation results, showing that the model is robust, it models accurately the training data set and is able to generalize on held-out data. Keywords: hierarchical Dirichlet processes, probabilistic topic models, topic distributions, ontology learning from text, topic hierarchy

6 0.049028438 61 jmlr-2011-Logistic Stick-Breaking Process

7 0.044985108 16 jmlr-2011-Clustering Algorithms for Chains

8 0.042030513 54 jmlr-2011-Learning Latent Tree Graphical Models

9 0.03583052 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

10 0.033709496 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

11 0.033464372 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

12 0.029890563 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods

13 0.029069828 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

14 0.027786177 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

15 0.027252756 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

16 0.02540637 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough

17 0.024588281 55 jmlr-2011-Learning Multi-modal Similarity

18 0.023005618 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling

19 0.021885281 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach

20 0.021466561 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.159), (1, -0.156), (2, -0.17), (3, 0.016), (4, -0.23), (5, -0.024), (6, 0.075), (7, 0.371), (8, -0.083), (9, -0.028), (10, 0.228), (11, 0.221), (12, -0.116), (13, -0.083), (14, -0.097), (15, 0.05), (16, -0.159), (17, 0.059), (18, -0.007), (19, -0.103), (20, 0.123), (21, -0.098), (22, -0.06), (23, -0.032), (24, 0.041), (25, 0.042), (26, 0.103), (27, -0.07), (28, -0.051), (29, -0.004), (30, -0.096), (31, 0.005), (32, -0.095), (33, 0.068), (34, -0.078), (35, 0.047), (36, -0.024), (37, -0.044), (38, 0.082), (39, 0.059), (40, 0.071), (41, 0.009), (42, -0.006), (43, -0.082), (44, 0.039), (45, -0.013), (46, 0.051), (47, -0.058), (48, -0.04), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97411031 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes

Author: David M. Blei, Peter I. Frazier

Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics

2 0.79510599 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

Author: Thomas L. Griffiths, Zoubin Ghahramani

Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices

3 0.72378552 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

Author: Sharon Goldwater, Thomas L. Griffiths, Mark Johnson

Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes—the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process—that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology. Keywords: nonparametric Bayes, Pitman-Yor process, language model, unsupervised

4 0.3239066 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

Author: Lauren A. Hannah, David M. Blei, Warren B. Powell

Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency

5 0.29888076 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes

Author: Elias Zavitsanos, Georgios Paliouras, George A. Vouros

Abstract: This paper presents hHDP, a hierarchical algorithm for representing a document collection as a hierarchy of latent topics, based on Dirichlet process priors. The hierarchical nature of the algorithm refers to the Bayesian hierarchy that it comprises, as well as to the hierarchy of the latent topics. hHDP relies on nonparametric Bayesian priors and it is able to infer a hierarchy of topics, without making any assumption about the depth of the learned hierarchy and the branching factor at each level. We evaluate the proposed method on real-world data sets in document modeling, as well as in ontology learning, and provide qualitative and quantitative evaluation results, showing that the model is robust, it models accurately the training data set and is able to generalize on held-out data. Keywords: hierarchical Dirichlet processes, probabilistic topic models, topic distributions, ontology learning from text, topic hierarchy

6 0.23218805 61 jmlr-2011-Logistic Stick-Breaking Process

7 0.20452276 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

8 0.19103159 16 jmlr-2011-Clustering Algorithms for Chains

9 0.17663017 54 jmlr-2011-Learning Latent Tree Graphical Models

10 0.14985237 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

11 0.13860099 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

12 0.13259132 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

13 0.12985037 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

14 0.11981595 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

15 0.119417 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough

16 0.11458578 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

17 0.11436906 1 jmlr-2011-A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

18 0.10509554 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal

19 0.10456528 60 jmlr-2011-Locally Defined Principal Curves and Surfaces

20 0.10408613 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.046), (9, 0.016), (10, 0.026), (13, 0.012), (24, 0.087), (31, 0.54), (32, 0.015), (41, 0.013), (67, 0.023), (73, 0.021), (78, 0.028), (90, 0.028), (96, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98237169 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes

Author: David M. Blei, Peter I. Frazier

Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics

2 0.97949284 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon

Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever. Keywords: classification, large margin, maximum likelihood

3 0.95599622 101 jmlr-2011-Variable Sparsity Kernel Learning

Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman

Abstract: This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls, s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels—hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over a product of simplices and has a computational complexity of O(m² n_tot log(n_max)/ε²), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels respectively, and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability

4 0.95157355 95 jmlr-2011-Training SVMs Without Offset

Author: Ingo Steinwart, Don Hush, Clint Scovel

Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the performed iterations of the new training algorithm. It turns out, that the new algorithm needs significantly less iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms

5 0.94154114 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis

Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/ jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture

6 0.75804597 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal

7 0.73447883 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

8 0.72034091 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

9 0.71631867 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

10 0.71309471 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets

11 0.71231544 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data

12 0.71190709 12 jmlr-2011-Bayesian Co-Training

13 0.70584995 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

14 0.70188189 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

15 0.70028174 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

16 0.69534338 69 jmlr-2011-Neyman-Pearson Classification, Convexity and Stochastic Constraints

17 0.69102257 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

18 0.68494147 36 jmlr-2011-Generalized TD Learning

19 0.68308038 16 jmlr-2011-Clustering Algorithms for Chains

20 0.68098861 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets