jmlr jmlr2011 jmlr2011-61 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
Reference: text
sentIndex sentText sentNum sentScore
1 The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. [sent-11, score-0.324]
2 Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. [sent-15, score-0.438]
3 Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation [sent-16, score-0.318]
4 Introduction. One is often interested in clustering data that have associated spatial or temporal coordinates. [sent-17, score-0.177]
5 The available spatial or temporal information may be exploited to help infer patterns, clusters or segments in the data. [sent-20, score-0.209]
6 To simplify the exposition, in the following discussion we focus on exploiting spatial information, although when presenting results we also consider temporal data (Fox et al. [sent-21, score-0.178]
7 There have been numerous techniques developed to cluster data, although most of these do not explicitly exploit appended spatial information. [sent-23, score-0.146]
8 These approaches regard the two-dimensional (2D) data as an undirected weighted graph, and the segmentation is equivalent to finding the minimum cut of the graph, minimizing the between-group disassociation while maximizing the within-group association (Shi and Malik, 2000). [sent-25, score-0.283]
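As an illustration of this graph-cut view, here is a minimal sketch of a two-way normalized cut on a toy affinity matrix, using the standard spectral relaxation; the Gaussian affinity, the bandwidth, and the threshold-at-zero rule are illustrative choices, not the paper's setup:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Two-way normalized cut via the spectral relaxation of Shi and Malik.

    Solves the generalized eigenproblem (D - W) y = lambda * D y and
    thresholds the eigenvector with the second-smallest eigenvalue.
    """
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)        # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Two loose point clusters; affinity decays with squared distance.
pts = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], float)
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
labels = ncut_bipartition(np.exp(-d2 / 2.0))
print(labels)                       # e.g., [0 0 0 1 1 1] (or flipped)
```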
9 For each $y_n$, the DP mixture model is represented as $y_n \mid \theta_n \sim F(\theta_n)$, $\theta_n \mid G \stackrel{iid}{\sim} G$, $G \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0 G_0)$, where $\alpha_0$ is a non-negative precision parameter and $G_0$ is the base probability measure. [sent-32, score-0.133]
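For concreteness, a minimal sketch of drawing from a truncated DP mixture via its stick-breaking representation; the Gaussian base measure and Gaussian likelihood are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def sample_dp_mixture(N, alpha0=1.0, K=20, seed=0):
    """Draw N samples from a truncated DP mixture of 1-D Gaussians.

    Sticks v_k ~ Beta(1, alpha0) give weights pi_k = v_k * prod_{j<k}(1 - v_j);
    atoms theta_k* are drawn i.i.d. from the base measure G0 = N(0, 5^2).
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha0, size=K)
    v[-1] = 1.0                                   # close the truncated stick
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = rng.normal(0.0, 5.0, size=K)          # theta_k* ~ G0
    z = rng.choice(K, size=N, p=pi)               # component assignments
    y = rng.normal(atoms[z], 1.0)                 # y_n ~ F(theta_{z_n})
    return y, z, atoms

y, z, atoms = sample_dp_mixture(500)
```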
10 The assumption within the DP that the data are exchangeable is generally inappropriate when one wishes to impose knowledge of spatial information (in which each $y_n$ has an associated spatial location). [sent-36, score-0.271]
11 For example, the data may be represented as $\{y_n, s_n\}_{n=1}^N$, in which $y_n$ is again the feature vector and $s_n$ represents the spatial location of $y_n$. [sent-37, score-0.556]
12 Provided with such spatial information, one may wish to explicitly impose the belief that proximate data are more likely to be clustered together. [sent-38, score-0.149]
13 The spatial location $s_n$ may be readily considered as an appended feature, and the modified feature vectors (data) may then be analyzed via traditional clustering algorithms, like those discussed above. [sent-39, score-0.339]
14 These previous studies seek to cluster visual words, with such clustering encouraged if the features are spatially proximate. [sent-42, score-0.171]
15 However, these methods may produce spurious clusters that are introduced to better characterize the spatial data likelihood instead of the likelihood of the features conditionally on spatial location (Park and Dunson, 2009). [sent-43, score-0.261]
16 To address these challenges, and impose spatial information more explicitly, researchers have recently modified the DP construction to manifest spatial-location dependent stick weights. [sent-45, score-0.15]
17 Rather than using the kernel stick-breaking process (An et al., 2007) to assign a feature vector to a particular parameter, we introduce a novel non-parametric spatially dependent prior, called the logistic stick-breaking process (LSBP), to impose the belief that proximate feature vectors are likely to be assigned to the same parameter. [sent-57, score-0.183]
18 The new model is constructed based on a hierarchy of spatial logistic regressions, with sparseness-promoting priors on the regression coefficients. [sent-58, score-0.153]
19 Further, for reasons discussed below, this model favors contiguous segments with sharp boundaries, of interest in many applications. [sent-60, score-0.127]
20 In the first example we consider segmentation of multi-person spoken audio data. [sent-80, score-0.377]
21 In addition to inferring a segmentation of each image, the framework allows sorting and searching among the images. [sent-82, score-0.283]
22 Logistic Stick-breaking Process (LSBP). We first consider spatially constrained clustering for a single data set (task). [sent-89, score-0.14]
23 Assume N sample points $\{D_n\}_{n=1}^N$, where $D_n = (y_n, s_n)$, with $y_n$ representing the nth feature vector and $s_n$ its associated spatial location. [sent-90, score-0.484]
24 Given $Z_n$, data $D_n$ is associated with parameter $\theta_{\hat{k}}^*$ if $z_{n\hat{k}} = 1$ and $z_{nk} = 0$ for $k < \hat{k}$. [sent-99, score-0.188]
25 The $Z_n$ are drawn from a spatially dependent density function, encouraging that proximate $D_n$ will have similar $Z_n$, thereby encouraging spatial contiguity. [sent-100, score-0.26]
26 Specifically, let $p_k(s_n)$ define the probability that $z_{nk} = 1$, with $1 - p_k(s_n)$ representing the probability that $z_{nk} = 0$; the spatial dependence of these density functions is made explicit via $s_n$. [sent-102, score-0.585]
27 The indicator variables controlling allocation to components are then drawn from $z_{nk} \sim \mathrm{Bernoulli}[\sigma(g_k(s_n))]$, where $\sigma(g) = 1/[1 + \exp(-g)]$ is the inverse of the logit link in (3). [sent-114, score-0.206]
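A minimal sketch of how the logistic sticks induce space-dependent mixture weights $\pi_k(s) = \sigma(g_k(s)) \prod_{j<k} [1 - \sigma(g_j(s))]$; the Gaussian kernel expansion for $g_k$, the bandwidth $\psi$, and all array shapes here are illustrative assumptions:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def lsbp_weights(s, centers, W, psi=1.0):
    """Space-dependent stick weights pi_k(s) for a truncated LSBP.

    g_k(s) = w_k0 + sum_i w_ki * exp(-||s - s_i||^2 / psi) is a kernel
    logistic regression; sigma(g_k(s)) plays the role of the k-th stick.
    W has shape (K, 1 + Nc): a bias plus one weight per kernel center.
    """
    d2 = ((s[None, :] - centers) ** 2).sum(axis=1)      # squared distances
    phi = np.concatenate(([1.0], np.exp(-d2 / psi)))    # (1 + Nc,) features
    g = W @ phi                                         # (K,) values g_k(s)
    sticks = sigmoid(g)
    sticks[-1] = 1.0                                    # truncate at level K
    rest = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    return sticks * rest                                # pi_k(s), sums to 1

rng = np.random.default_rng(1)
centers = rng.uniform(0, 10, size=(25, 2))              # Nc kernel centers
W = rng.normal(0, 1, size=(8, 26))                      # K = 8 layers
pi = lsbp_weights(np.array([3.0, 4.0]), centers, W)
```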
28 As one choice, one may employ a hierarchical Student-t prior as applied in the relevance vector machine (Tipping, 2001; Bishop and Tipping, 2000; Bishop and Svensén, 2003): $w_{ki} \sim N(w_{ki} \mid 0, \lambda_{ki}^{-1})$, $\lambda_{ki} \sim \mathrm{Gamma}(\lambda_{ki} \mid a_0, b_0)$, where shrinkage is encouraged with $a_0 = b_0 = 10^{-6}$ (Tipping, 2001). [sent-116, score-0.136]
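A small sketch of drawing weights from this hierarchical prior to see the shrinkage effect; we use larger $a_0 = b_0$ than the paper's $10^{-6}$ purely to keep the illustration numerically stable:

```python
import numpy as np

rng = np.random.default_rng(2)
a0 = b0 = 1e-2                                  # paper uses 1e-6; larger here
lam = rng.gamma(shape=a0, scale=1.0 / b0, size=10000)  # precision lambda_ki
lam = np.maximum(lam, 1e-300)                   # guard against exact zeros
w = rng.normal(0.0, 1.0 / np.sqrt(lam))         # w_ki | lambda_ki ~ N(0, 1/lambda)
# Marginally w is Student-t: most draws are near zero (shrinkage),
# with occasional very large values (heavy tails).
print(np.median(np.abs(w)), np.max(np.abs(w)))
```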
29 In our numerical experiments on waveform and image segmentation, we have employed the Student-t construction. [sent-122, score-0.162]
30 There are two key components of the LSBP construction: (i) sparseness promotion on the $w_{ki}$, and (ii) the use of a logistic link function to define space-dependent stick weights. [sent-125, score-0.174]
31 Figure 4: Graphical representation of H-LSBP (nodes include $w_{mk}$, $s_{mn}$, $t_{mk}$, $z_{mnk}$, $y_{mn}$, with hyperparameters $a_0$, $b_0$, $e_0$, $f_0$ and plates $N_m$, $M$). [sent-128, score-0.564]
32 Given the assigned layer k indicated by $z_{mn}$, the appearance feature $y_{mn}$ is drawn from the associated atom $\theta_{t_{mk}}^*$. [sent-137, score-0.142]
33 For the image-processing application, Nc may be set to the number of superpixels used to define spacedependent image features (discussed in more detail when presenting image-segmentation results in Section 5. [sent-144, score-0.182]
34 The truncation level K on the LSBP may be set to any large value that exceeds the number of anticipated segments in the image, and the model automatically infers the number of segments in the end. [sent-146, score-0.163]
35 (e.g., defined by the expected segment sizes relative to the overall image size). [sent-155, score-0.125]
36 In the specific audio and image segmentation applications discussed below we explicitly define these parameters, and note that no tuning of these parameters was performed. [sent-156, score-0.469]
37 For the image processing application the observed image feature vectors are quantized, and consequently the observation at any point in the image corresponds to a code index. [sent-161, score-0.276]
38 The model of (Sudderth and Jordan, 2008) is developed based on a hybrid variational inference algorithm; however, nearly half of the model parameters still need to be estimated via a sampling technique. [sent-170, score-0.158]
39 However, motivated by the goal of fast and relatively accurate inference for large-scale problems, we consider variational Bayesian (VB) inference (Beal, 2003). [sent-173, score-0.158]
40 Variational Bayesian (VB) inference (Beal, 2003) seeks a variational distribution q(Φ) to approximate the true posterior distribution of the latent variables p(Φ). [sent-176, score-0.15]
41 Based on bounding log-convex functions, we use a variational bound for the logistic sigmoid function in the form (Bishop and Svensén, 2003) $\sigma(x) \geq \sigma(\eta)\exp\left[\frac{x-\eta}{2} - f(\eta)(x^2 - \eta^2)\right]$ (7), where $f(\eta) = \tanh(\eta/2)/(4\eta)$ and $\eta$ is a variational parameter. [sent-190, score-0.174]
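A quick numerical check of this bound, assuming the standard Jaakkola-Jordan form $f(\eta) = \tanh(\eta/2)/(4\eta)$ (the $1/(4\eta)$ factor appears truncated in the extracted text above):

```python
import numpy as np

def f(eta):
    return np.tanh(eta / 2.0) / (4.0 * eta)

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def jj_bound(x, eta):
    """Jaakkola-Jordan lower bound on sigma(x); tight at x = +/- eta."""
    return sigma(eta) * np.exp((x - eta) / 2.0 - f(eta) * (x**2 - eta**2))

x = np.linspace(-6, 6, 13)
eta = 2.0
assert np.all(jj_bound(x, eta) <= sigma(x) + 1e-12)   # it is a lower bound
assert np.isclose(jj_bound(eta, eta), sigma(eta))     # tight at x = eta
```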
42 Based on the variables $z_n$, the cluster membership of each data point $D_n$ corresponding to the different mixture components $\{\theta_k^*\}_{k=1}^K$ can be specified as $\xi_{nk} = \left[\prod_{k'=1}^{k-1}(1 - z_{nk'})\right] z_{nk}$. [sent-206, score-0.27]
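A minimal sketch of computing the memberships $\xi_{nk}$ from the binary indicators: each data point is assigned to the first layer whose indicator fires, so each row of the result contains a single 1:

```python
import numpy as np

def memberships(z):
    """xi_{nk} = z_{nk} * prod_{k' < k} (1 - z_{nk'}) for an (N, K) array z."""
    stop_before = np.cumprod(1 - z[:, :-1], axis=1)   # product over k' < k
    lead = np.ones((z.shape[0], 1), dtype=z.dtype)    # empty product for k = 1
    return z * np.concatenate([lead, stop_before], axis=1)

z = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1]])
print(memberships(z))
# [[0 1 0 0]    first data point is assigned to component 2
#  [1 0 0 0]]   second data point to component 1
```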
43 We sample the kernel width $\psi_k$ from a multinomial distribution over a given discrete candidate set $\{\psi_l^*\}_{l=1}^{\tau}$ in each iteration (with probabilities normalized by $\sum_{l=1}^{\tau} p(\psi_k = \psi_l^*)$), or we can set the kernel width by choosing the candidate with the largest probability. [sent-231, score-0.134]
44 For the hierarchical logistic stick-breaking process (H-LSBP), we adopt a similar inference technique to that employed for LSBP, with the addition of updating the parameters of the Dirichlet process. [sent-237, score-0.157]
45 Since the spatial relationships are encoded via a kernel distance measure, the model can also be used to segment time-series data. [sent-241, score-0.18]
46 Below we consider three examples: (i) a simple “toy” problem that allows us to compare with related approaches in an easily understood setting, (ii) segmentation of multiple speakers in an audio signal, and (iii) segmentation of images. [sent-242, score-0.66]
47 We then consider joint segmentation of multiple images, with the goal of inferring relationships between images (of interest for image sorting and search). [sent-244, score-0.435]
48 1 Simulation Example. In this example the feature vector $y_n$ is the intensity value of each pixel, and the pixel location is the spatial information $s_n$. [sent-247, score-0.351]
49 Each observation is assumed to be drawn from a spatially dependent Gaussian mixture (i. [sent-248, score-0.162]
50 Therefore, we fixed the truncation level to K = 10 for all models, and the clustering results are shown in Figure 6, with different colors representing the cluster index (mixture component to which a data sample is assigned). [sent-257, score-0.132]
51 Figure 6 caption: (a) original image, (b) DP, (c) KSBP, (d) LSBP. Compared with DP and KSBP, the proposed LSBP shows a much cleaner segmentation in Figure 6(d), as a consequence of the imposed favoring of contiguous segments. [sent-259, score-0.349]
52 Figure 7: Original audio waveform, (a), and representation in terms of MFCC features, (b). [sent-271, score-0.125]
53 With the kernels specified in a temporal (one-dimensional) space, the proposed LSBP is naturally extended to segmentation of sequential data, such as speaker diarization (Ben et al. [sent-273, score-0.48]
54 Provided with a spoken document consisting of multiple speakers, speaker diarization is the process of segmenting the audio signal into contiguous temporal regions, within each of which a particular individual is speaking. [sent-276, score-0.357]
55 We assume the acoustic observations at different times are drawn from a Gaussian mixture model (each generating Gaussian ideally corresponds to a speaker ID). [sent-278, score-0.185]
56 , 2008), in which the speech is represented by an HMM with Gaussian state-dependent emissions; to associate a given speaker with a particular state, the states are made to be persistent, or “sticky”, with the state-dependent degree of stickiness also inferred. [sent-282, score-0.134]
57 Figure 7(a) presents the audio waveform with a sampling rate of 16000 Hz. [sent-286, score-0.125]
58 For the sticky HMM, we employed two distinct forms of posterior computation: (i) a VB analysis, which is consistent with the methods employed for the other models; and (ii) a Gibbs sampler, analogous to that employed in the original sticky-HMM paper (Fox et al. [sent-314, score-0.255]
59 To segment the audio data, we labeled each observation with the index of the cluster having the largest probability value, and the results are shown in Figure 8 (here the sticky-HMM results were computed via VB analysis). [sent-329, score-0.189]
60 From the results in Figure 8, the proposed LSBP yields the best segmentation performance, with results in close agreement with ground truth. [sent-331, score-0.321]
61 While the sticky HMM did not yield reliable VB-computed results, it performed well when a Gibbs sampler was employed (as in the work of Fox et al. [sent-338, score-0.14]
62 Therefore, we fixed the truncation level to K = 10 for all models, and the clustering results are shown in Figure 6, with different colors representing the cluster index (mixture component to which a data sample is assigned). [sent-348, score-0.132]
63 Note that only three sticks have appreciable probability for any time t, and the segments tend to be localized, with near-unity probability of using a corresponding model parameter within a given segment. [sent-358, score-0.131]
64 3 Image Segmentation with LSBP The images considered first are from Microsoft Research Cambridge3 and each image has 320×213 pixels. [sent-361, score-0.152]
65 To apply the hierarchical model to image segmentation, we first over-segment each image into 1, 000 “superpixels”, which are local, coherent and preserve most of the structure necessary for segmentation at the scale of interest (Ren and Malik, 2003). [sent-362, score-0.502]
66 Figure caption: (a) $w_k$, (b) $g_k(t)$, (c) $\sigma_k(t)$, (d) $\pi_k(t)$. The superpixel code is described in Mori (2005), and can be downloaded at http://fas. [sent-376, score-0.173]
67 To perform segmentation at the patch level (each superpixel corresponds to one patch), the center of each superpixel is recorded as the location coordinate $s_n$. [sent-386, score-0.618]
68 The segmentation task now reduces to grouping/clustering the superpixels based on the associated image feature vector and associated spatial information. [sent-394, score-0.55]
69 The segmentation performance for each of these images is shown in Figure 11(g), (h) and (i), using respectively K = 4, 3 and 6, based on the model evidence (discussed further below). [sent-396, score-0.374]
70 In Figure 11(j), (k) and (l), the segmentation results are shown with K fixed at K = 10. [sent-398, score-0.283]
71 In this case the LSBP has ten sticks; however, based on the segmentation, only a subset of sticks (5, 8 and 7, respectively) are inferred to have appreciable amplitude. [sent-399, score-0.353]
72 For the relatively simple “chimney” image, the segmentation results are very similar with different initializations of K (Figure 11(g)) and simply truncating the sticks at a “large” value (Figure 11(j) with K = 10). [sent-404, score-0.353]
73 We again observe homogeneous contiguous segments with sharp boundaries. [sent-406, score-0.127]
74 In this case a smaller K yields (as expected) a simpler segmentation (Figure 11(h)). [sent-407, score-0.283]
75 (a) VB iteration lower bound for image “chimney” with K = 4; (b) approximating the model evidence as a function of K for image “chimney”. [sent-425, score-0.215]
76 To further evaluate the performance of LSBP for image segmentation, we also consider several other state-of-the-art methods, including two other non-parametric statistical models: the Dirichlet process (DP) (Sethuraman, 1994) and the kernel stick-breaking process (KSBP) (An et al. [sent-426, score-0.124]
77 , 2007), and also spatially varying mixture segmentation with edge preservation (Stu.-svgm). [sent-432, score-0.445]
78 We consider the same data source as in the previous examples, but for the next set of results segmentation “ground truth” was provided with the data. [sent-435, score-0.283]
79 Figure 13 shows typical segmentation results for the different algorithms. [sent-438, score-0.283]
80 The Stu.-svgm favors a more contiguous segmentation for the texture region, preserving edges; this may make a good tradeoff between keeping coherence and capturing details, but the segmentation performance is degraded by redundant boundaries, such as those within the goose body. [sent-444, score-0.667]
81 Among the Bayesian methods (DP, KSBP and LSBP), LSBP tends to yield better segmentation, characterized by homogeneous segmentation regions and sharp segment boundaries. [sent-446, score-0.316]
82 To quantify segmentation results, we also calculated the Rand Index (RI) (Unnikrishnan et al. [sent-447, score-0.283]
83 , 2007) and the Variation of Information (VoI) (Meilă, 2003), using segmentation “truth” provided with the data. [sent-448, score-0.283]
84 RI measures consistency between two segmentation labels via an overlapping fraction, and VoI roughly calculates the amount of randomness that exists in one segmentation that is not explained by the other. [sent-449, score-0.566]
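A small self-contained sketch of both measures on toy labelings; the pairwise definition of RI and the entropy-based definition of VoI follow the cited papers, while the toy labels are illustrative:

```python
import numpy as np

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)            # each unordered pair once
    return (same_a == same_b)[iu].mean()

def variation_of_information(a, b):
    """VoI = H(A) + H(B) - 2 I(A; B) in nats; zero iff the partitions match."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    entropy = lambda p: -np.sum(p * np.log(p))
    pa = np.unique(a, return_counts=True)[1] / n
    pb = np.unique(b, return_counts=True)[1] / n
    pab = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)[1] / n
    mutual = entropy(pa) + entropy(pb) - entropy(pab)
    return entropy(pa) + entropy(pb) - 2.0 * mutual

truth = [0, 0, 1, 1, 2, 2]
pred  = [0, 0, 1, 1, 1, 2]
print(rand_index(truth, pred))                   # 0.8
print(variation_of_information(truth, pred))
```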
85 We calculated the average RI and VoI values of the segmentation results. [sent-451, score-0.2]
86 From top to bottom, each row shows: the original image, the image ground truth, normalized cuts, multiscale Ncut, Student-t distributions mixture model (Stu.-svgm). [sent-454, score-0.219]
87 These images have size 481×321 pixels, and we also over-segmented each image into 1,000 superpixels. [sent-467, score-0.152]
88 By a visual evaluation of the segmentation results (see Figure 15), multi-scale Ncut is not as good as the other methods when the segments are of irregular shape and unequal size. [sent-599, score-0.344]
89 The purpose of this section was to demonstrate that LSBP yields competitive segmentation performance, compared with many state-of-the-art algorithms. [sent-600, score-0.283]
90 It should be emphasized that there is no perfect way of quantifying segmentation performance, especially since the underlying “truth” is itself subjective. [sent-601, score-0.283]
91 An important advantage of the Bayesian methods (DP, KSBP and LSBP) is that they may be readily extended to joint segmentation of multiple images, considered in the next section. [sent-602, score-0.283]
92 4 Joint Image Segmentation with H-LSBP In this section we consider H-LSBP for joint segmentation of multiple images. [sent-604, score-0.283]
93 The same feature and image processing techniques are employed as above. [sent-607, score-0.131]
94 Table 3: Different segmentation methods compared on the Berkeley 300 images data set. [sent-746, score-0.343]
95 The probability that the nth superpixel within image m is associated with the kth hidden indicator variable $t_{mk}$ is represented as $p_k(s_{mn}) = \sigma(g_k(s_{mn}))$; the variational updates for the atoms involve the expected statistics $\langle \xi_{mn,k'} \rangle_{q(z_{mn})} \langle t_{mk',l} \rangle\, y_{mni}$, where $\alpha_{0i} = 1/I$ for $i = 1, \ldots, I$ is the symmetric Dirichlet prior over code indices. [sent-748, score-0.392]
96 , K − 1; then $h_{mnk} = \sum_{k'=k}^{K} (-1)^{\nu_{kk'}} \langle \xi_{mn,k'}^{-k} \rangle_{q(z_{mn}^{-k})} \sum_{l'=1}^{L} q(t_{mk'} = l') \langle \log p(y \mid \tilde{\theta}_{l'}) \rangle_{q(\theta_{l'})} + \tilde{m}_{mk}^{T} x_{k}^{mn}$, where $\langle \xi_{mn,k'}^{-k} \rangle_{z_{mn}^{-k}} = \prod_{j=1, j \neq k}^{K-1} [\rho_{mn,j} (-1)^{\nu_{jk'}} + \nu_{jk'}]$ is the expectation associated with the gating variables $\{z_{mn1}, \ldots\}$. [sent-768, score-0.371]
97 where $x_k^{mn}$ is the kernel feature vector at location $s_{mn}$, $\hat{m}_{mk}$ collects the corresponding variational means, and $f(\eta_{mnk})$ is the variational bound function of (7) (Bishop and Tipping, 2000). [sent-780, score-0.252]
98 Spatially coherent latent topic model for concurrent object segmentation and classification. [sent-878, score-0.283]
99 Comparative evaluation of various MFCC implementations on the speaker verification task. [sent-938, score-0.134]
100 Shared segmentation of natural scenes using dependent Pitman-Yor processes. [sent-1052, score-0.283]
wordName wordTfidf (topN-words)
[('lsbp', 0.584), ('segmentation', 0.283), ('ksbp', 0.231), ('tmk', 0.171), ('sn', 0.164), ('speaker', 0.134), ('vb', 0.122), ('arin', 0.121), ('reaking', 0.121), ('tick', 0.121), ('unson', 0.121), ('spatial', 0.115), ('gk', 0.112), ('spatially', 0.111), ('voi', 0.111), ('zmn', 0.111), ('sticky', 0.101), ('wki', 0.101), ('znk', 0.094), ('audio', 0.094), ('ogistic', 0.092), ('image', 0.092), ('dp', 0.088), ('mk', 0.084), ('mn', 0.084), ('nc', 0.075), ('chimney', 0.07), ('sticks', 0.07), ('superpixel', 0.07), ('rocess', 0.07), ('variational', 0.068), ('contiguous', 0.066), ('nk', 0.065), ('median', 0.061), ('wk', 0.061), ('segments', 0.061), ('mnk', 0.06), ('ncut', 0.06), ('superpixels', 0.06), ('images', 0.06), ('dirichlet', 0.059), ('pk', 0.059), ('hmm', 0.053), ('dunson', 0.053), ('beal', 0.053), ('bishop', 0.052), ('svens', 0.051), ('mixture', 0.051), ('cows', 0.05), ('logp', 0.05), ('ncuts', 0.05), ('mm', 0.05), ('vk', 0.05), ('fox', 0.05), ('inference', 0.045), ('ri', 0.043), ('mult', 0.043), ('yn', 0.041), ('truncation', 0.041), ('kas', 0.04), ('pantofaru', 0.04), ('shi', 0.04), ('en', 0.039), ('employed', 0.039), ('multiscale', 0.038), ('gibbs', 0.038), ('logistic', 0.038), ('ground', 0.038), ('cuts', 0.037), ('duke', 0.037), ('posterior', 0.037), ('hierarchical', 0.035), ('gating', 0.035), ('texture', 0.035), ('stick', 0.035), ('width', 0.035), ('proximate', 0.034), ('malik', 0.034), ('dn', 0.034), ('temporal', 0.033), ('truth', 0.033), ('tipping', 0.033), ('segment', 0.033), ('kernel', 0.032), ('location', 0.031), ('index', 0.031), ('atoms', 0.031), ('waveform', 0.031), ('sudderth', 0.031), ('atom', 0.031), ('evidence', 0.031), ('cluster', 0.031), ('presenting', 0.03), ('diarization', 0.03), ('mki', 0.03), ('mmk', 0.03), ('smn', 0.03), ('umk', 0.03), ('wmk', 0.03), ('clustering', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
2 0.092916414 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
3 0.082468539 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
4 0.069298014 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, TibĂŠrio S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
5 0.058324516 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
Author: Adam D. Bull
Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f (x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuumarmed bandit, Bayesian optimization
6 0.056203768 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
7 0.05371765 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
8 0.049028438 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
9 0.046884716 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
10 0.039527886 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
11 0.038514648 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
12 0.038390394 58 jmlr-2011-Learning from Partial Labels
13 0.03691192 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
14 0.036898836 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
15 0.03688452 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough
16 0.033912983 55 jmlr-2011-Learning Multi-modal Similarity
17 0.03362073 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
18 0.032974675 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
19 0.032740518 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
20 0.032105222 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
topicId topicWeight
[(0, 0.184), (1, -0.106), (2, -0.068), (3, 0.012), (4, -0.083), (5, -0.026), (6, 0.008), (7, 0.128), (8, -0.057), (9, -0.02), (10, 0.093), (11, 0.075), (12, 0.096), (13, 0.042), (14, 0.076), (15, -0.014), (16, -0.002), (17, -0.031), (18, -0.138), (19, 0.04), (20, 0.042), (21, -0.09), (22, 0.05), (23, 0.136), (24, -0.042), (25, 0.087), (26, 0.094), (27, 0.052), (28, 0.084), (29, 0.063), (30, 0.018), (31, -0.052), (32, 0.202), (33, -0.103), (34, 0.182), (35, 0.082), (36, -0.105), (37, 0.114), (38, -0.155), (39, -0.015), (40, -0.15), (41, 0.012), (42, 0.007), (43, -0.013), (44, -0.327), (45, -0.106), (46, -0.042), (47, 0.161), (48, 0.162), (49, 0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.94007027 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
2 0.40758207 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
3 0.40555999 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
4 0.36530235 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
5 0.34189194 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
6 0.32577053 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
7 0.29449672 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
8 0.2914966 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
9 0.27851015 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
10 0.27150568 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
11 0.24818532 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
12 0.23745438 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
13 0.23208205 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
14 0.22214992 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
15 0.22148156 58 jmlr-2011-Learning from Partial Labels
16 0.21322571 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
17 0.20277384 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
18 0.2006304 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
19 0.2002569 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
20 0.19224244 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
topicId topicWeight
[(4, 0.036), (9, 0.024), (10, 0.062), (13, 0.424), (24, 0.073), (31, 0.085), (32, 0.031), (41, 0.02), (60, 0.014), (65, 0.014), (71, 0.017), (73, 0.034), (78, 0.05), (90, 0.028)]
simIndex simValue paperId paperTitle
1 0.98456728 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
Author: Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, Christian Buchta
Abstract: This paper describes the ecosystem of R add-on packages developed around the infrastructure provided by the package arules. The packages provide comprehensive functionality for analyzing interesting patterns including frequent itemsets, association rules, frequent sequences and for building applications like associative classification. After discussing the ecosystem’s design we illustrate the ease of mining and visualizing rules with a short example. Keywords: frequent itemsets, association rules, frequent sequences, visualization 1. Overview Mining frequent itemsets and association rules is a popular and well researched method for discovering interesting relations between variables in large databases. Association rules are used in many applications and have become prominent as an important exploratory method for uncovering cross-selling opportunities in large retail databases. Agrawal et al. (1993) introduced the problem of mining association rules from transaction data as follows: Let I = {i1 , i2 , . . . , in } be a set of n binary attributes called items. Let D = {t1 ,t2 , . . . ,tm } be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where / X,Y ⊆ I and X ∩ Y = 0 are called itemsets. On itemsets and rules several quality measures can be defined. The most important measures are support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. Itemsets with a support which surpasses a user defined threshold σ are called frequent itemsets. The confidence of a rule is defined as conf(X ⇒ Y ) = supp(X ∪Y )/supp(X). Association rules are rules with supp(X ∪Y ) ≥ σ and conf(X) ≥ δ where σ and δ are user defined thresholds. ©2011 Michael Hahsler, Sudheer Chelluboina, Kurt Hornik and Christian Buchta. H AHSLER , C HELLUBOINA , H ORNIK AND B UCHTA Figure 1: The arules ecosystem. The R package arules (Hahsler et al., 2005, 2010) implements the basic infrastructure for creating and manipulating transaction databases and basic algorithms to efficiently find and analyze association rules. Over the last five years several packages were built around the arules infrastructure to create the ecosystem shown in Figure 1. Compared to other tools, the arules ecosystem is fully integrated, implements the latest approaches and has the vast functionality of R for further analysis of found patterns at its disposal. 2. Design and Implementation The core package arules provides an object-oriented framework to represent transaction databases and patterns. To facilitate extensibility, patterns are implemented as an abstract superclass associations and then concrete subclasses implement individual types of patterns. In arules the associations itemsets and rules are provided. Databases and associations both use a sparse matrix representation for efficient storage and basic operations like sorting, subsetting and matching are supported. Different aspects of arules were discussed in previous publications (Hahsler et al., 2005; Hahsler and Hornik, 2007b,a; Hahsler et al., 2008). In this paper we focus on the ecosystem of several R-packages which are built on top of the arules infrastructure. 
While arules provides Apriori and Eclat (implementations by Borgelt, 2003), two of the most important frequent itemset/association rule mining algorithms, additional algorithms can easily be added as new packages. For example, package arulesNBMiner (Hahsler, 2010) implements an algorithm to find NB-frequent itemsets (Hahsler, 2006). A collection of further implementations which could be interfaced by arules in the future and a comparison of state-of-the-art algorithms can be found at the Frequent Itemset Mining Implementations Repository.1 arulesSequences (Buchta and Hahsler, 2010) implements mining frequent sequences in transaction databases. It implements additional association classes called sequences and sequencerules and provides the algorithm cSpade (Zaki, 2001) to efficiently mine frequent sequences. Another application currently under development is arulesClassify which uses the arules infrastructure to implement rule-based classifiers, including Classification Based on Association rules (CBA, Liu et al., 1998) and general associative classification techniques (Jalali-Heravi and Zaïane, 2010). A known drawback of mining for frequent patterns such as association rules is that typically the algorithm returns a very large set of results where only a small fraction of patterns is of interest to the analysts. Many researchers introduced visualization techniques including scatter plots, matrix 1. The Frequent Itemset Mining Implementations Repository can be found at http://fimi.ua.ac.be/. 2022 T HE ARULES R-PACKAGE E COSYSTEM Graph for 3 rules Scatter plot for 410 rules 1 size: support (0.001 − 0.0019) color: lift (8.3404 − 11.2353) red/blush wine 10 citrus fruit soda 0.95 confidence liquor 8 bottled beer fruit/vegetable juice 0.9 other vegetables 6 root vegetables 0.85 oil 4 0.8 0.001 0.0015 0.002 0.0025 0.003 whole milk lift yogurt tropical fruit support (a) (b) Figure 2: Visualization of all 410 rules as (a) a scatter plot and (b) shows the top 3 rules according to lift as a graph. visualizations, graphs, mosaic plots and parallel coordinates plots to analyze large sets of association rules (see Bruzzese and Davino, 2008, for a recent overview paper). arulesViz (Hahsler and Chelluboina, 2010) implements most of these methods for arules while also providing improvements using color shading, reordering and interactive features. Finally, arules provides a Predictive Model Markup Language (PMML) interface to import and export rules via package pmml (Williams et al., 2010). PMML is the leading standard for exchanging statistical and data mining models and is supported by all major solution providers. Although pmml provides interfaces for different packages it is still considered part of the arules ecosystem. The packages in the described ecosystem are available for Linux, OS X and Windows. All packages are distributed via the Comprehensive R Archive Network2 under GPL-2, along with comprehensive manuals, documentation, regression tests and source code. Development versions of most packages are available from R-Forge.3 3. User Interface We illustrate the user interface and the interaction between the packages in the arules ecosystem with a small example using a retail data set called Groceries which contains 9835 transactions with items aggregated to 169 categories. We mine association rules and then present the rules found as well as the top 3 rules according to the interest measure lift (deviation from independence) in two visualizations. > > > > library(
same-paper 2 0.76533419 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
3 0.31116417 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
4 0.3101972 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets
Author: Chiwoo Park, Jianhua Z. Huang, Yu Ding
Abstract: Gaussian process regression is a flexible and powerful tool for machine learning, but the high computational complexity hinders its broader applications. In this paper, we propose a new approach for fast computation of Gaussian process regression with a focus on large spatial data sets. The approach decomposes the domain of a regression function into small subdomains and infers a local piece of the regression function for each subdomain. We explicitly address the mismatch problem of the local pieces on the boundaries of neighboring subdomains by imposing continuity constraints. The new approach has comparable or better computation complexity as other competing methods, but it is easier to be parallelized for faster computation. Moreover, the method can be adaptive to non-stationary features because of its local nature and, in particular, its use of different hyperparameters of the covariance function for different local regions. We illustrate application of the method and demonstrate its advantages over existing methods using two synthetic data sets and two real spatial data sets. Keywords: domain decomposition, boundary value problem, Gaussian process regression, parallel computation, spatial prediction
5 0.30407557 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
6 0.30373102 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
7 0.30202171 16 jmlr-2011-Clustering Algorithms for Chains
8 0.30031899 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
9 0.29914477 48 jmlr-2011-Kernel Analysis of Deep Networks
10 0.29732704 12 jmlr-2011-Bayesian Co-Training
11 0.29656062 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
12 0.29615358 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
13 0.29566517 58 jmlr-2011-Learning from Partial Labels
14 0.2952182 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
15 0.2948454 28 jmlr-2011-Double Updating Online Learning
16 0.29452887 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
17 0.29341134 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
18 0.28953898 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
19 0.28839236 66 jmlr-2011-Multiple Kernel Learning Algorithms
20 0.28745353 29 jmlr-2011-Efficient Learning with Partially Observed Attributes