jmlr jmlr2011 jmlr2011-61 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
Reference: text
sentIndex sentText sentNum sentScore
1 The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. [sent-11, score-0.324]
2 Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. [sent-15, score-0.438]
3 Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation [sent-16, score-0.318]
4 Introduction. One is often interested in clustering data that have associated spatial or temporal coordinates. [sent-17, score-0.177]
5 The available spatial or temporal information may be exploited to help infer patterns, clusters or segments in the data. [sent-20, score-0.209]
6 To simplify the exposition, in the following discussion we focus on exploiting spatial information, although when presenting results we also consider temporal data (Fox et al. [sent-21, score-0.178]
7 There have been numerous techniques developed to cluster data, although most of these do not explicitly exploit appended spatial information. [sent-23, score-0.146]
8 These approaches regard the two-dimensional (2D) data as an undirected weighted graph, and the segmentation is equivalent to finding the minimum cut of the graph, minimizing the between-group disassociation while maximizing the within-group association (Shi and Malik, 2000). [sent-25, score-0.283]
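As an illustration of this graph-cut view, here is a minimal sketch of a two-way normalized cut on a toy affinity matrix, using the standard spectral relaxation; the Gaussian affinity, the bandwidth, and the threshold-at-zero rule are illustrative choices, not the paper's setup:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Two-way normalized cut via the spectral relaxation of Shi and Malik.

    Solves the generalized eigenproblem (D - W) y = lambda * D y and
    thresholds the eigenvector with the second-smallest eigenvalue.
    """
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)        # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Two loose point clusters; affinity decays with squared distance.
pts = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], float)
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
labels = ncut_bipartition(np.exp(-d2 / 2.0))
print(labels)                       # e.g., [0 0 0 1 1 1] (or flipped)
```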
9 For each $y_n$, the DP mixture model is represented as $y_n \mid \theta_n \sim F(\theta_n)$, $\theta_n \mid G \stackrel{iid}{\sim} G$, $G \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0 G_0)$, where $\alpha_0$ is a non-negative precision parameter and $G_0$ is the base probability measure. [sent-32, score-0.133]
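For concreteness, a minimal sketch of drawing from a truncated DP mixture via its stick-breaking representation; the Gaussian base measure and Gaussian likelihood are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def sample_dp_mixture(N, alpha0=1.0, K=20, seed=0):
    """Draw N samples from a truncated DP mixture of 1-D Gaussians.

    Sticks v_k ~ Beta(1, alpha0) give weights pi_k = v_k * prod_{j<k}(1 - v_j);
    atoms theta_k* are drawn i.i.d. from the base measure G0 = N(0, 5^2).
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha0, size=K)
    v[-1] = 1.0                                   # close the truncated stick
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = rng.normal(0.0, 5.0, size=K)          # theta_k* ~ G0
    z = rng.choice(K, size=N, p=pi)               # component assignments
    y = rng.normal(atoms[z], 1.0)                 # y_n ~ F(theta_{z_n})
    return y, z, atoms

y, z, atoms = sample_dp_mixture(500)
```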
10 The assumption within the DP that the data are exchangeable is generally inappropriate when one wishes to impose knowledge of spatial information (in which each $y_n$ has an associated spatial location). [sent-36, score-0.271]
11 For example, the data may be represented as $\{y_n, s_n\}_{n=1}^N$, in which $y_n$ is again the feature vector and $s_n$ represents the spatial location of $y_n$. [sent-37, score-0.556]
12 Provided with such spatial information, one may wish to explicitly impose the belief that proximate data are more likely to be clustered together. [sent-38, score-0.149]
13 The spatial location $s_n$ may be readily considered as an appended feature, and the modified feature vectors (data) may then be analyzed via traditional clustering algorithms, like those discussed above. [sent-39, score-0.339]
14 These previous studies seek to cluster visual words, with such clustering encouraged if the features are spatially proximate. [sent-42, score-0.171]
15 However, these methods may produce spurious clusters that are introduced to better characterize the spatial data likelihood instead of the likelihood of the features conditionally on spatial location (Park and Dunson, 2009). [sent-43, score-0.261]
16 To address these challenges, and impose spatial information more explicitly, researchers have recently modified the DP construction to manifest spatial-location dependent stick weights. [sent-45, score-0.15]
17 Rather than using the kernel stick-breaking process (An et al., 2007) to assign a feature vector to a particular parameter, we introduce a novel non-parametric spatially dependent prior, called the logistic stick-breaking process (LSBP), to impose the belief that proximate feature vectors are likely to be assigned to the same parameter. [sent-57, score-0.183]
18 The new model is constructed based on a hierarchy of spatial logistic regressions, with sparseness-promoting priors on the regression coefficients. [sent-58, score-0.153]
19 Further, for reasons discussed below, this model favors contiguous segments with sharp boundaries, of interest in many applications. [sent-60, score-0.127]
20 In the first example we consider segmentation of multi-person spoken audio data. [sent-80, score-0.377]
21 In addition to inferring a segmentation of each image, the framework allows sorting and searching among the images. [sent-82, score-0.283]
22 Logistic Stick-breaking Process (LSBP). We first consider spatially constrained clustering for a single data set (task). [sent-89, score-0.14]
23 Assume N sample points $\{D_n\}_{n=1}^N$, where $D_n = (y_n, s_n)$, with $y_n$ representing the nth feature vector and $s_n$ its associated spatial location. [sent-90, score-0.484]
24 Given $Z_n$, data $D_n$ is associated with parameter $\theta_{\hat{k}}^*$ if $z_{n\hat{k}} = 1$ and $z_{nk} = 0$ for $k < \hat{k}$. [sent-99, score-0.188]
25 The $Z_n$ are drawn from a spatially dependent density function, encouraging that proximate $D_n$ will have similar $Z_n$, thereby encouraging spatial contiguity. [sent-100, score-0.26]
26 Specifically, let $p_k(s_n)$ define the probability that $z_{nk} = 1$, with $1 - p_k(s_n)$ representing the probability that $z_{nk} = 0$; the spatial dependence of these density functions is made explicit via $s_n$. [sent-102, score-0.585]
27 The indicator variables controlling allocation to components are then drawn from $z_{nk} \sim \mathrm{Bernoulli}[\sigma(g_k(s_n))]$, where $\sigma(g) = 1/[1 + \exp(-g)]$ is the inverse of the logit link in (3). [sent-114, score-0.206]
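A minimal sketch of how the logistic sticks induce space-dependent mixture weights $\pi_k(s) = \sigma(g_k(s)) \prod_{j<k} [1 - \sigma(g_j(s))]$; the Gaussian kernel expansion for $g_k$, the bandwidth $\psi$, and all array shapes here are illustrative assumptions:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def lsbp_weights(s, centers, W, psi=1.0):
    """Space-dependent stick weights pi_k(s) for a truncated LSBP.

    g_k(s) = w_k0 + sum_i w_ki * exp(-||s - s_i||^2 / psi) is a kernel
    logistic regression; sigma(g_k(s)) plays the role of the k-th stick.
    W has shape (K, 1 + Nc): a bias plus one weight per kernel center.
    """
    d2 = ((s[None, :] - centers) ** 2).sum(axis=1)      # squared distances
    phi = np.concatenate(([1.0], np.exp(-d2 / psi)))    # (1 + Nc,) features
    g = W @ phi                                         # (K,) values g_k(s)
    sticks = sigmoid(g)
    sticks[-1] = 1.0                                    # truncate at level K
    rest = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    return sticks * rest                                # pi_k(s), sums to 1

rng = np.random.default_rng(1)
centers = rng.uniform(0, 10, size=(25, 2))              # Nc kernel centers
W = rng.normal(0, 1, size=(8, 26))                      # K = 8 layers
pi = lsbp_weights(np.array([3.0, 4.0]), centers, W)
```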
28 As one choice, one may employ a hierarchical Student-t prior as applied in the relevance vector machine (Tipping, 2001; Bishop and Tipping, 2000; Bishop and Svensén, 2003): $w_{ki} \sim N(w_{ki} \mid 0, \lambda_{ki}^{-1})$, $\lambda_{ki} \sim \mathrm{Gamma}(\lambda_{ki} \mid a_0, b_0)$, where shrinkage is encouraged with $a_0 = b_0 = 10^{-6}$ (Tipping, 2001). [sent-116, score-0.136]
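A small sketch of drawing weights from this hierarchical prior to see the shrinkage effect; we use larger $a_0 = b_0$ than the paper's $10^{-6}$ purely to keep the illustration numerically stable:

```python
import numpy as np

rng = np.random.default_rng(2)
a0 = b0 = 1e-2                                  # paper uses 1e-6; larger here
lam = rng.gamma(shape=a0, scale=1.0 / b0, size=10000)  # precision lambda_ki
lam = np.maximum(lam, 1e-300)                   # guard against exact zeros
w = rng.normal(0.0, 1.0 / np.sqrt(lam))         # w_ki | lambda_ki ~ N(0, 1/lambda)
# Marginally w is Student-t: most draws are near zero (shrinkage),
# with occasional very large values (heavy tails).
print(np.median(np.abs(w)), np.max(np.abs(w)))
```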
29 In our numerical experiments on waveform and image segmentation, we have employed the Student-t construction. [sent-122, score-0.162]
30 There are two key components of the LSBP construction: (i) sparseness promotion on the $w_{ki}$, and (ii) the use of a logistic link function to define space-dependent stick weights. [sent-125, score-0.174]
31 Figure 4: Graphical representation of H-LSBP (nodes include $w_{mk}$, $s_{mn}$, $t_{mk}$, $z_{mnk}$, $y_{mn}$, with hyperparameters $a_0$, $b_0$, $e_0$, $f_0$ and plates $N_m$, $M$). [sent-128, score-0.564]
32 Given the assigned layer k indicated by $z_{mn}$, the appearance feature $y_{mn}$ is drawn from the associated atom $\theta_{t_{mk}}^*$. [sent-137, score-0.142]
33 For the image-processing application, Nc may be set to the number of superpixels used to define spacedependent image features (discussed in more detail when presenting image-segmentation results in Section 5. [sent-144, score-0.182]
34 The truncation level K on the LSBP may be set to any large value that exceeds the number of anticipated segments in the image, and the model automatically infers the number of segments in the end. [sent-146, score-0.163]
35 (e.g., defined by the expected segment sizes relative to the overall image size). [sent-155, score-0.125]
36 In the specific audio and image segmentation applications discussed below we explicitly define these parameters, and note that no tuning of these parameters was performed. [sent-156, score-0.469]
37 For the image processing application the observed image feature vectors are quantized, and consequently the observation at any point in the image corresponds to a code index. [sent-161, score-0.276]
38 The model of (Sudderth and Jordan, 2008) is developed based on a hybrid variational inference algorithm; however, nearly half of the model parameters still need to be estimated via a sampling technique. [sent-170, score-0.158]
39 However, motivated by the goal of fast and relatively accurate inference for large-scale problems, we consider variational Bayesian (VB) inference (Beal, 2003). [sent-173, score-0.158]
40 Variational Bayesian (VB) inference (Beal, 2003) seeks a variational distribution q(Φ) to approximate the true posterior distribution of the latent variables p(Φ). [sent-176, score-0.15]
41 Based on bounding log-convex functions, we use a variational bound for the logistic sigmoid function in the form (Bishop and Svensén, 2003) $\sigma(x) \geq \sigma(\eta)\exp\left[\frac{x-\eta}{2} - f(\eta)(x^2 - \eta^2)\right]$ (7), where $f(\eta) = \tanh(\eta/2)/(4\eta)$ and $\eta$ is a variational parameter. [sent-190, score-0.174]
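A quick numerical check of this bound, assuming the standard Jaakkola-Jordan form $f(\eta) = \tanh(\eta/2)/(4\eta)$ (the $1/(4\eta)$ factor appears truncated in the extracted text above):

```python
import numpy as np

def f(eta):
    return np.tanh(eta / 2.0) / (4.0 * eta)

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def jj_bound(x, eta):
    """Jaakkola-Jordan lower bound on sigma(x); tight at x = +/- eta."""
    return sigma(eta) * np.exp((x - eta) / 2.0 - f(eta) * (x**2 - eta**2))

x = np.linspace(-6, 6, 13)
eta = 2.0
assert np.all(jj_bound(x, eta) <= sigma(x) + 1e-12)   # it is a lower bound
assert np.isclose(jj_bound(eta, eta), sigma(eta))     # tight at x = eta
```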
42 Based on the variables $z_n$, the cluster membership of each data point $D_n$ corresponding to the different mixture components $\{\theta_k^*\}_{k=1}^K$ can be specified as $\xi_{nk} = \left[\prod_{k'=1}^{k-1}(1 - z_{nk'})\right] z_{nk}$. [sent-206, score-0.27]
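A minimal sketch of computing the memberships $\xi_{nk}$ from the binary indicators: each data point is assigned to the first layer whose indicator fires, so each row of the result contains a single 1:

```python
import numpy as np

def memberships(z):
    """xi_{nk} = z_{nk} * prod_{k' < k} (1 - z_{nk'}) for an (N, K) array z."""
    stop_before = np.cumprod(1 - z[:, :-1], axis=1)   # product over k' < k
    lead = np.ones((z.shape[0], 1), dtype=z.dtype)    # empty product for k = 1
    return z * np.concatenate([lead, stop_before], axis=1)

z = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1]])
print(memberships(z))
# [[0 1 0 0]    first data point is assigned to component 2
#  [1 0 0 0]]   second data point to component 1
```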
43 We sample the kernel width $\psi_k$ from a multinomial distribution over a given discrete candidate set $\{\psi_l^*\}_{l=1}^{\tau}$ in each iteration (with probabilities normalized by $\sum_{l=1}^{\tau} p(\psi_k = \psi_l^*)$), or we can set the kernel width by choosing the candidate with the largest probability. [sent-231, score-0.134]
44 For the hierarchical logistic stick-breaking process (H-LSBP), we adopt a similar inference technique to that employed for LSBP, with the addition of updating the parameters of the Dirichlet process. [sent-237, score-0.157]
45 Since the spatial relationships are encoded via a kernel distance measure, the model can also be used to segment time-series data. [sent-241, score-0.18]
46 Below we consider three examples: (i) a simple “toy” problem that allows us to compare with related approaches in an easily understood setting, (ii) segmentation of multiple speakers in an audio signal, and (iii) segmentation of images. [sent-242, score-0.66]
47 We then consider joint segmentation of multiple images, with the goal of inferring relationships between images (of interest for image sorting and search). [sent-244, score-0.435]
48 1 Simulation Example. In this example the feature vector $y_n$ is the intensity value of each pixel, and the pixel location is the spatial information $s_n$. [sent-247, score-0.351]
49 Each observation is assumed to be drawn from a spatially dependent Gaussian mixture (i. [sent-248, score-0.162]
50 Therefore, we fixed the truncation level to K = 10 for all models, and the clustering results are shown in Figure 6, with different colors representing the cluster index (mixture component to which a data sample is assigned). [sent-257, score-0.132]
51 Figure 6 caption: (a) original image, (b) DP, (c) KSBP, (d) LSBP. Compared with DP and KSBP, the proposed LSBP shows a much cleaner segmentation in Figure 6(d), as a consequence of the imposed favoring of contiguous segments. [sent-259, score-0.349]
52 Figure 7: Original audio waveform, (a), and representation in terms of MFCC features, (b). [sent-271, score-0.125]
53 With the kernels specified in a temporal (one-dimensional) space, the proposed LSBP is naturally extended to segmentation of sequential data, such as speaker diarization (Ben et al. [sent-273, score-0.48]
54 Provided with a spoken document consisting of multiple speakers, speaker diarization is the process of segmenting the audio signal into contiguous temporal regions, within each of which a particular individual is speaking. [sent-276, score-0.357]
55 We assume the acoustic observations at different times are drawn from a Gaussian mixture model (each generating Gaussian ideally corresponds to a speaker ID). [sent-278, score-0.185]
56 , 2008), in which the speech is represented by an HMM with Gaussian state-dependent emissions; to associate a given speaker with a particular state, the states are made to be persistent, or “sticky”, with the state-dependent degree of stickiness also inferred. [sent-282, score-0.134]
57 Figure 7(a) presents the audio waveform with a sampling rate of 16000 Hz. [sent-286, score-0.125]
58 For the sticky HMM, we employed two distinct forms of posterior computation: (i) a VB analysis, which is consistent with the methods employed for the other models; and (ii) a Gibbs sampler, analogous to that employed in the original sticky-HMM paper (Fox et al. [sent-314, score-0.255]
59 To segment the audio data, we labeled each observation with the index of the cluster having the largest probability value, and the results are shown in Figure 8 (here the sticky-HMM results were computed via VB analysis). [sent-329, score-0.189]
60 From the results in Figure 8, the proposed LSBP yields the best segmentation performance, with results in close agreement with ground truth. [sent-331, score-0.321]
61 While the sticky HMM did not yield reliable VB-computed results, it performed well when a Gibbs sampler was employed (as in the work of Fox et al. [sent-338, score-0.14]
62 Therefore, we fixed the truncation level to K = 10 for all models, and the clustering results are shown in Figure 6, with different colors representing the cluster index (mixture component to which a data sample is assigned). [sent-348, score-0.132]
63 Note that only three sticks have appreciable probability for any time t, and the segments tend to be localized, with near-unity probability of using a corresponding model parameter within a given segment. [sent-358, score-0.131]
64 3 Image Segmentation with LSBP The images considered first are from Microsoft Research Cambridge3 and each image has 320×213 pixels. [sent-361, score-0.152]
65 To apply the hierarchical model to image segmentation, we first over-segment each image into 1, 000 “superpixels”, which are local, coherent and preserve most of the structure necessary for segmentation at the scale of interest (Ren and Malik, 2003). [sent-362, score-0.502]
66 Figure caption: (a) $w_k$, (b) $g_k(t)$, (c) $\sigma_k(t)$, (d) $\pi_k(t)$. The superpixel code is described in Mori (2005), and can be downloaded at http://fas. [sent-376, score-0.173]
67 To perform segmentation at the patch level (each superpixel corresponds to one patch), the center of each superpixel is recorded as the location coordinate $s_n$. [sent-386, score-0.618]
68 The segmentation task now reduces to grouping/clustering the superpixels based on the associated image feature vector and associated spatial information. [sent-394, score-0.55]
69 The segmentation performance for each of these images is shown in Figure 11(g), (h) and (i), using respectively K = 4, 3 and 6, based on the model evidence (discussed further below). [sent-396, score-0.374]
70 In Figure 11(j), (k) and (l), the segmentation results are shown with K fixed at K = 10. [sent-398, score-0.283]
71 In this case the LSBP has ten sticks; however, based on the segmentation, only a subset of sticks (5, 8 and 7, respectively) are inferred to have appreciable amplitude. [sent-399, score-0.353]
72 For the relatively simple “chimney” image, the segmentation results are very similar with different initializations of K (Figure 11(g)) and simply truncating the sticks at a “large” value (Figure 11(j) with K = 10). [sent-404, score-0.353]
73 We again observe homogeneous contiguous segments with sharp boundaries. [sent-406, score-0.127]
74 In this case a smaller K yields (as expected) a simpler segmentation (Figure 11(h)). [sent-407, score-0.283]
75 (a) VB iteration lower bound for image “chimney” with K = 4; (b) approximating the model evidence as a function of K for image “chimney”. [sent-425, score-0.215]
76 To further evaluate the performance of LSBP for image segmentation, we also consider several other state-of-the-art methods, including two other non-parametric statistical models: the Dirichlet process (DP) (Sethuraman, 1994) and the kernel stick-breaking process (KSBP) (An et al. [sent-426, score-0.124]
77 , 2007), and also spatially varying mixture segmentation with edge preservation (Stu.-svgm). [sent-432, score-0.445]
78 We consider the same data source as in the previous examples, but for the next set of results segmentation “ground truth” was provided with the data. [sent-435, score-0.283]
79 Figure 13 shows typical segmentation results for the different algorithms. [sent-438, score-0.283]
80 The Stu.-svgm favors a more contiguous segmentation for the texture region, preserving edges; this may make a good tradeoff between keeping coherence and capturing details, but the segmentation performance is degraded by redundant boundaries, such as those within the goose body. [sent-444, score-0.667]
81 Among the Bayesian methods (DP, KSBP and LSBP), LSBP tends to yield better segmentation, characterized by homogeneous segmentation regions and sharp segment boundaries. [sent-446, score-0.316]
82 To quantify segmentation results, we also calculated the Rand Index (RI) (Unnikrishnan et al. [sent-447, score-0.283]
83 , 2007) and the Variation of Information (VoI) (Meilă, 2003), using segmentation “truth” provided with the data. [sent-448, score-0.283]
84 RI measures consistency between two segmentation labels via an overlapping fraction, and VoI roughly calculates the amount of randomness that exists in one segmentation that is not explained by the other. [sent-449, score-0.566]
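A small self-contained sketch of both measures on toy labelings; the pairwise definition of RI and the entropy-based definition of VoI follow the cited papers, while the toy labels are illustrative:

```python
import numpy as np

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)            # each unordered pair once
    return (same_a == same_b)[iu].mean()

def variation_of_information(a, b):
    """VoI = H(A) + H(B) - 2 I(A; B) in nats; zero iff the partitions match."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    entropy = lambda p: -np.sum(p * np.log(p))
    pa = np.unique(a, return_counts=True)[1] / n
    pb = np.unique(b, return_counts=True)[1] / n
    pab = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)[1] / n
    mutual = entropy(pa) + entropy(pb) - entropy(pab)
    return entropy(pa) + entropy(pb) - 2.0 * mutual

truth = [0, 0, 1, 1, 2, 2]
pred  = [0, 0, 1, 1, 1, 2]
print(rand_index(truth, pred))                   # 0.8
print(variation_of_information(truth, pred))
```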
85 We calculated the average RI and VoI values of the segmentation results. [sent-451, score-0.2]
86 From top to bottom, each row shows: the original image, the image ground truth, normalized cuts, multiscale Ncut, Student-t distributions mixture model (Stu.-svgm). [sent-454, score-0.219]
87 These images have size 481×321 pixels, and we also over-segmented each image into 1,000 superpixels. [sent-467, score-0.152]
88 By a visual evaluation of the segmentation results (see Figure 15), multi-scale Ncut is not as good as the other methods when the segments are of irregular shape and unequal size. [sent-599, score-0.344]
89 The purpose of this section was to demonstrate that LSBP yields competitive segmentation performance, compared with many state-of-the-art algorithms. [sent-600, score-0.283]
90 It should be emphasized that there is no perfect way of quantifying segmentation performance, especially since the underlying “truth” is itself subjective. [sent-601, score-0.283]
91 An important advantage of the Bayesian methods (DP, KSBP and LSBP) is that they may be readily extended to joint segmentation of multiple images, considered in the next section. [sent-602, score-0.283]
92 4 Joint Image Segmentation with H-LSBP In this section we consider H-LSBP for joint segmentation of multiple images. [sent-604, score-0.283]
93 The same feature and image processing techniques are employed as above. [sent-607, score-0.131]
94 Table 3: Different segmentation methods compared on the Berkeley 300 images data set. [sent-746, score-0.343]
95 The probability that the nth superpixel within image m is associated with the kth hidden indicator variable $t_{mk}$ is represented as $p_k(s_{mn}) = \sigma(g_k(s_{mn}))$; the variational updates for the atoms involve the expected statistics $\langle \xi_{mn,k'} \rangle_{q(z_{mn})} \langle t_{mk',l} \rangle\, y_{mni}$, where $\alpha_{0i} = 1/I$ for $i = 1, \ldots, I$ is the symmetric Dirichlet prior over code indices. [sent-748, score-0.392]
96 , K − 1; then $h_{mnk} = \sum_{k'=k}^{K} (-1)^{\nu_{kk'}} \langle \xi_{mn,k'}^{-k} \rangle_{q(z_{mn}^{-k})} \sum_{l'=1}^{L} q(t_{mk'} = l') \langle \log p(y \mid \tilde{\theta}_{l'}) \rangle_{q(\theta_{l'})} + \tilde{m}_{mk}^{T} x_{k}^{mn}$, where $\langle \xi_{mn,k'}^{-k} \rangle_{z_{mn}^{-k}} = \prod_{j=1, j \neq k}^{K-1} [\rho_{mn,j} (-1)^{\nu_{jk'}} + \nu_{jk'}]$ is the expectation associated with the gating variables $\{z_{mn1}, \ldots\}$. [sent-768, score-0.371]
97 where $x_k^{mn}$ is the kernel feature vector at location $s_{mn}$, $\hat{m}_{mk}$ collects the corresponding variational means, and $f(\eta_{mnk})$ is the variational bound function of (7) (Bishop and Tipping, 2000). [sent-780, score-0.252]
98 Spatially coherent latent topic model for concurrent object segmentation and classification. [sent-878, score-0.283]
99 Comparative evaluation of various MFCC implementations on the speaker verification task. [sent-938, score-0.134]
100 Shared segmentation of natural scenes using dependent Pitman-Yor processes. [sent-1052, score-0.283]
wordName wordTfidf (topN-words)
[('lsbp', 0.584), ('segmentation', 0.283), ('ksbp', 0.231), ('tmk', 0.171), ('sn', 0.164), ('speaker', 0.134), ('vb', 0.122), ('arin', 0.121), ('reaking', 0.121), ('tick', 0.121), ('unson', 0.121), ('spatial', 0.115), ('gk', 0.112), ('spatially', 0.111), ('voi', 0.111), ('zmn', 0.111), ('sticky', 0.101), ('wki', 0.101), ('znk', 0.094), ('audio', 0.094), ('ogistic', 0.092), ('image', 0.092), ('dp', 0.088), ('mk', 0.084), ('mn', 0.084), ('nc', 0.075), ('chimney', 0.07), ('sticks', 0.07), ('superpixel', 0.07), ('rocess', 0.07), ('variational', 0.068), ('contiguous', 0.066), ('nk', 0.065), ('median', 0.061), ('wk', 0.061), ('segments', 0.061), ('mnk', 0.06), ('ncut', 0.06), ('superpixels', 0.06), ('images', 0.06), ('dirichlet', 0.059), ('pk', 0.059), ('hmm', 0.053), ('dunson', 0.053), ('beal', 0.053), ('bishop', 0.052), ('svens', 0.051), ('mixture', 0.051), ('cows', 0.05), ('logp', 0.05), ('ncuts', 0.05), ('mm', 0.05), ('vk', 0.05), ('fox', 0.05), ('inference', 0.045), ('ri', 0.043), ('mult', 0.043), ('yn', 0.041), ('truncation', 0.041), ('kas', 0.04), ('pantofaru', 0.04), ('shi', 0.04), ('en', 0.039), ('employed', 0.039), ('multiscale', 0.038), ('gibbs', 0.038), ('logistic', 0.038), ('ground', 0.038), ('cuts', 0.037), ('duke', 0.037), ('posterior', 0.037), ('hierarchical', 0.035), ('gating', 0.035), ('texture', 0.035), ('stick', 0.035), ('width', 0.035), ('proximate', 0.034), ('malik', 0.034), ('dn', 0.034), ('temporal', 0.033), ('truth', 0.033), ('tipping', 0.033), ('segment', 0.033), ('kernel', 0.032), ('location', 0.031), ('index', 0.031), ('atoms', 0.031), ('waveform', 0.031), ('sudderth', 0.031), ('atom', 0.031), ('evidence', 0.031), ('cluster', 0.031), ('presenting', 0.03), ('diarization', 0.03), ('mki', 0.03), ('mmk', 0.03), ('smn', 0.03), ('umk', 0.03), ('wmk', 0.03), ('clustering', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
2 0.092916414 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
3 0.082468539 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
4 0.069298014 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, TibĂŠrio S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
5 0.058324516 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
Author: Adam D. Bull
Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f (x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuumarmed bandit, Bayesian optimization
6 0.056203768 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
7 0.05371765 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
8 0.049028438 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
9 0.046884716 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
10 0.039527886 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
11 0.038514648 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
12 0.038390394 58 jmlr-2011-Learning from Partial Labels
13 0.03691192 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
14 0.036898836 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
15 0.03688452 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough
16 0.033912983 55 jmlr-2011-Learning Multi-modal Similarity
17 0.03362073 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
18 0.032974675 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
19 0.032740518 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
20 0.032105222 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
topicId topicWeight
[(0, 0.184), (1, -0.106), (2, -0.068), (3, 0.012), (4, -0.083), (5, -0.026), (6, 0.008), (7, 0.128), (8, -0.057), (9, -0.02), (10, 0.093), (11, 0.075), (12, 0.096), (13, 0.042), (14, 0.076), (15, -0.014), (16, -0.002), (17, -0.031), (18, -0.138), (19, 0.04), (20, 0.042), (21, -0.09), (22, 0.05), (23, 0.136), (24, -0.042), (25, 0.087), (26, 0.094), (27, 0.052), (28, 0.084), (29, 0.063), (30, 0.018), (31, -0.052), (32, 0.202), (33, -0.103), (34, 0.182), (35, 0.082), (36, -0.105), (37, 0.114), (38, -0.155), (39, -0.015), (40, -0.15), (41, 0.012), (42, 0.007), (43, -0.013), (44, -0.327), (45, -0.106), (46, -0.042), (47, 0.161), (48, 0.162), (49, 0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.94007027 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
2 0.40758207 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
3 0.40555999 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
4 0.36530235 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
5 0.34189194 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
6 0.32577053 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
7 0.29449672 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
8 0.2914966 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
9 0.27851015 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
10 0.27150568 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
11 0.24818532 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
12 0.23745438 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
13 0.23208205 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
14 0.22214992 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
15 0.22148156 58 jmlr-2011-Learning from Partial Labels
16 0.21322571 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
17 0.20277384 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
18 0.2006304 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
19 0.2002569 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
20 0.19224244 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
topicId topicWeight
[(4, 0.036), (9, 0.024), (10, 0.062), (13, 0.424), (24, 0.073), (31, 0.085), (32, 0.031), (41, 0.02), (60, 0.014), (65, 0.014), (71, 0.017), (73, 0.034), (78, 0.05), (90, 0.028)]
simIndex simValue paperId paperTitle
1 0.98456728 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
Author: Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, Christian Buchta
Abstract: This paper describes the ecosystem of R add-on packages developed around the infrastructure provided by the package arules. The packages provide comprehensive functionality for analyzing interesting patterns including frequent itemsets, association rules, frequent sequences and for building applications like associative classification. After discussing the ecosystem’s design we illustrate the ease of mining and visualizing rules with a short example. Keywords: frequent itemsets, association rules, frequent sequences, visualization 1. Overview Mining frequent itemsets and association rules is a popular and well researched method for discovering interesting relations between variables in large databases. Association rules are used in many applications and have become prominent as an important exploratory method for uncovering cross-selling opportunities in large retail databases. Agrawal et al. (1993) introduced the problem of mining association rules from transaction data as follows: Let I = {i1 , i2 , . . . , in } be a set of n binary attributes called items. Let D = {t1 ,t2 , . . . ,tm } be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where / X,Y ⊆ I and X ∩ Y = 0 are called itemsets. On itemsets and rules several quality measures can be defined. The most important measures are support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. Itemsets with a support which surpasses a user defined threshold σ are called frequent itemsets. The confidence of a rule is defined as conf(X ⇒ Y ) = supp(X ∪Y )/supp(X). Association rules are rules with supp(X ∪Y ) ≥ σ and conf(X) ≥ δ where σ and δ are user defined thresholds. ©2011 Michael Hahsler, Sudheer Chelluboina, Kurt Hornik and Christian Buchta. H AHSLER , C HELLUBOINA , H ORNIK AND B UCHTA Figure 1: The arules ecosystem. The R package arules (Hahsler et al., 2005, 2010) implements the basic infrastructure for creating and manipulating transaction databases and basic algorithms to efficiently find and analyze association rules. Over the last five years several packages were built around the arules infrastructure to create the ecosystem shown in Figure 1. Compared to other tools, the arules ecosystem is fully integrated, implements the latest approaches and has the vast functionality of R for further analysis of found patterns at its disposal. 2. Design and Implementation The core package arules provides an object-oriented framework to represent transaction databases and patterns. To facilitate extensibility, patterns are implemented as an abstract superclass associations and then concrete subclasses implement individual types of patterns. In arules the associations itemsets and rules are provided. Databases and associations both use a sparse matrix representation for efficient storage and basic operations like sorting, subsetting and matching are supported. Different aspects of arules were discussed in previous publications (Hahsler et al., 2005; Hahsler and Hornik, 2007b,a; Hahsler et al., 2008). In this paper we focus on the ecosystem of several R-packages which are built on top of the arules infrastructure. 
While arules provides Apriori and Eclat (implementations by Borgelt, 2003), two of the most important frequent itemset/association rule mining algorithms, additional algorithms can easily be added as new packages. For example, package arulesNBMiner (Hahsler, 2010) implements an algorithm to find NB-frequent itemsets (Hahsler, 2006). A collection of further implementations which could be interfaced by arules in the future and a comparison of state-of-the-art algorithms can be found at the Frequent Itemset Mining Implementations Repository.1 arulesSequences (Buchta and Hahsler, 2010) implements mining frequent sequences in transaction databases. It implements additional association classes called sequences and sequencerules and provides the algorithm cSpade (Zaki, 2001) to efficiently mine frequent sequences. Another application currently under development is arulesClassify which uses the arules infrastructure to implement rule-based classifiers, including Classification Based on Association rules (CBA, Liu et al., 1998) and general associative classification techniques (Jalali-Heravi and Zaïane, 2010). A known drawback of mining for frequent patterns such as association rules is that typically the algorithm returns a very large set of results where only a small fraction of patterns is of interest to the analysts. Many researchers introduced visualization techniques including scatter plots, matrix 1. The Frequent Itemset Mining Implementations Repository can be found at http://fimi.ua.ac.be/. 2022 T HE ARULES R-PACKAGE E COSYSTEM Graph for 3 rules Scatter plot for 410 rules 1 size: support (0.001 − 0.0019) color: lift (8.3404 − 11.2353) red/blush wine 10 citrus fruit soda 0.95 confidence liquor 8 bottled beer fruit/vegetable juice 0.9 other vegetables 6 root vegetables 0.85 oil 4 0.8 0.001 0.0015 0.002 0.0025 0.003 whole milk lift yogurt tropical fruit support (a) (b) Figure 2: Visualization of all 410 rules as (a) a scatter plot and (b) shows the top 3 rules according to lift as a graph. visualizations, graphs, mosaic plots and parallel coordinates plots to analyze large sets of association rules (see Bruzzese and Davino, 2008, for a recent overview paper). arulesViz (Hahsler and Chelluboina, 2010) implements most of these methods for arules while also providing improvements using color shading, reordering and interactive features. Finally, arules provides a Predictive Model Markup Language (PMML) interface to import and export rules via package pmml (Williams et al., 2010). PMML is the leading standard for exchanging statistical and data mining models and is supported by all major solution providers. Although pmml provides interfaces for different packages it is still considered part of the arules ecosystem. The packages in the described ecosystem are available for Linux, OS X and Windows. All packages are distributed via the Comprehensive R Archive Network2 under GPL-2, along with comprehensive manuals, documentation, regression tests and source code. Development versions of most packages are available from R-Forge.3 3. User Interface We illustrate the user interface and the interaction between the packages in the arules ecosystem with a small example using a retail data set called Groceries which contains 9835 transactions with items aggregated to 169 categories. We mine association rules and then present the rules found as well as the top 3 rules according to the interest measure lift (deviation from independence) in two visualizations. > > > > library(
same-paper 2 0.76533419 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
3 0.31116417 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
4 0.3101972 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets
Author: Chiwoo Park, Jianhua Z. Huang, Yu Ding
Abstract: Gaussian process regression is a flexible and powerful tool for machine learning, but the high computational complexity hinders its broader applications. In this paper, we propose a new approach for fast computation of Gaussian process regression with a focus on large spatial data sets. The approach decomposes the domain of a regression function into small subdomains and infers a local piece of the regression function for each subdomain. We explicitly address the mismatch problem of the local pieces on the boundaries of neighboring subdomains by imposing continuity constraints. The new approach has comparable or better computation complexity as other competing methods, but it is easier to be parallelized for faster computation. Moreover, the method can be adaptive to non-stationary features because of its local nature and, in particular, its use of different hyperparameters of the covariance function for different local regions. We illustrate application of the method and demonstrate its advantages over existing methods using two synthetic data sets and two real spatial data sets. Keywords: domain decomposition, boundary value problem, Gaussian process regression, parallel computation, spatial prediction
5 0.30407557 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
6 0.30373102 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
7 0.30202171 16 jmlr-2011-Clustering Algorithms for Chains
8 0.30031899 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
9 0.29914477 48 jmlr-2011-Kernel Analysis of Deep Networks
10 0.29732704 12 jmlr-2011-Bayesian Co-Training
11 0.29656062 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
12 0.29615358 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
13 0.29566517 58 jmlr-2011-Learning from Partial Labels
14 0.2952182 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
15 0.2948454 28 jmlr-2011-Double Updating Online Learning
16 0.29452887 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
17 0.29341134 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
18 0.28953898 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
19 0.28839236 66 jmlr-2011-Multiple Kernel Learning Algorithms
20 0.28745353 29 jmlr-2011-Efficient Learning with Partially Observed Attributes