nips nips2009 nips2009-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley
Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). [sent-4, score-0.801]
2 The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. [sent-5, score-0.639]
3 The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. [sent-6, score-0.209]
4 The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. [sent-7, score-0.511]
5 Further, the noise variance need not be known, and can be nonstationary. [sent-8, score-0.118]
6 Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. [sent-9, score-0.115]
7 1 Introduction There has been significant recent interest in sparse signal expansions in several settings. [sent-11, score-0.137]
8 For example, such algorithms as the support vector machine (SVM) [1], the relevance vector machine (RVM) [2], Lasso [3] and many others have been developed for sparse regression (and classification). [sent-12, score-0.104]
9 The inferred sparse coefficients also often have biological/physical meaning, of interest for model interpretation [4]. [sent-14, score-0.205]
10 Of relevance for the current paper, there has recently been significant interest in sparse representations in the context of denoising, inpainting [5–10], compressive sensing (CS) [11, 12], and classification [13]. [sent-15, score-0.681]
11 Most of the CS literature assumes “off-the-shelf” wavelet and DCT bases/dictionaries [14], but recent denoising and inpainting research has demonstrated the significant advantages of learning an often over-complete dictionary matched to the signals of interest (e.g., [sent-17, score-0.981]
12 The purpose of this paper is to perform dictionary learning using new non-parametric Bayesian technology [16, 17] that offers several advantages not found in earlier approaches, which have generally sought point estimates. [sent-20, score-0.366]
13 This paper makes four main contributions: • The dictionary is learned using a beta process construction [16, 17], and therefore the number of dictionary elements and their relative importance may be inferred non-parametrically. [sent-21, score-1.09]
14 • For the denoising and inpainting applications, we do not have to assume a priori knowledge of the noise variance (it is inferred within the inversion). [sent-22, score-0.833]
15 • The spatial inter-relationships between different components in images are exploited by use of the Dirichlet process [18] and a probit stick-breaking process [19]. [sent-24, score-0.217]
16 1 • Using learned dictionaries, inferred off-line or in situ, the proposed approach yields CS performance that is markedly better than existing standard CS methods as applied to imagery. [sent-25, score-0.164]
17 2 Dictionary Learning with a Beta Process In traditional sparse coding tasks, one considers a signal x ∈ R^n and a fixed dictionary D = (d1 , d2 , . . . [sent-26, score-0.473]
18 In many applications one may not know the noise variance or an appropriate sparsity level of α; further, one may be interested in the confidence of the estimate (e.g., [sent-33, score-0.242]
19 As discussed further below, the non-parametric Bayesian formulation also allows one to relax other assumptions that have been made in the field of learning D and α for denoising, inpainting and compressive sensing. [sent-37, score-0.525]
20 Labels yi ∈ {1, . . . , Nc } may also be available, where Nc ≥ 2 represents the number of classes from which the data arise; when learning the dictionary we ignore the class labels yi , and later discuss how they may be considered in the learning process. [sent-48, score-0.426]
21 The two-parameter beta process (BP) was developed in [17], to which the reader is referred for further details; we here only provide those details of relevance for the current application. [sent-49, score-0.22]
22 These N binary column vectors are used to constitute a matrix Z ∈ {0, 1}^{K×N} , with the ith column corresponding to z i ; the kth row of Z is associated with atom ψ k , drawn as discussed above. [sent-58, score-0.111]
23 For our problem the atoms ψ k ∈ R^n will correspond to candidate members of our dictionary D, and the binary vector z i defines which members of the dictionary are used to represent sample xi ∈ D. [sent-59, score-0.798]
24 A naive form of our model, for representation of sample xi ∈ D, is xi = Ψz i + εi . [sent-64, score-0.132]
25 However, this is highly restrictive, as it forces the coefficients of the dictionary expansion to be binary. [sent-65, score-0.405]
26 To address this, we draw weights wi ∼ N (0, γw^{-1} IK ), where γw is the precision (inverse variance); the dictionary weights are now αi = z i ◦ wi , and xi = Ψαi + εi , where ◦ represents the Hadamard (element-wise) product of two vectors. [sent-66, score-0.432]
27 For simplicity we assume that the dictionary elements, defined by the atoms ψ k , are drawn from a multivariate Gaussian base H0 , and the components of the error vectors εi are drawn i.i.d. [sent-68, score-0.414]
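The construction above lends itself to a compact generative sketch. The following minimal NumPy illustration is not the authors' code; the truncation level K, the hyperparameters a and b, the precisions γw and γε, and the isotropic Gaussian base H0 are placeholder assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, K = 64, 500, 256            # signal dimension, number of samples, truncation level (assumed)
a, b = 1.0, 1.0                   # beta-process hyperparameters (assumed)
gamma_w, gamma_eps = 1.0, 100.0   # weight and noise precisions (assumed)

# Truncated beta-process draw: per-atom usage probabilities pi_k ~ Beta(a/K, b(K-1)/K)
pi = rng.beta(a / K, b * (K - 1) / K, size=K)
Psi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, K))    # atoms drawn from a Gaussian base H0

Z = (rng.random((K, N)) < pi[:, None]).astype(float)    # z_ik ~ Bernoulli(pi_k)
W = rng.normal(0.0, 1.0 / np.sqrt(gamma_w), size=(K, N))
alpha = Z * W                                           # alpha_i = z_i ∘ w_i (Hadamard product)
X = Psi @ alpha + rng.normal(0.0, 1.0 / np.sqrt(gamma_eps), size=(n, N))
```

In expectation each sample uses roughly K·a/(a + b(K − 1)) atoms, which is how the beta-Bernoulli construction controls the sparsity of the representation.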
28 To encourage sparsity in the vector of dictionary weights α, one may adjust the parameters a and b. [sent-81, score-0.394]
29 Following [9], we may define a linear or bilinear classifier based on the sparse weights α and the associated data x (in the bilinear case), implemented here as a probit classifier. [sent-90, score-0.264]
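For concreteness, a binary linear probit link on the sparse weights can be sketched as below; this is a simplification for illustration (the paper's classifier may be multi-class and bilinear), and `probit_predict` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def probit_predict(alpha, w, bias=0.0):
    """Binary probit link on the sparse weights: P(y = 1 | alpha) = Phi(w^T alpha + bias)."""
    return norm.cdf(alpha @ w + bias)
```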
30 Note that the model in (2) may be employed for unlabeled data, and the extension above may be employed for the available labeled data; consequently, all data (labeled and unlabeled) may be processed jointly to infer D. [sent-95, score-0.252]
31 3 Sequential dictionary learning for large training sets In the above discussion, we implicitly assumed all data D = {xi , yi }i=1,N are used together to infer the dictionary D. [sent-97, score-0.765]
32 It is of interest to briefly note that sequential inference is handled naturally via the proposed Bayesian analysis. [sent-104, score-0.126]
33 When doing variational Bayesian (VB) inference we have an analytic approximate representation for posteriors such as p(D|D1 , Θ), while for Gibbs sampling we may use the inferred samples. [sent-110, score-0.235]
34 1 Denoising, Inpainting and Compressive Sensing Image Denoising and Inpainting Assume we are given an image I ∈ R^{Ny×Nx} with additive noise and missing pixels; we here assume a monochrome image for simplicity, but color images are also readily handled, as demonstrated when presenting results. [sent-113, score-0.414]
35 As is done typically [6, 7], we partition the image into NB = (Ny − B + 1) × (Nx − B + 1) overlapping blocks {xi }i=1,NB , for each of which xi ∈ R^{B^2} (B = 8 is typically used). [sent-114, score-0.232]
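A hedged NumPy sketch of this block partition, using one of several equivalent implementations; `extract_blocks` is a hypothetical helper and not code from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_blocks(image, B=8):
    """Return all overlapping BxB blocks of a grayscale image as columns of a (B*B, NB) matrix."""
    Ny, Nx = image.shape
    patches = sliding_window_view(image, (B, B))   # shape (Ny-B+1, Nx-B+1, B, B)
    NB = (Ny - B + 1) * (Nx - B + 1)
    return patches.reshape(NB, B * B).T            # column i is the vectorized block x_i

X = extract_blocks(np.zeros((256, 256)))           # for a 256x256 image, NB = 249*249 = 62001
```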
36 If there is only additive noise but no missing pixels, then the model in (2) can be readily applied for simultaneous dictionary learning and image denoising. [sent-115, score-0.598]
37 If there are both noise and missing pixels, instead of directly observing xi , we observe a subset of the pixels in each xi . [sent-116, score-0.29]
38 Note that here Ψ and {αi }i=1,NB , which are used to recover the original noise-free and complete image, are directly inferred from the data under test; one may also employ an appropriate training set D with which to learn a dictionary D offline, or for initialization of in situ learning. [sent-117, score-0.609]
39 In denoising and inpainting studies of this type (see for example [6, 7] and references therein), it is often assumed that either the variance is known and used as a “stopping” criterion, or that the sparsity level is pre-determined and fixed for all i ∈ {1, . . . , NB }. [sent-118, score-0.664]
40 In (2) the noise precision (inverse variance), γε , is assumed drawn from a non-informative gamma distribution, and a full posterior density function is inferred for γε (and all other model parameters). [sent-120, score-0.282]
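Because the gamma prior is conjugate to the Gaussian likelihood, the Gibbs update for the noise precision has a closed form. The sketch below assumes an isotropic residual and near-non-informative hyperparameters c0, d0; it is an illustration, not the paper's sampler.

```python
import numpy as np

def sample_noise_precision(X, Psi, alpha, c0=1e-6, d0=1e-6, rng=None):
    """Conjugate Gibbs step for the noise precision given the current reconstruction Psi @ alpha."""
    rng = rng or np.random.default_rng()
    resid = X - Psi @ alpha
    shape = c0 + 0.5 * resid.size
    rate = d0 + 0.5 * np.sum(resid ** 2)
    return rng.gamma(shape, 1.0 / rate)   # numpy's gamma takes shape and scale (= 1/rate)
```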
41 The sparsity level of the representations, { ‖αi‖0 }i=1,N , is influenced by the parameters a and b in the beta prior in (2). [sent-124, score-0.154]
42 Examining the posterior p(πk |−) ∼ Beta(a/K + Σ_{i=1}^{N} zik , b(K − 1)/K + N − Σ_{i=1}^{N} zik ), conditioned on all other parameters, we find that most settings of a and b tend to be non-informative, especially in the case of sequential learning (discussed further in Section 5). [sent-125, score-0.15]
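That conditional is the standard conjugate beta update and can be written as a one-line Gibbs step; `sample_pi` below is a hypothetical helper implementing exactly the posterior quoted above.

```python
import numpy as np

def sample_pi(Z, a, b, rng):
    """Gibbs step for the atom-usage probabilities pi_k given the binary usage matrix Z (K x N)."""
    K, N = Z.shape
    counts = Z.sum(axis=1)                     # sum_i z_ik for each atom k
    return rng.beta(a / K + counts, b * (K - 1) / K + N - counts)
```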
43 Therefore, the average sparsity level of the representation is inferred from the data itself, and each sample xi has its own unique sparse representation based on the posterior, which provides much more flexibility than enforcing the same sparsity level for each sample. [sent-126, score-0.323]
44 2 Compressive sensing We consider CS in the manner employed in [12]. [sent-128, score-0.126]
45 Assume our objective is to measure an image I ∈ R^{Ny×Nx} , with this image constituting the 8 × 8 blocks {xi }i=1,NB . [sent-129, score-0.279]
46 Rather than measuring the xi directly, pixel-by-pixel, in CS we perform the projection measurement v i = Φxi , where v i ∈ R^{Np} , with Np representing the number of projections, and Φ ∈ R^{Np×64} (assuming that xi is represented by a 64-dimensional vector). [sent-130, score-0.132]
47 Our goal is to have Np ≪ 64, thereby yielding compressive measurements. [sent-132, score-0.118]
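As a concrete illustration of the measurement step, the sketch below forms v_i = Φxi for all patches at once; the Gaussian Φ and the value Np = 20 are assumptions made for the example, not choices stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Np, B, NB = 20, 8, 1000                           # assumed: projections per patch, block size, number of patches
Phi = rng.normal(size=(Np, B * B)) / np.sqrt(Np)  # one common choice of random projection matrix

X = rng.normal(size=(B * B, NB))                  # stand-in for the vectorized 8x8 patches
V = Phi @ X                                       # compressive measurements, column i is v_i = Phi x_i
```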
48 Consider a potential dictionary Ψ, as discussed in Section 2. [sent-134, score-0.417]
49 It is assumed that for each of the {xi }i=1,NB from the image under test xi = Ψαi + εi , for sparse αi and relatively small error ‖εi‖2 . [sent-135, score-0.225]
50 The number of projections Np needed for accurate estimation of αi is proportional to ‖αi‖0 [11], underscoring the desirability of learning a dictionary in which very sparse representations are manifested (as compared to using an “off-the-shelf” wavelet or DCT basis). [sent-136, score-0.481]
51 For CS inversion, the model in (2) is employed, and therefore the appropriate dictionary D is learned jointly while performing CS inversion, in situ on the image under test. [sent-137, score-0.576]
52 As discussed when presenting results, one may also learn the CS dictionary in advance, off-line, with appropriate training images (using the model in (2)). [sent-142, score-0.572]
53 However, the unique opportunity for joint CS inversion and learning of an appropriate parsimonious dictionary is deemed to be a significant advantage, as it does not presuppose that one would know an appropriate training set in advance. [sent-143, score-0.559]
54 The inpainting problem may be viewed as a special case of CS, in which each row of Φ corresponds to a delta function, locating a unique pixel on the image at which useful (unobscured) data are observed. [sent-144, score-0.49]
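This special case can be made explicit: each row of Φ is a delta function selecting one observed pixel. A minimal sketch, with the helper name and the example indices chosen for illustration only:

```python
import numpy as np

def selection_matrix(observed_idx, n=64):
    """Phi whose rows are delta functions at the observed pixel locations of a vectorized patch."""
    Phi = np.zeros((len(observed_idx), n))
    Phi[np.arange(len(observed_idx)), observed_idx] = 1.0
    return Phi

Phi = selection_matrix([0, 3, 7, 12, 40])   # observes 5 of the 64 pixels in an 8x8 patch
```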
55 A CS camera designed around an inpainting construction has several advantages, from the standpoint of simplicity. [sent-148, score-0.356]
56 4 Exploiting Spatial Structure For the applications discussed above, the {xi }i=1,NB come from the single image under test, and consequently there is underlying (spatial) structure that should ideally be exploited. [sent-150, score-0.133]
57 5 Example Results For the denoising and inpainting results, we observed that the Gibbs sampler provided better performance than associated variational Bayesian inference. [sent-171, score-0.657]
58 For denoising and inpainting we may exploit shifted versions of the data, which accelerates convergence substantially (discussed in detail below). [sent-172, score-0.673]
59 Therefore, all denoising and inpainting results are based on efficient Gibbs sampling. [sent-173, score-0.585]
60 For CS we cannot exploit shifted images, and therefore to achieve fast inversion variational Bayesian (VB) inference [22] is employed; for this application VB has proven to be quite effective, as discussed below. [sent-174, score-0.32]
61 1 Denoising We consider denoising a 256 × 256 image, with comparison of the proposed approach to K-SVD [6] (for which the noise variance is assumed known and fixed); the true noise standard deviation is set at 15, 25 and 50 in the examples below. [sent-177, score-0.468]
62 We show results for three algorithms: (i) mismatched K-SVD (with noise standard deviation of 30), (ii) K-SVD when the standard deviation is properly matched, and (iii) the proposed BP approach. [sent-178, score-0.2]
63 For (iii) a non-informative prior is placed on the noise precision, and the same BP model is run for all three noise levels (with the underlying noise levels inferred). [sent-179, score-0.24]
64 Figure 1 shows the noisy images at the three different noise levels, as well as the reconstructions via BP and K-SVD. [sent-181, score-0.12]
65 As seen within the bottom portion of the right part of Figure 1, the unused dictionary elements appear as random draws from the prior, since they are not used and hence not influenced by the data. [sent-184, score-0.366]
66 Note that K-SVD works well when the assumed noise variance is at or near the truth, but the method is undermined by mismatch. [sent-185, score-0.118]
67 The BP denoiser estimates a full posterior density function on the noise standard deviation; for the examples considered here, the modes of the inferred standard-deviation posteriors were 15. [sent-188, score-0.239]
68 To achieve these BP results, we employ a sequential implementation of the Gibbs sampler (a batch implementation converges to the same results but with higher computational cost); this is discussed in further detail below, when presenting inpainting results. [sent-192, score-0.566]
69 Figure 1: Left: Representative denoising results, with the top through bottom rows corresponding to noise standard deviations of 15, 25 and 50, respectively. [sent-193, score-0.309]
70 Right: Inferred BP dictionary elements for noise standard deviation 25, in order of importance (probability to be used) from the top-left. [sent-197, score-0.487]
71 For the mismatched K-SVD results, the noise standard deviation was fixed at 30. [sent-200, score-0.159]
72 2 Inpainting Our inpainting and denoising results were achieved by using the following sequential procedure. [sent-214, score-0.626]
73 Further, consider all B × B blocks with left-bottom pixels at {p + kB, j + k′B} for integer k, k′. [Figure 2: Inpainting results; the plot shows PSNR versus learning round, over 64 rounds.]
74 This set of blocks is denoted data set Dpj , and considering 1 ≤ p ≤ B and 1 ≤ j ≤ B, there are a total of B^2 such shifted data sets. [sent-220, score-0.144]
75 In the first iteration of learning Ψ, we employ the blocks in D11 , and for this first round we initialize Ψ and αi based on a singular value decomposition (SVD) of the blocks in D11 (we achieved similar results when Ψ was initialized randomly). [sent-221, score-0.22]
76 These Ψ and αi are then used to initialize the Gibbs sampler in the second round, now applied to the B × B blocks in D11 ∪ D21 (for D21 the neighboring αi is used for initialization). [sent-223, score-0.114]
77 This is done B^2 = 64 times, until at the end all shifted blocks are processed simultaneously. [sent-225, score-0.173]
78 This sequential process may be viewed as a sequential Gibbs burn-in, after which all of the shifted blocks are processed. [sent-226, score-0.293]
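The schedule just described reduces to a simple control loop; the sketch below shows only that scheduling logic, with `gibbs_sweep` and the state object left as hypothetical stand-ins for the actual BP sampler updates.

```python
def sequential_burn_in(shifted_sets, state, gibbs_sweep, sweeps_per_round=1):
    """Sequential Gibbs burn-in: process D_11, then D_11 and D_21, ..., until all B^2
    shifted block sets are included; gibbs_sweep is a hypothetical sampler update."""
    active = []
    for D_pj in shifted_sets:                     # B*B = 64 rounds in total
        active.append(D_pj)
        for _ in range(sweeps_per_round):
            state = gibbs_sweep(state, active)    # the previous round's state seeds this one
    return state
```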
79 Concerning computational costs, the inpainting and denoising algorithms scale linearly as a function of the block size, the dictionary size, the sparsity level, and the number of training samples; all results reported here were run efficiently in Matlab on PCs, with costs comparable to those of K-SVD. [sent-238, score-0.992]
80 3 Compressive sensing We consider a CS example, in which the image is divided into 8 × 8 patches, with these constituting the underlying data {xi }i=1,NB to be inferred. [sent-240, score-0.186]
81 For each of the NB blocks, a vector of CS measurements v i = Φxi is acquired, where the number of projections per patch is Np , and the total number of CS projections is Np · NB . [sent-241, score-0.105]
82 Each xi is assumed represented in terms of a dictionary xi = Dαi + εi , and three constructions for D were considered: (i) a DCT expansion; (ii) learning of D using the beta process construction, using training images; (iii) using the beta process to perform joint CS inversion and learning of D. [sent-243, score-1.056]
83 The dictionary truncation level was set to K = 256, and the offline beta process inferred a dictionary of size M = 237. [sent-247, score-1.023]
84 The inversion results at left are based on a learned dictionary; except for the “online BP” results, all of these results employ the same dictionary D learned off-line as above, and the algorithms are distinguished by different ways of estimating {αi }i=1,NB . [sent-249, score-0.611]
85 The online BP results are quite competitive with those inferred off-line. [sent-251, score-0.123]
86 One also notes that the results based on a learned dictionary (left in Figure 3) are markedly better than those based on the DCT (right in Figure 3); similar results were achieved when the DCT was replaced by a wavelet representation. [sent-252, score-0.432]
87 For the DCT-based results, note that the DP- and PSBP-based BP CS inversion results are significantly better than those of all other CS inversion algorithms. [sent-253, score-0.278]
88 Note that CS inversion using the DP-based BP algorithm (as discussed in Section 4) yields the best results, significantly better than BP results not based on the DP, and better than all competing CS inversion algorithms (for both learned dictionaries and the DCT). [sent-255, score-0.431]
89 The DP-based results are very similar to those generated by the probit stick-breaking process (PSBP) [19], which enforces spatial information more explicitly; this suggests that the simpler DP-based results are adequate, at least for the wide class of examples considered. [sent-256, score-0.138]
90 Note that we also considered the DP and PSBP for the denoising and inpainting examples above (those results were omitted, for brevity). [sent-257, score-0.261]
91 The DP-based BP CS inversion algorithm scales as O(NB · Np · B^2). [sent-261, score-0.139]
92 For the results at left, “Online BP” simultaneously learned the dictionary and performed CS inversion; the remaining results at left are based on a dictionary learned offline on a training set. [sent-286, score-0.81]
93 A DCT dictionary is used for the results on the right. [sent-287, score-0.366]
94 The total number of pixels in the image is 480 × 320 = 153,600. [sent-296, score-0.136]
95 6 Conclusions The non-parametric beta process has been presented for dictionary learning with the goal of image denoising, inpainting and compressive sensing, with very encouraging results relative to the state of the art. [sent-299, score-1.115]
96 In the context of noisy underlying data, the noise variance need not be known in advance, and it need not be spatially uniform. [sent-301, score-0.118]
97 K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. [sent-329, score-0.167]
98 Image denoising via sparse and redundant representations over learned dictionaries. [sent-335, score-0.345]
99 Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization. [sent-380, score-0.516]
100 Variational Bayesian multinomial probit regression with Gaussian process priors. [sent-454, score-0.138]
wordName wordTfidf (topN-words)
[('cs', 0.398), ('dictionary', 0.366), ('inpainting', 0.356), ('bp', 0.352), ('denoising', 0.229), ('psbp', 0.159), ('beta', 0.154), ('bcs', 0.141), ('dct', 0.139), ('inversion', 0.139), ('compressive', 0.118), ('nb', 0.113), ('psnr', 0.106), ('probit', 0.099), ('inferred', 0.098), ('gibbs', 0.089), ('np', 0.086), ('dp', 0.085), ('blocks', 0.084), ('image', 0.082), ('noise', 0.08), ('sparse', 0.077), ('sensing', 0.073), ('db', 0.069), ('vb', 0.066), ('xi', 0.066), ('dictionaries', 0.063), ('omp', 0.063), ('situ', 0.062), ('presenting', 0.06), ('shifted', 0.06), ('nx', 0.055), ('pixels', 0.054), ('employed', 0.053), ('discussed', 0.051), ('gamma', 0.051), ('bayesian', 0.049), ('mairal', 0.046), ('readily', 0.046), ('pursuits', 0.043), ('variational', 0.042), ('nc', 0.042), ('deviation', 0.041), ('sequential', 0.041), ('sparsity', 0.041), ('images', 0.04), ('zik', 0.04), ('learned', 0.039), ('imposes', 0.039), ('analytic', 0.039), ('process', 0.039), ('projections', 0.038), ('variance', 0.038), ('mismatched', 0.038), ('pursuit', 0.037), ('atom', 0.036), ('numbernumber', 0.035), ('ofmeasurements', 0.035), ('paisley', 0.035), ('sapiro', 0.035), ('rounds', 0.034), ('bernoulli', 0.034), ('reconstruction', 0.033), ('constructions', 0.033), ('mb', 0.033), ('infer', 0.033), ('considered', 0.032), ('ik', 0.032), ('priori', 0.032), ('pcs', 0.031), ('constituting', 0.031), ('sampler', 0.03), ('bilinear', 0.03), ('interest', 0.03), ('signal', 0.03), ('posterior', 0.029), ('measurements', 0.029), ('elad', 0.029), ('processed', 0.029), ('ponce', 0.028), ('may', 0.028), ('inference', 0.028), ('employ', 0.028), ('relevance', 0.027), ('designing', 0.027), ('handled', 0.027), ('basis', 0.027), ('inferring', 0.027), ('appropriate', 0.027), ('markedly', 0.027), ('goals', 0.025), ('ine', 0.025), ('nonuniform', 0.025), ('online', 0.025), ('pixel', 0.024), ('round', 0.024), ('retained', 0.024), ('missing', 0.024), ('classi', 0.024), ('drawn', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley
Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1
2 0.26893124 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
3 0.13154335 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
Author: Ruben Coen-cagli, Peter Dayan, Odelia Schwartz
Abstract: A central hypothesis about early visual processing is that it represents inputs in a coordinate system matched to the statistics of natural scenes. Simple versions of this lead to Gabor–like receptive fields and divisive gain modulation from local surrounds; these have led to influential neural and psychological models of visual processing. However, these accounts are based on an incomplete view of the visual context surrounding each point. Here, we consider an approximate model of linear and non–linear correlations between the responses of spatially distributed Gaborlike receptive fields, which, when trained on an ensemble of natural scenes, unifies a range of spatial context effects. The full model accounts for neural surround data in primary visual cortex (V1), provides a statistical foundation for perceptual phenomena associated with Li’s (2002) hypothesis that V1 builds a saliency map, and fits data on the tilt illusion. 1
4 0.10909477 96 nips-2009-Filtering Abstract Senses From Image Search Results
Author: Kate Saenko, Trevor Darrell
Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1
5 0.10636326 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
Author: Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, Emily B. Fox
Abstract: We propose a Bayesian nonparametric approach to the problem of modeling related time series. Using a beta process prior, our approach is based on the discovery of a set of latent dynamical behaviors that are shared among multiple time series. The size of the set and the sharing pattern are both inferred from data. We develop an efficient Markov chain Monte Carlo inference method that is based on the Indian buffet process representation of the predictive distribution of the beta process. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.
6 0.10554307 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares
7 0.1053246 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)
8 0.0991356 137 nips-2009-Learning transport operators for image manifolds
9 0.088152789 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
10 0.08509545 157 nips-2009-Multi-Label Prediction via Compressed Sensing
11 0.084841602 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
12 0.083511032 138 nips-2009-Learning with Compressible Priors
13 0.081645414 114 nips-2009-Indian Buffet Processes with Power-law Behavior
14 0.080493301 226 nips-2009-Spatial Normalized Gamma Processes
15 0.080400936 187 nips-2009-Particle-based Variational Inference for Continuous Systems
16 0.0766109 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis
17 0.076050408 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
18 0.075314611 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process
19 0.070722662 133 nips-2009-Learning models of object structure
20 0.070607767 100 nips-2009-Gaussian process regression with Student-t likelihood
topicId topicWeight
[(0, -0.206), (1, -0.097), (2, -0.054), (3, -0.02), (4, 0.047), (5, -0.034), (6, 0.085), (7, -0.028), (8, 0.182), (9, 0.075), (10, -0.069), (11, 0.05), (12, -0.043), (13, 0.112), (14, -0.096), (15, 0.05), (16, -0.034), (17, 0.111), (18, 0.092), (19, -0.019), (20, -0.067), (21, -0.126), (22, -0.026), (23, 0.127), (24, 0.142), (25, 0.155), (26, -0.034), (27, 0.068), (28, -0.083), (29, 0.176), (30, -0.015), (31, -0.157), (32, 0.063), (33, -0.019), (34, -0.063), (35, 0.021), (36, 0.111), (37, 0.022), (38, 0.088), (39, -0.124), (40, 0.078), (41, 0.013), (42, -0.007), (43, 0.057), (44, 0.037), (45, 0.006), (46, -0.048), (47, 0.112), (48, 0.091), (49, 0.12)]
simIndex simValue paperId paperTitle
same-paper 1 0.94686866 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley
Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1
2 0.75490433 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
3 0.53817368 138 nips-2009-Learning with Compressible Priors
Author: Volkan Cevher
Abstract: We describe a set of probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in p-compressible signals. A signal x ∈ RN is called p-compressible with magnitude R if its sorted coefficients exhibit a power-law decay as |x|(i) R · i−d , where the decay rate d is equal to 1/p. p-compressible signals live close to K-sparse signals (K N) in the r -norm (r > p) since their best K-sparse approximation error decreases with O R · K 1/r−1/p . We show that the membership of generalized Pareto, Student’s t, log-normal, Fr´ chet, and log-logistic distributions to the set of compresse ible priors depends only on the distribution parameters and is independent of N . In contrast, we demonstrate that the membership of the generalized Gaussian distribution (GGD) depends both on the signal dimension and the GGD parameters: the expected decay rate of N -sample iid realizations from the GGD with the shape parameter q is given by 1/ [q log (N/q)]. As stylized examples, we show via experiments that the wavelet coefficients of natural images are 1.67-compressible whereas their pixel gradients are 0.95 log (N/0.95)-compressible, on the average. We also leverage the connections between compressible priors and sparse signals to develop new iterative re-weighted sparse signal recovery algorithms that outperform the standard 1 -norm minimization. Finally, we describe how to learn the hyperparameters of compressible priors in underdetermined regression problems by exploiting the geometry of their order statistics during signal recovery. 1
4 0.48523399 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
Author: Dilip Krishnan, Rob Fergus
Abstract: The heavy-tailed distribution of gradients in natural scenes have proven effective priors for a range of problems such as denoising, deblurring and super-resolution. α These distributions are well modeled by a hyper-Laplacian p(x) ∝ e−k|x| , typically with 0.5 ≤ α ≤ 0.8. However, the use of sparse distributions makes the problem non-convex and impractically slow to solve for multi-megapixel images. In this paper we describe a deconvolution approach that is several orders of magnitude faster than existing techniques that use hyper-Laplacian priors. We adopt an alternating minimization scheme where one of the two phases is a non-convex problem that is separable over pixels. This per-pixel sub-problem may be solved with a lookup table (LUT). Alternatively, for two specific values of α, 1/2 and 2/3 an analytic solution can be found, by finding the roots of a cubic and quartic polynomial, respectively. Our approach (using either LUTs or analytic formulae) is able to deconvolve a 1 megapixel image in less than ∼3 seconds, achieving comparable quality to existing methods such as iteratively reweighted least squares (IRLS) that take ∼20 minutes. Furthermore, our method is quite general and can easily be extended to related image processing problems, beyond the deconvolution application demonstrated. 1
5 0.4722009 96 nips-2009-Filtering Abstract Senses From Image Search Results
Author: Kate Saenko, Trevor Darrell
Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects. 1
6 0.45084131 137 nips-2009-Learning transport operators for image manifolds
7 0.44423735 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares
8 0.44026166 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models
9 0.42036104 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
10 0.41128716 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
11 0.40013161 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
12 0.38529167 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity
13 0.37050346 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
14 0.36414865 157 nips-2009-Multi-Label Prediction via Compressed Sensing
15 0.36013499 226 nips-2009-Spatial Normalized Gamma Processes
16 0.35810381 67 nips-2009-Directed Regression
17 0.35691059 114 nips-2009-Indian Buffet Processes with Power-law Behavior
18 0.34659761 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis
19 0.34344584 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
20 0.34267655 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
topicId topicWeight
[(21, 0.012), (24, 0.038), (25, 0.075), (35, 0.115), (36, 0.129), (39, 0.044), (58, 0.079), (61, 0.011), (65, 0.194), (71, 0.061), (81, 0.026), (86, 0.101)]
simIndex simValue paperId paperTitle
1 0.86859965 14 nips-2009-A Parameter-free Hedging Algorithm
Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1
2 0.8672545 70 nips-2009-Discriminative Network Models of Schizophrenia
Author: Irina Rish, Benjamin Thyreau, Bertrand Thirion, Marion Plaze, Marie-laure Paillere-martinot, Catherine Martelli, Jean-luc Martinot, Jean-baptiste Poline, Guillermo A. Cecchi
Abstract: Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, “emergent” working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [4] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyzes. Our results suggest that schizophrenia is indeed associated with disruption of global brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows clear gain over linear methods, such as Gaussian Naive Bayes and SVM, allowing to reach 86% accuracy (over 50% baseline - random guess), which is quite remarkable given that it is based on a single fMRI experiment using a simple auditory task. 1
same-paper 3 0.86377919 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
Author: Mingyuan Zhou, Haojun Chen, Lu Ren, Guillermo Sapiro, Lawrence Carin, John W. Paisley
Abstract: Non-parametric Bayesian techniques are considered for learning dictionaries for sparse image representations, with applications in denoising, inpainting and compressive sensing (CS). The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size. The Dirichlet process and a probit stick-breaking process are also considered to exploit structure within an image. The proposed method can learn a sparse dictionary in situ; training images may be exploited if available, but they are not required. Further, the noise variance need not be known, and can be nonstationary. Another virtue of the proposed method is that sequential inference can be readily employed, thereby allowing scaling to large images. Several example results are presented, using both Gibbs and variational Bayesian inference, with comparisons to other state-of-the-art approaches. 1
4 0.80168802 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
Author: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black
Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these samplingbased methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain. 1
5 0.74922889 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
Author: Ruslan Salakhutdinov
Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.
6 0.74843413 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks
7 0.74805135 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
8 0.74362141 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
9 0.74146938 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models
10 0.74014938 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
11 0.73947966 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition
12 0.73854923 113 nips-2009-Improving Existing Fault Recovery Policies
13 0.73834181 129 nips-2009-Learning a Small Mixture of Trees
14 0.73832238 97 nips-2009-Free energy score space
15 0.73828501 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
16 0.73784673 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
17 0.73716527 100 nips-2009-Gaussian process regression with Student-t likelihood
18 0.73567331 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
19 0.73556459 187 nips-2009-Particle-based Variational Inference for Continuous Systems
20 0.73522645 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs