nips nips2006 nips2006-51 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
Reference: text
sentIndex sentText sentNum sentScore
1 it Abstract This paper proposes a new approach to model-based clustering under prior knowledge. [sent-10, score-0.196]
2 The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. [sent-11, score-0.435]
3 To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. [sent-12, score-0.168]
4 We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e. [sent-13, score-0.463]
5 Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. [sent-16, score-0.181]
6 1 Introduction Most approaches to semi-supervised learning (SSL) see the problem from one of two (dual) perspectives: supervised classification with additional unlabelled data (see [20] for a recent survey); clustering with prior information or constraints (e. [sent-17, score-0.272]
7 The second perspective, usually termed semi-supervised clustering (SSC), is usually adopted when labels are totaly absent, but there are (usually pair-wise) relations that one wishes to enforce or encourage. [sent-20, score-0.308]
8 , grouping constraints [15, 17]), or have the form of a prior under which probabilistic clustering is performed [4, 11]. [sent-24, score-0.314]
9 However, the previous EM-type algorithms for this class of methods have a major drawback: the presence of the prior makes the E-step non-trivial, forcing the use of expensive Gibbs sampling [11] or suboptimal methods such as the iterated conditional modes algorithm [4]. [sent-26, score-0.166]
10 The keystone is the formulation of SSC as a penalized logistic regression problem, where the labels are only indirectly observed. [sent-28, score-0.297]
11 the missing group labels, underlies the simplicity of the resulting GEM algorithm. [sent-32, score-0.086]
12 When applied to image segmentation, our method allows using spatial priors which are typical of image estimation problems (e. [sent-33, score-0.567]
13 Under these priors, the M-step of our GEM algorithm reduces to a simple image denoising procedure, for which there are several extremely efficient algorithms. [sent-36, score-0.334]
14 2 Formulation We start from the standard formulation of finite mixture models: X = {x1 , . [sent-37, score-0.13]
15 , xn } is an observed data set, where each xi ∈ I d was generated (independently) according to one of a set of K probR ability (density or mass) functions {p(·|φ(1) ), . [sent-40, score-0.089]
16 In image segmentation, each xi is a pixel value (gray scale, d = 1; color, d = 3) or a vector of local (e. [sent-44, score-0.276]
17 Associated (1) (K) with X , there is a hidden label set Y = {y1 , . [sent-47, score-0.075]
18 , yi ]T ∈ {0, 1}K , with (k) yi = 1 if and only if xi was generated by source k (the so-called “1-of-K” binary encoding). [sent-53, score-0.763]
19 Thus, K n (k) p (X |Y, φ ) = p(xi |φ k=1 (k) i: yi K )= p(xi |φ i=1 =1 (k) (k) ) yi , (1) k=1 where φ = (φ(1) , . [sent-54, score-0.674]
20 In standard mixture models, all the yi are assumed to be independent and identically distributed samples following a multinomial distribution with probabilities {η (1) , . [sent-58, score-0.498]
21 This is the part of standard mixture models that has to be modified in order to i k (η insert grouping constraints [15] or a grouping prior p(Y) [4, 11]. [sent-64, score-0.317]
22 However, this prior destroys the simplicity of the standard E-step for finite mixtures, which is critically based on the independence assumption. [sent-65, score-0.138]
23 , zi ]T ∈ I K following a multinomial logistic model [5]: R , n K P (Y|Z) = (k) (k) (k) P [yi = 1|zi ] yi , where (k) P [yi = 1|zi ] = i=1 k=1 ezi (l) K zi l=1 e . [sent-76, score-1.078]
24 , zn ]T ; of course, Z can be seen as an n × (K − 1) matrix, where z(k) is the k-th column and zi is the i-th row. [sent-93, score-0.327]
25 With this formulation, certain grouping preferences may be expressed by a prior p(Z). [sent-94, score-0.223]
26 (4) For image segmentation, each z(k) is an image with real-valued elements and a natural choice for A is to have Ai,j = λ, if i and j are neighbors, and zero otherwise. [sent-99, score-0.374]
27 Assuming periodic boundary conditions for the neighborhood system, ∆ is a block-circulant matrix with circulant blocks [2]. [sent-100, score-0.174]
28 However, as shown below, other more sophisticated priors (such as wavelet-based priors) can also be used at no additional computational cost [1]. [sent-101, score-0.141]
29 1 Model Estimation Marginal Maximum A Posteriori and the GEM Algorithm Based on the formulation presented in the previous section, SSC is performed by estimating Z and φ, seeing Y as missing data. [sent-103, score-0.113]
30 The marginal maximum a posteriori estimate is obtained by marginalizing out the hidden labels (over all the possible label configurations), Z, φ = arg max Z,φ Y p(X , Y, Z|φ) = arg max Z,φ p(X |Y, φ) P (Y|Z) p(Z), (5) Y where we’re assuming a flat prior for φ. [sent-104, score-0.43]
31 This is in contrast Markov random field approaches to image segmentation, which lead to hard combinatorial problems, since they perform optimization directly with respect to the (discrete) label variables Y. [sent-106, score-0.318]
32 By finding arg maxk P [yi = 1|zi ], for every i, one may obtain a hard clustering/segmentation. [sent-108, score-0.098]
33 , by applying the following iterative procedure (until some convergence criterion is satisfied): E-step: Compute the conditional expectation of the complete log-posterior, given the current estimates (Z, φ) and the observations X : Q(Z, φ|Z, φ) = EY log p(Y, Z, φ|X ) Z, φ, X . [sent-111, score-0.099]
34 log p(Y, Z, φ|X ) = log p(X |Y, φ) + log P (Y|Z) + log p(Z) n . [sent-116, score-0.256]
35 = K n (k) yi (k) log p(xi |φ K K (k) (k) y i zi − )+ i=1 k=1 i=1 k=1 (k) ezi log + log p(Z) (8) k=1 . [sent-117, score-0.885]
36 (9) = 1|zi ] Notice that this is the same as the E-step for a standard finite mixture, where the probabilities (k) P [yi = 1|zi ] (given by (2)) play the role of the probabilities of the classes/components. [sent-125, score-0.104]
37 Finally, (k) the Q function is obtained by plugging the expectations yi into (8). [sent-126, score-0.368]
38 to each φ(k) , (k) n (k) φnew = arg max yi φ(k) i=1 log p(xi |φ(k) ). [sent-134, score-0.503]
39 In supervised image segmentation, these parameters are known (e. [sent-139, score-0.263]
40 In unsupervised image segmentation, φ is unknown and (10) will have to be applied. [sent-142, score-0.245]
41 To update the estimate of Z, we need to maximize (or at least improve, see (7)) n K K (k) (k) L(Z|Y) ≡ i=1 k=1 (k) ezi yi zi − log + log p(Z). [sent-143, score-0.85]
42 (11) k=1 Without the prior, this would be a simple logistic regression (LR) problem, with an identity design (k) (k) matrix [5]; however, instead of the usual hard labels yi ∈ {0, 1}, we have “soft” labels yi ∈ [0, 1]. [sent-144, score-0.925]
43 Thus, the iteration θ new = arg max R(θ, θ) = θ + B−1 g(θ) θ (13) is guaranteed to monotonically improve E(θ), i. [sent-153, score-0.102]
44 It was shown in [5] that the gradient and the Hessian of the logistic log-likelihood function, i. [sent-156, score-0.076]
45 , (11) without the log-prior, verify (with Ia denoting an a × a identity matrix and 1a a vector of a ones) g(z) = y − η(z) (1) (1) H(z) and (2) − 1 2 IK−1 − 1K−1 1T K−1 K ⊗ In ≡ −B, (14) (K−1) ]T denotes the lexicographic vectorization of Z, y denotes where z = [z1 , . [sent-158, score-0.249]
46 , zn (1) (1) (2) (K−1) T the corresponding lexicographic vectorization of Y, and η(z) = [p1 , . [sent-164, score-0.258]
47 Defining v = z + B−1 (y − η(z)), the MM update equation for solving (11) is thus znew (v) = arg min z 1 T (z − v) B (z − v) − log p(z) , 2 (15) where p(z) is equivalent to p(Z), because z is simply the lexicographic vectorization of Z. [sent-171, score-0.475]
48 (Generalized) M-step: Apply one or more iterations (15), keeping y fixed, that is, loop through the following two steps: v ← z + B−1 (y − η(z)) and z ← znew (v). [sent-179, score-0.134]
49 4 Speeding Up the Algorithm In image segmentation, the MM update equation (15) is formally equivalent to the MAP estimation of an image with n pixels in I K−1 , under prior p(z), where v plays the role of observed image, and R B is the inverse covariance matrix of the noise. [sent-181, score-0.565]
50 Due to the structure of B, even if the prior models the several z(k) as independent, i. [sent-182, score-0.097]
51 , if log p(z) = log p(z(1) ) + · · · + log p(z(K−1) ), (15) can not be decoupled into the several components {z(1) , . [sent-184, score-0.245]
52 Since the eigenvalues of the Kronecker product are the products of the eigenvalues of the matrices, λmax (B) = λmax (I − (1/K) 1 1T )/2. [sent-192, score-0.1]
53 Since 1 1T is a rank-1 matrix with eigenvalues {0, . [sent-193, score-0.08]
54 , K − 1, (16) z(k) (v(k) ) = arg min z − v(k) − log p(z(k) ) , new 2 z(k) where v(k) = z(k) + (1/ξK )(y(k) − η (k) (z(k) )). [sent-203, score-0.125]
55 5 Stationary Gaussian Field Priors Consider a Gaussian prior of form (3), where Ai,j only depends on the relative position of i and j and the neighborhood system defined by A has periodic boundary conditions. [sent-206, score-0.183]
56 In this case, both A and ∆ are block-circulant matrices, with circulant blocks [2], thus diagonalizable by a 2D discrete Fourier transform (2D-DFT). [sent-207, score-0.115]
57 Formally, ∆ = UH DU, where D is diagonal, U is the orthogonal matrix representing the 2D-DFT, and (·)H denotes conjugate transpose. [sent-208, score-0.067]
58 1 the DFT domain, log p(z(k) ) = 2 (Uz(k) )H D(Uz(k) ), and the solution of (16) is −1 z(k) (v(k) ) = ξK UH [ξK In + D] new U v(k) , for k = 1, . [sent-210, score-0.064]
59 (17) (k) Observe that (17) corresponds to filtering each image v , in the DFT domain, with a fixed filter −1 with frequency response [ξK In + D] ; this inversion can be computed off-line and is trivial because ξK In + D is diagonal. [sent-214, score-0.232]
60 Finally, it’s worth stressing that the matrix-vector products by U and UH are not carried out explicitly but more efficiently via the FFT algorithm, with cost O(n log n). [sent-215, score-0.064]
61 6 Wavelet-Based Priors for Segmentation It’s known that piece-wise smooth images have sparse wavelet-based representations (see [12] and the many references therein); this fact underlies the state-of-the-art denoising performance of wavelet-based methods. [sent-217, score-0.194]
62 Consider a wavelet expansion of each z(k) z(k) = Wθ (k) , k = 1, . [sent-219, score-0.137]
63 , K − 1, (18) (k) where the θ are sets of coefficients and W is the matrix representation of an inverse wavelet transform; W may be orthogonal or have more columns than lines (over-complete representations) [12]. [sent-222, score-0.239]
64 A wavelet-based prior for z(k) is induced by placing a prior on the coefficients θ (k) . [sent-223, score-0.194]
65 A classical choice for p(θ (k) ) is a generalized Gaussian [14]. [sent-224, score-0.082]
66 Without going into details, under this class of priors (and others), (16) becomes a non-linear wavelet-based denoising step, which has been widely studied in the image processing literature. [sent-225, score-0.475]
67 This is computationally very efficient, due to the existence of fast algorithms for computing direct and inverse wavelet transforms; e. [sent-227, score-0.172]
68 , O(n) for an orthogonal wavelet transform or O(n log n) for a shift-invariant redundant transform. [sent-229, score-0.295]
69 1 Extensions Semi-Supervised Segmentation For semi-supervised image segmentation, the user defines regions in the image for which the true label is known. [sent-231, score-0.46]
70 Our GEM algorithm is trivially modified to handle this case: if at location i the label (k) (j) is known to be (say) k, we freeze yi = 1, and yi = 0, for j = k. [sent-232, score-0.716]
71 2 Discriminative Features Our formulation (as most probabilistic segmentation methods) adopts a generative perspective, where each p(·|φ(k) ) models the data generation mechanism in the corresponding class. [sent-236, score-0.529]
72 However, discriminative methods (such as support vector machines) are seen as the current state-of-the-art in classification [7]. [sent-237, score-0.107]
73 We will now show how a pre-trained discriminative classifier can be used in our GEM algorithm instead of the generative likelihoods. [sent-238, score-0.139]
74 The E-step (see (9)) obtains the posterior probability that xi was generated by the k-th model, by (k) combining (via Bayes law) the corresponding likelihood p(xi |φ ) with the local prior probability (k) P [yi = 1|zi ]. [sent-239, score-0.215]
75 Consider that, instead of likelihoods derived from generative models, we have a discriminative classifier, i. [sent-240, score-0.139]
76 , one that directly provides estimates of the posterior class probabilities (k) P [yi = 1|xi ]. [sent-242, score-0.081]
77 To use these values in our segmentation algorithm, we need a way to bias these (k) estimates according to the local prior probabilities P [yi = 1|zi ], which are responsible for encouraging spatial coherence. [sent-243, score-0.588]
78 Let us assume that we know that the discriminative classifier was trained using mk samples from the k-th class. [sent-244, score-0.16]
79 It can thus be assumed that these posterior class probabilities (k) (k) verify P [yi = 1|xi ] ∝ mk p(xi |yi = 1). [sent-245, score-0.166]
80 It is then possible to “bias” these classifiers, with the (k) local prior probabilities P [yi = 1|zi ], simply by computing −1 K (k) (k) (j) (j) P [yi = 1|xi ] P [yi = 1|zi ] P [yi = 1|xi ] P [yi = 1|zi ] (k) P [yi = 1|xi ] = . [sent-246, score-0.149]
81 mk mj j=1 5 Experiments In this section we will show experimental results of image segmentation in supervised, unsupervised, semi-supervised, and discriminative modes. [sent-247, score-0.734]
82 Assessing the performance of a segmentation method is not a trivial task. [sent-248, score-0.432]
83 Moreover, the performance of segmentation algorithms depends more critically on the adopted features (which is not the focus of this paper) than on the spatial coherence prior. [sent-249, score-0.512]
84 For these reasons, we will not present any careful comparative study, but simply a set of experimental examples testifying for the promising behavior of the proposed approach. [sent-250, score-0.058]
85 1, illustrates the algorithm on a synthetic gray scale image with four Gaussian classes of means 1, 2, 3, and 4, and standard deviation 0. [sent-253, score-0.187]
86 For this image, both supervised and unsupervised segmentation lead to almost visually indistinguishable results, so we only show the supervised segmentation results. [sent-255, score-0.984]
87 For wavelet-based segmentation, we have used undecimated Haar wavelets and the Bayes-shrink denoising procedure [6]. [sent-257, score-0.147]
88 In the example in the first row, the goal is to segment the image into skin, cloth, and background regions; in the second example, the goal is to segment the horses from the background. [sent-263, score-0.283]
89 These examples show how the semi-supervised mode of our algorithm is able to segment the image into regions which “look like” the seed regions provided by the user. [sent-264, score-0.372]
90 Figure 2: From left to right (in each row): observed image with regions indicated by the user as belonging to each class, segmentation result, region boundaries. [sent-265, score-0.618]
91 3 Discriminative Texture Segmentation Finally, we illustrate the behavior of the algorithm when used with discriminative classifiers by applying it to texture segmentation. [sent-267, score-0.189]
92 We build on the work in [8], where SVM classifiers are used for texture classification (see [8] for complete details about the kernels and texture features used). [sent-268, score-0.199]
93 3 shows two experiments; one with a two-texture 256×512 image and the other with a 5-texture 256 × 256 image. [sent-270, score-0.187]
94 These examples show that our method is able to take class predictions produced by a classifier lacking any spatial prior and produce segmentations with a high degree of spatial coherence. [sent-279, score-0.245]
95 6 Conclusions We have introduced an approach to probabilistic semi-supervised clustering which is particularly suited for image segmentation. [sent-280, score-0.353]
96 The formulation allows supervised, unsupervised, semi-supervised, and discriminative modes, and can be used with classical standard image priors (such as Gaussian fields, or wavelet-based priors). [sent-281, score-0.542]
97 Unlike the usual Markov random field approaches, which involve combinatorial optimization, our segmentation algorithm consists of a simple generalized EM algorithm. [sent-282, score-0.488]
98 Ongoing work includes a thorough experimental comparison with state-of-the-art segmentation algorithms, namely, spectral methods [16] and techniques based on “graph-cuts” [19]. [sent-284, score-0.387]
99 e Figure 3: From left to right (in each row): observed image, direct SVM segmentation, segmentation produced by our algorithm. [sent-286, score-0.387]
100 “Analysis of multiresolution image denoising schemes using generalized - Gaussian and complexity priors,” IEEE Trans. [sent-386, score-0.383]
wordName wordTfidf (topN-words)
[('segmentation', 0.387), ('gem', 0.349), ('yi', 0.337), ('zi', 0.256), ('image', 0.187), ('ssc', 0.167), ('denoising', 0.147), ('priors', 0.141), ('wavelet', 0.137), ('znew', 0.134), ('mm', 0.118), ('discriminative', 0.107), ('ezi', 0.1), ('lexicographic', 0.1), ('uh', 0.1), ('clustering', 0.099), ('prior', 0.097), ('em', 0.097), ('xi', 0.089), ('vectorization', 0.087), ('texture', 0.082), ('grouping', 0.082), ('supervised', 0.076), ('logistic', 0.076), ('formulation', 0.074), ('zn', 0.071), ('dft', 0.067), ('log', 0.064), ('arg', 0.061), ('uz', 0.058), ('circulant', 0.058), ('instituto', 0.058), ('unsupervised', 0.058), ('transform', 0.057), ('mixture', 0.056), ('labels', 0.054), ('multinomial', 0.053), ('mk', 0.053), ('wishes', 0.053), ('decoupled', 0.053), ('probabilities', 0.052), ('combinatorial', 0.052), ('spatial', 0.052), ('eigenvalues', 0.05), ('mode', 0.049), ('generalized', 0.049), ('segment', 0.048), ('classi', 0.047), ('underlies', 0.047), ('indirectly', 0.047), ('law', 0.046), ('penalized', 0.046), ('trivial', 0.045), ('preferences', 0.044), ('rgb', 0.044), ('segmentations', 0.044), ('periodic', 0.044), ('cheng', 0.044), ('regions', 0.044), ('hessian', 0.043), ('label', 0.042), ('neighborhood', 0.042), ('max', 0.041), ('lr', 0.041), ('critically', 0.041), ('pn', 0.041), ('er', 0.041), ('relations', 0.04), ('missing', 0.039), ('orthogonal', 0.037), ('hard', 0.037), ('diego', 0.037), ('gaussian', 0.037), ('probabilistic', 0.036), ('modes', 0.036), ('complete', 0.035), ('svm', 0.035), ('inverse', 0.035), ('survey', 0.034), ('hidden', 0.033), ('suboptimal', 0.033), ('classical', 0.033), ('adopted', 0.032), ('verify', 0.032), ('generative', 0.032), ('coef', 0.032), ('suited', 0.031), ('expectations', 0.031), ('enforce', 0.03), ('matrix', 0.03), ('posterior', 0.029), ('update', 0.029), ('perspective', 0.029), ('promising', 0.029), ('encoding', 0.029), ('testifying', 0.029), ('possession', 0.029), ('moulin', 0.029), ('banerjee', 0.029), ('basu', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
2 0.20227721 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
Author: Oliver Williams
Abstract: This paper describes a Gaussian process framework for inferring pixel-wise disparity and bi-layer segmentation of a scene given a stereo pair of images. The Gaussian process covariance is parameterized by a foreground-backgroundocclusion segmentation label to model both smooth regions and discontinuities. As such, we call our model a switched Gaussian process. We propose a greedy incremental algorithm for adding observations from the data and assigning segmentation labels. Two observation schedules are proposed: the first treats scanlines as independent, the second uses an active learning criterion to select a sparse subset of points to measure. We show that this probabilistic framework has comparable performance to the state-of-the-art. 1
3 0.16770935 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
4 0.13313752 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
Author: Alfred O. Hero
Abstract: We introduce a novel adaptive non-parametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N → ∞ their span recovers the entropy minimizing set that supports at least ρ = K/N (100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α = 1 − ρ. A method for implementing this non-parametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces. 1
5 0.13220705 153 nips-2006-Online Clustering of Moving Hyperplanes
Author: René Vidal
Abstract: We propose a recursive algorithm for clustering trajectories lying in multiple moving hyperplanes. Starting from a given or random initial condition, we use normalized gradient descent to update the coefficients of a time varying polynomial whose degree is the number of hyperplanes and whose derivatives at a trajectory give an estimate of the vector normal to the hyperplane containing that trajectory. As time proceeds, the estimates of the hyperplane normals are shown to track their true values in a stable fashion. The segmentation of the trajectories is then obtained by clustering their associated normal vectors. The final result is a simple recursive algorithm for segmenting a variable number of moving hyperplanes. We test our algorithm on the segmentation of dynamic scenes containing rigid motions and dynamic textures, e.g., a bird floating on water. Our method not only segments the bird motion from the surrounding water motion, but also determines patterns of motion in the scene (e.g., periodic motion) directly from the temporal evolution of the estimated polynomial coefficients. Our experiments also show that our method can deal with appearing and disappearing motions in the scene.
6 0.1320999 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
7 0.1261733 128 nips-2006-Manifold Denoising
8 0.12409122 80 nips-2006-Fundamental Limitations of Spectral Clustering
9 0.11461881 45 nips-2006-Blind Motion Deblurring Using Image Statistics
10 0.11341407 186 nips-2006-Support Vector Machines on a Budget
11 0.11182925 130 nips-2006-Max-margin classification of incomplete data
12 0.10586496 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
13 0.10571314 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
14 0.099202745 7 nips-2006-A Local Learning Approach for Clustering
15 0.098158836 126 nips-2006-Logistic Regression for Single Trial EEG Classification
16 0.09647271 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
17 0.095324211 42 nips-2006-Bayesian Image Super-resolution, Continued
18 0.089104213 20 nips-2006-Active learning for misspecified generalized linear models
19 0.088366859 175 nips-2006-Simplifying Mixture Models through Function Approximation
20 0.084484823 122 nips-2006-Learning to parse images of articulated bodies
topicId topicWeight
[(0, -0.303), (1, 0.096), (2, 0.128), (3, 0.041), (4, -0.004), (5, 0.021), (6, -0.071), (7, -0.044), (8, 0.053), (9, 0.114), (10, 0.025), (11, -0.007), (12, -0.028), (13, 0.14), (14, -0.01), (15, 0.268), (16, 0.175), (17, -0.089), (18, 0.082), (19, 0.152), (20, -0.04), (21, 0.003), (22, 0.11), (23, 0.008), (24, 0.148), (25, -0.083), (26, 0.017), (27, 0.045), (28, -0.081), (29, 0.131), (30, -0.114), (31, 0.023), (32, -0.038), (33, -0.079), (34, 0.043), (35, 0.086), (36, -0.001), (37, 0.008), (38, 0.014), (39, -0.048), (40, -0.011), (41, 0.016), (42, -0.08), (43, 0.073), (44, 0.052), (45, 0.06), (46, -0.068), (47, 0.027), (48, -0.033), (49, 0.079)]
simIndex simValue paperId paperTitle
same-paper 1 0.96706188 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
2 0.70500523 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
3 0.69560379 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
Author: Siwei Lyu, Eero P. Simoncelli
Abstract: The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures (GSMs). Here, we use this local description to construct a global field of Gaussian scale mixtures (FoGSM). Specifically, we model subbands of wavelet coefficients as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for FoGSM is feasible, and that samples drawn from an estimated FoGSM model have marginal and joint statistics similar to wavelet coefficients of photographic images. We develop an algorithm for image denoising based on the FoGSM model, and demonstrate substantial improvements over current state-ofthe-art denoising method based on the local GSM model. Many successful methods in image processing and computer vision rely on statistical models for images, and it is thus of continuing interest to develop improved models, both in terms of their ability to precisely capture image structures, and in terms of their tractability when used in applications. Constructing such a model is difficult, primarily because of the intrinsic high dimensionality of the space of images. Two simplifying assumptions are usually made to reduce model complexity. The first is Markovianity: the density of a pixel conditioned on a small neighborhood, is assumed to be independent from the rest of the image. The second assumption is homogeneity: the local density is assumed to be independent of its absolute position within the image. The set of models satisfying both of these assumptions constitute the class of homogeneous Markov random fields (hMRFs). Over the past two decades, studies of photographic images represented with multi-scale multiorientation image decompositions (loosely referred to as “wavelets”) have revealed striking nonGaussian regularities and inter and intra-subband dependencies. For instance, wavelet coefficients generally have highly kurtotic marginal distributions [1, 2], and their amplitudes exhibit strong correlations with the amplitudes of nearby coefficients [3, 4]. One model that can capture the nonGaussian marginal behaviors is a product of non-Gaussian scalar variables [5]. A number of authors have developed non-Gaussian MRF models based on this sort of local description [6, 7, 8], among which the recently developed fields of experts model [7] has demonstrated impressive performance in denoising (albeit at an extremely high computational cost in learning model parameters). An alternative model that can capture non-Gaussian local structure is a scale mixture model [9, 10, 11]. An important special case is Gaussian scale mixtures (GSM), which consists of a Gaussian random vector whose amplitude is modulated by a hidden scaling variable. The GSM model provides a particularly good description of local image statistics, and the Gaussian substructure of the model leads to efficient algorithms for parameter estimation and inference. Local GSM-based methods represent the current state-of-the-art in image denoising [12]. The power of GSM models should be substantially improved when extended to describe more than a small neighborhood of wavelet coefficients. To this end, several authors have embedded local Gaussian mixtures into tree-structured MRF models [e.g., 13, 14]. In order to maintain tractability, these models are arranged such that coefficients are grouped in non-overlapping clusters, allowing a graphical probability model with no loops. Despite their global consistency, the artificially imposed cluster boundaries lead to substantial artifacts in applications such as denoising. In this paper, we use a local GSM as a basis for a globally consistent and spatially homogeneous field of Gaussian scale mixtures (FoGSM). Specifically, the FoGSM is formulated as the product of two mutually independent MRFs: a positive multiplier field obtained by exponentiating a homogeneous Gaussian MRF (hGMRF), and a second hGMRF. We develop a parameter estimation procedure, and show that the model is able to capture important statistical regularities in the marginal and joint wavelet statistics of a photographic image. We apply the FoGSM to image denoising, demonstrating substantial improvement over the previous state-of-the-art results obtained with a local GSM model. 1 Gaussian scale mixtures A GSM random vector x is formed as the product of a zero-mean Gaussian random vector u and an d √ d independent random variable z, as x = zu, where = denotes equality in distribution. The density of x is determined by the covariance of the Gaussian vector, Σ, and the density of the multiplier, p z (z), through the integral Nx (0, zΣ)pz (z)dz ∝ p(x) = z z xT Σ−1 x 1 pz (z)dz. exp − √ 2z z|Σ| (1) A key property of GSMs is that when z determines the scale of the conditional variance of x given z, which is a Gaussian variable with zero mean and covariance zΣ. In addition, the normalized variable √ x z is a zero mean Gaussian with covariance matrix Σ. The GSM model has been used to describe the marginal and joint densities of local clusters of wavelet coefficients, both within and across subbands [9], where the embedded Gaussian structure affords simple and efficient computation. This local GSM model has been be used for denoising, by independently estimating each coefficient conditioned on its surrounding cluster [12]. This method achieves state-of-the-art performances, despite the fact that treating overlapping clusters as independent does not give rise to a globally consistent statistical model that satisfies all the local constraints. 2 Fields of Gaussian scale mixtures In this section, we develop fields of Gaussian scale mixtures (FoGSM) as a framework for modeling wavelet coefficients of photographic images. Analogous to the local GSM model, we use a latent multiplier field to modulate a homogeneous Gaussian MRF (hGMRF). Formally, we define a FoGSM x as the product of two mutually independent MRFs, √ d x = u ⊗ z, (2) where u is a zero-mean hGMRF, and z is a field of positive multipliers that control the local coefficient variances. The operator ⊗ denotes element-wise multiplication, and the square root operation is applied to each component. Note that x has a one-dimensional GSM marginal distributions, while its components have dependencies captured by the MRF structures of u and z. Analogous to the local GSM, when conditioned on z, x is an inhomogeneous GMRF p(x|z) ∝ √ 1 |Qu | exp − xT D z zi 2 i −1 Qu D √ z −1 x = 1 |Qu | exp − (x zi 2 i √ T z) Qu (x √ z) , (3) where Qu is the inverse covariance matrix of u (also known as the precision matrix), and D(·) denotes the operator that form a diagonal matrix from an input vector. Note also that the element√ wise division of the two fields, x z, yields a hGMRF with precision matrix Q u . To complete the FoGSM model, we need to specify the structure of the multiplier field z. For tractability, we use another hGMRF as a substrate, and map it into positive values by exponentiation, x u log z Fig. 1. Decomposition of a subband from image “boat” (left) into the normalized subband u (middle) and the multiplier field z (right, in the logarithm domain). Each image is rescaled individually to fill the full range of grayscale intensities. as was done in [10]. To be more specific, we model log(z) as a hGMRF with mean µ and precision matrix Qz , where the log operator is applied element-wise, from which the density of z follows as: pz (z) ∝ |Qz | 1 exp − (log z − µ)T Qz (log z − µ) . zi 2 i (4) This is a natural extension of the univariate lognormal prior used previously for the scalar multiplier in the local GSM model [12]. The restriction to hGMRFs greatly simplifies computation with FoGSM. Particularly, we take advantage of the fact that a 2D hGMRF with circular boundary handling has a sparse block-circulant precision matrix with a generating kernel θ specifying its nonzero elements. A block-circulant matrix is diagonalized by the Fourier transform, and its multiplication with a vector corresponds to convolution with the kernel θ. The diagonalizability with a fixed and efficiently computed transform makes the parameter estimation, sampling, and inference with a hGMRF substantially more tractable than with a general MRF. Readers are referred to [15] for a detailed description of hGMRFs. Parameter estimation: The estimation of the latent multiplier field z and the model parameters (µ, Qz , Qu ) may be achieved by maximizing log p(x, z; Q u, Qz , µ) with an iterative coordinate-ascent method, which is guaranteed to converge. Specifically, based on the statistical dependency structures in the FoGSM model, the following three steps are repeated until convergence: (i) z(t+1) = argmaxz log p(x|z; Q(t) ) + log p(z; Q(t) , µ(t) ) u z (ii) Q(t+1) = argmaxQu log p(x|z(t+1); Qu ) u (iii) (Q(t+1) , µ(t+1) ) = argmaxQz ,µ log p(z(t+1) ; Qz , µ) z (5) According to the FoGSM model structure, steps (ii) and (iii) correspond to maximum likelihood √ z(t+1) and log z(t+1) , respectively. Because of this, estimates of the parameters of hGMRFs, x both steps may be efficiently implemented by exploiting the diagonalization of the precision matrices with 2D Fourier transforms [15]. Step (i) in (5) may be implemented with conjugate gradient ascent [16]. To simplify description and computation, we introduce a new variable for the element-wise inverse square root of the multiplier: √ s=1 z. The likelihood in (3) is then changed to: p(x|s) ∝ i 1 si exp − (x ⊗ s)T Qu (x ⊗ s) = 2 i 1 si exp − sT D(x)Qu D(x)s . 2 (6) The joint density of s is obtained from (4), using the relations between densities of transformed variables, as 1 1 exp − (2 log s + µ)T Qz (2 log s + µ) . (7) p(s) ∝ si 2 i ˆ Combining . (6) and (7), step (i) in (5) is equivalent to computing s = argmaxs log p(x|s; Qu ) + log p(s; Qz , µ), which is further simplified into: argmin s 1 T 1 s D (x) Qu D (x) s + (2 log s + µ)T Qz (2 log s + µ) . 2 2 (8) boat house peppers x x x x log p(x) Barbara Fig. 2. Empirical marginal log distributions of coefficients from a multi-scale decomposition of photographic images (blue dot-dashed line), synthesized FoGSM samples from the same subband (red solid line), and a Gaussian with the same standard deviation (red dashed line). ˆ ˆ and the optimal z is then recovered as z = 1 (ˆ ⊗ s). We the optimize (8) with conjugate gradient s ˆ ascent [16]. Specifically, the negative gradient of the objective function in (8) with respect to s is − ∂ log p(x|s)p(s) ∂s = D (x) Qu D (x) s + 2 D(s)−1 Qz (2 log s + µ) = x ⊗ (θu (x ⊗ s)) + 2(θz (2 log s + µ)) s, and the multiplication of any vector h with the Hessian matrix can be computed as: ∂2 log p(x|s)p(s) h = x ⊗ (θu (x ⊗ h)) + 4 (θz (h s)) s − 2 θz (log s + µ) ⊗ h (s ⊗ s). ∂s2 Both operations can be expressed entirely in terms of element-wise operations ( and ⊗) and 2D convolutions ( ) with the generating kernels of the two precision matrices θ u and θz , which allows for efficient implementation. 3 Modeling photographic images We have applied the FoGSM model to subbands of a multi-scale image representation known as a steerable pyramid [17]. This decomposition is a tight frame, constructed from oriented multiscale derivative operators, and is overcomplete by a factor of 4K/3, where K is the number of orientation bands. Note that the marginal and joint statistics we describe are not specific to this decomposition, and are similar for other multi-scale oriented representations. We fit a FoGSM model to each subband of a decomposed photographic image, using the algorithms described in the previous section. For precision matrices Q u and Qz , we assumed a 5 × 5 Markov neighborhood (corresponding to a 5 × 5 convolution kernel), which was loosely chosen to optimize the tradeoff between accuracy and overfitting. Figure 1 shows the result of fitting a FoGSM model to an example subband from the “boat” image (left panel). The subband is decomposed into the product of the u field (middle panel) and the z field (right panel, in the logarithm domain), along with model parameters Q u , µ and Qz (not shown). Visually, the changing spatial variances are represented in the estimated log z field, and the estimated u is much more homogeneous than the original subband and has a marginal distribution close to Gaussian.1 However, the log z field still has a non-Gaussian marginal distribution and is spatially inhomogeneous, suggesting limitations of FoGSM for modeling photographic image wavelet coefficients (see Discussion). The statistical dependencies captured by the FoGSM model can be further revealed by examining marginal and joint statistics of samples synthesized with the estimated model parameters. A sample √ from FoGSM can be formed by multiplying samples of u and z. The former is obtained by sampling from hGMRF u, and the latter is obtained from the element-wise exponentiation followed by a element-wise square root operation of a sample of hGMRF log z. This procedure is again efficient for FoGSM due to the use of hGMRFs as building blocks [15]. Marginal distributions: We start by comparing the marginal distributions of the samples and the original subband. Figure 2 shows empirical histograms in the log domain of a particular subband 1 This ”Gaussianizing” behavior was first noted in photographic images by Ruderman [18], who observed that image derivative measurements that were normalized by a local estimate of their standard deviation had approximately Gaussian marginal distributions. close ∆ = 1 near ∆ = 4 far ∆ = 32 orientation scale real sim real sim Fig. 3. Examples of empirically observed distributions of wavelet coefficient pairs, compared with distributions from synthesized samples with the FoGSM model. See text for details. from four different photographic images (blue dot-dashed line), and those of the synthesized samples of FoGSM models learned from each corresponding subband (red solid line). For comparison, a Gaussian with the same standard deviation as the image subband is also displayed (red dashed line). Note that the synthesized samples have conspicuous non-Gaussian characteristics similar to the real subbands, exemplified by the high peak and heavy tails in the marginal distributions. On the other hand, they are typically less kurtotic than the real subbands. We believe this arises from the imprecise Gaussian approximation of log z (see Discussion). Joint distributions: In addition to one-dimensional marginal statistics, the FoGSM model is capable of capturing the joint behavior of wavelet coefficients. As described in [4, 9], wavelet coefficients of photographic images present non-Gaussian dependencies. Shown in the first and the third rows in Fig. 3 are empirical joint and conditional histograms for one subband of the “boat” image, for five pairs of coefficients, corresponding to basis functions with spatial separations of ∆ = {1, 4, 32} samples, two orthogonal orientations and two adjacent scales. Contour lines in the joint histogram are drawn at equal intervals of log probability. Intensities in the conditional histograms correspond to probability, except that each column is independently rescaled to fill the full range of intensity. For a pair of adjacent coefficients, we observe an elliptical joint distribution and a “bow-tie” shaped conditional distribution. The latter is indicative of strong non-Gaussian dependencies. For coefficients that are distant, the dependency becomes weaker and the corresponding joint and conditional histograms become more separable, as would be expected for two independent random variables. Random samples drawn from a FoGSM model, with parameters fitted to the corresponding subband, have statistical characteristics consistent with the general description of wavelet coefficients of photographic images. Shown in the second and the fourth rows of Fig. 3 are the joint and conditional histograms of synthesized samples from the FoGSM model estimated from the same subband as in the first and the third rows. Note that the joint and conditional histograms of the synthesized samples have similar transition of spatial dependencies as the separation increases (column 1,2 and 3), suggesting that the FoGSM accounts well for pairwise joint dependencies of coefficients over a full range of spatial separations. On the other hand, the dependencies between subbands of different orientations and scales are not properly modeled by FoGSM (column 4 and 5). This is especially true for subbands at different scales, which exhibit strong dependencies. The current FoGSM model original image noisy image (σ = 50) (PSNR = 14.15dB) GSM-BLS (PSNR = 26.34dB) FoGSM (PSNR = 27.01dB) Fig. 4. Denoising results using local GSM [12] and FoGSM. Performances are evaluated in peaksignal-to-noise-ratio (PSNR), 20 log10 (255/σe ), where σe is the standard deviation of the error. does not exhibit those dependencies as only spatial neighbors are used to make use the 2D hGMRFs (see Discussion). 4 Application to image denoising Let y = x+w be a wavelet subband of an image that has been corrupted with white Gaussian noise of known variance. In an overcomplete wavelet domain such as steerable pyramid, the white Gaussian noise is transformed into correlated Gaussian noise w ∼ N w (0, Σw ), whose covariance Σ w can be derived from the basis functions of the pyramid transform. With FoGSM as prior over x, commonly used denoising methods involve expensive high-dimensional integration: for instance, maximum ˆ a posterior estimate, x MAP = argmaxx log p(x|y), requires a high-dimensional integral over z, and ˆ the Bayesian least square estimation, x BLS = E(x|y) requires a double high-dimensional integral over x and z. Although it is possible to optimize with these criteria using Monte-Carlo Markov sampling or other approximations, we instead develop a more efficient deterministic algorithm that takes advantage of the hGMRF structure in the FoGSM model. Specifically, we compute (ˆ , z, Qu , Qz , µ) = argmaxx,z,Qu ,Qz ,µ log p(x, z|y; Q u, Qz , µ) x ˆ ˆ ˆ ˆ (9) ˆ and take x as the optimal denoised subband. Note that the model parameters are learned within the inference process rather than in a separate parameter learning step. This strategy, known as partial optimal solution [19], greatly reduces the computational complexity. We optimize (9) with coordinate ascent, iterating between maximizing each of (x, z, Q u , Qz , µ) while fixing the others. With fixed estimates of (z, Q u , Qz , µ), the optimization of x is argmaxx log p(x, z|y; Q u, Qz , µ) = argmaxx log p(x|z, y; Q u, Qz , µ) + log p(z|y; Q u, Qz , µ) , which reduces to argmax x log p(x|z, y; Q u), with the second term independent of x and can be dropped from optimization. Given the Gaussian structure of x given z, this step is then equivalent to a Wiener filter (linear in y). Fixing (x, Q u , Qz , µ), the optimization of z is argmaxz log p(x, z|y; Q u, Qz , µ)= argmaxz log p(y|x, z; Q u)+log p(x, z; Q u, Qz , µ)−log p(y; Qu , Qz , µ) , which is further reduced to argmax z log p(x, z; Qu , Qz , µ). Here, the first term was dropped since y is independent of z when conditioned on x. The last term was also dropped since it is also independent of z. Therefore, optimizing z given (x, Q u , Qz , µ) is equivalent to the first step of the algorithm in section 2, which can be implemented with efficient gradient descent. Finally, given (x, z), the FoGSM √ z(t+1) and log z(t+1) , similar to the model parameters (Q u , Qz , µ) are estimated from hGMRFs x second and third step in the algorithm of section 2. However, to reduce the overall computation time, instead of a complete maximum likelihood estimation, these parameters are estimated with a maximum pseudo-likelihood procedure [20], which finds the parameters maximizing the product of all conditional distributions (which are 1D Gaussians in the GMRF case), followed by a projection to the subspace of FoGSM parameters that results in positive definite precision matrices. We tested this denoising method on a standard set of test images [12]. The noise corrupted images were first decomposed these into a steerable pyramid with multiple levels (5 levels for a 512 × 512 image and 4 levels for a 256 × 256 image ) and 8 orientations. We assumed a FoGSM model for each subband, with a 5 × 5 neighborhood for field u and a 1 × 1 neighborhood for field log z. These sizes were chosen to provide a reasonable combination of performance and computational efficiency. We then estimate the optimal x with the algorithm described previously, with the initial values of x and z computed from subband denoised with the local GSM model [12]. Shown in Fig. 4 is an example of denoising the “boat” image corrupted with simulated additive white Gaussian noise of strength σ = 50, corresponding to a peak-signal-to-noise-ratio (PSNR), of 14.15 dB. We compare this with the local GSM method in [12], which, assuming a local GSM model for the neighborhood consisting of 3 × 3 spatial neighbors plus parent in the next coarsest scale, computes a Bayes least squares estimate of each coefficient conditioned on its surrounding neighborhood. The FoGSM denoising achieves substantial improvement (+0.68 in PSNR) and is seen to exhibit better contrast and continuation of oriented features (see Fig. 4). On the other hand, FoGSM introduces some noticeable artifacts in low contrast areas, which is caused by numerical instability at locations with small z. We find that the improvement in terms of PSNR is consistent across photographic images and noise levels, as reported in Table 1. But even with a restricted neighborhood for the multiplier field, this PSNR improvement does come at a substantial computational cost. As a rough indication, running on a PowerPC G5 workstation with 2.3 Ghz processor and 16 Gb RAM memory, using unoptimized MATLAB (version R14) code, denoising a 512 × 512 image takes on average 4.5 hours (results averaging over 5 images), and denoising a 256×256 image takes on average 2.4 hours (result averaging over 2 images), to a convergence precision producing the reported results. Our preliminary investigation indicates that the slow running time is mainly due to the nature of coordinate ascent and the landscape of (9), which requires many iterations to converge. 5 Discussion We have introduced fields of Gaussian scale mixtures as a flexible and efficient tool for modeling the statistics of wavelet coefficients of photographic images. We developed a feasible (although admittedly computationally costly) parameter estimation method, and showed that samples synthesized from the fitted FoGSM model are able to capture structures in the marginal and joint wavelet statistics of photographic images. Preliminary results of applying FoGSM to image denoising indicate substantial improvements over the state-of-the-art methods based on the local GSM model. Although FoGSM has a structure that is similar to the local scale mixture model [9, 10], there is a fundamental difference between them. In FoGSM, hGMRF structures are enforced in u and log z, while the local scale mixture models impose minimal statistical structure on these variables. Because of this, our model easily extends to images of arbitrary size, while the local scale mixture models are essentially confined to describing small image patches (the curse of dimensionality, and the increase in computational cost prevent one from scaling the patch size up). On the other hand, the close relation to Gaussian MRF makes the analysis and computation of FoGSM significantly easier than other non-Gaussian MRF based image models [6, 7, 5]. We envision, and are currently working on, a number of model improvements. First, the model should benefit from the introduction of more general Markov neighborhoods, including wavelet coefficients from subbands at other scales and orientations [4, 12], since the current model is clearly not accounting for these dependencies (see Fig. 3). Secondly, the log transformation used to derive the multiplier field from a hGMRF is somewhat ad hoc, and we believe that substitution of another nonlinear transformation (e.g., a power law [14]) might lead to a more accurate description of the image statistics. Thirdly, the current denoising method estimates model parameter during the process of denoising, which produces image adaptive model parameters. We are exploring the possibility of using a set of generic model parameter learned a priori on a large set of photographic images, so that a generic statistical model for all photographic images based on FoGSM can be built. Finally, there exist residual inhomogeneous structures in the log z field (see Fig. 1) that can likely be captured by explicitly incorporating local orientation [21] or phase into the model. Finding tractable models and algorithms for handling such circular variables is challenging, but we believe their inclusion will result in substantial improvements in modeling and in denoising performance. σ/PSNR 10/28.13 25/20.17 50/14.15 100/8.13 σ/PSNR 10/28.13 25/20.17 50/14.15 100/8.13 Barbara 35.01 (34.01) 30.10 (29.07) 26.40 (25.45) 23.01 (22.61) Flintstones 32.47 (31.78) 28.29 (27.48) 24.82 (24.02) 21.24 (20.49) barco 35.05 (34.42) 30.44 (29.73) 27.36 (26.63) 24.44 (23.84) house 35.63 (35.27) 31.64 (31.32) 28.51 (28.23) 25.33 (25.31) boat 34.12 (33.58) 30.03 (29.34) 27.01 (26.35) 24.20 (23.79) Lena 35.94 (35.60) 32.11 (31.70) 29.12 (28.62) 26.12 (25.77) fingerprint 33.28 (32.45) 28.45 (27.44) 25.11 (24.13) 21.78 (21.21) peppers 34.38 (33.73) 29.78 (29.18) 26.43 (25.93) 23.17 (22.80) Table 1. Denoising results with FoGSM on different images and different noise levels. Shown in the table are PSNRs (20 log10 (255/σe ), where σe is the standard deviation of the error) of the denoised images, and in the parenthesis are the PSNRs of the same images denoised with a local GSM model [12]. References [1] P. J. Burt. Fast filter transforms for image processing. Comp. Graph. Image Proc., 16:20–51, 1981. [2] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am., 4(12):2379–2394, 1987. [3] B. Wegmann and C. Zetzsche. Statistical dependencies between orientation filter outputs used in human vision based image code. In Proc. Visual Comm. and Image Proc., volume 1360, pages 909–922, 1990. [4] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. on Image Proc., 8(12):1688–1701, 1999. [5] Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton. Energy-based models for sparse overcomplete representations. J. of Machine Learning Res., 4:1235–1260, 2003. [6] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. Int’l. J. Comp. Vis., 27(2):107–126, 1998. [7] S. Roth and M. J. Black. Fields of experts: a framework for learning image priors. In IEEE Conf. on Comp. Vis. and Pat. Rec., volume 2, pages 860–867, 2005. [8] P. Gehler and M. Welling. Products of ”edge-perts”. In Adv. in Neural Info. Proc. Systems (NIPS*05). MIT Press, 2006. [9] M. J. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In Adv. Neural Info. Proc. Sys. (NIPS*99), volume 12, pages 855–861, May 2000. [10] Y. Karklin and M. S. Lewicki. A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals. Neural Computation, 17(2):397–423, 2005. [11] A. Hyv¨ rinen, P. O. Hoyer, and M. Inki. Topographic ICA as a model of natural image statistics. In the a First IEEE Int’l. Workshop on Bio. Motivated Comp. Vis., London, UK, 2000. [12] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. on Image Proc., 12(11):1338–1351, 2003. [13] J. Romberg, H. Choi, and R. G. Baraniuk. Bayesian tree-structured image modeling using wavelet domain hidden Markov models. IEEE Trans. on Image Proc., 10(7):303–347, 2001. [14] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Appl. and Comp. Harm. Ana., 11(1):89–123, 2001. [15] H. Rue and L. Held. Gaussian Markov Random Fields: Theory And Applications. Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, 2005. [16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes. Cambridge, 2nd edition, 2002. [17] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In IEEE Int’l. Conf. on Image Proc., volume 3, pages 444–447, 1995. [18] D. Ruderman. The statistics of natural images. Network : Comp. in Neural Sys., 5:598–605, 1994. [19] M. Figueiredo and J. Leit¨ o. Unsupervised image restoration and edge location using compound Gaussa Markov random fields and MDL principle. IEEE Trans. on Image Proc., 6(8):1089–1122, 1997. [20] J. Besag. On the statistical analysis of dirty pictures. J. of the Royal Stat. Soc., Series B, 48:259–302, 1986. [21] D. K. Hammond and E. P. Simoncelli. Image denoising with an orientation-adaptive Gaussian scale mixture model. In Proc. 13th IEEE Int’l. Conf. on Image Proc., pages 1433–1436, October 2006.
4 0.67092073 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
Author: Oliver Williams
Abstract: This paper describes a Gaussian process framework for inferring pixel-wise disparity and bi-layer segmentation of a scene given a stereo pair of images. The Gaussian process covariance is parameterized by a foreground-backgroundocclusion segmentation label to model both smooth regions and discontinuities. As such, we call our model a switched Gaussian process. We propose a greedy incremental algorithm for adding observations from the data and assigning segmentation labels. Two observation schedules are proposed: the first treats scanlines as independent, the second uses an active learning criterion to select a sparse subset of points to measure. We show that this probabilistic framework has comparable performance to the state-of-the-art. 1
5 0.60076666 153 nips-2006-Online Clustering of Moving Hyperplanes
Author: René Vidal
Abstract: We propose a recursive algorithm for clustering trajectories lying in multiple moving hyperplanes. Starting from a given or random initial condition, we use normalized gradient descent to update the coefficients of a time varying polynomial whose degree is the number of hyperplanes and whose derivatives at a trajectory give an estimate of the vector normal to the hyperplane containing that trajectory. As time proceeds, the estimates of the hyperplane normals are shown to track their true values in a stable fashion. The segmentation of the trajectories is then obtained by clustering their associated normal vectors. The final result is a simple recursive algorithm for segmenting a variable number of moving hyperplanes. We test our algorithm on the segmentation of dynamic scenes containing rigid motions and dynamic textures, e.g., a bird floating on water. Our method not only segments the bird motion from the surrounding water motion, but also determines patterns of motion in the scene (e.g., periodic motion) directly from the temporal evolution of the estimated polynomial coefficients. Our experiments also show that our method can deal with appearing and disappearing motions in the scene.
6 0.53245223 174 nips-2006-Similarity by Composition
7 0.5315693 45 nips-2006-Blind Motion Deblurring Using Image Statistics
8 0.52130723 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
9 0.51622283 42 nips-2006-Bayesian Image Super-resolution, Continued
10 0.50471288 52 nips-2006-Clustering appearance and shape by learning jigsaws
11 0.50200218 85 nips-2006-Geometric entropy minimization (GEM) for anomaly detection and localization
12 0.48717186 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
13 0.48423427 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
14 0.44685906 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
15 0.43826067 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
16 0.43500146 122 nips-2006-Learning to parse images of articulated bodies
17 0.42537832 130 nips-2006-Max-margin classification of incomplete data
18 0.42517084 7 nips-2006-A Local Learning Approach for Clustering
19 0.4206174 129 nips-2006-Map-Reduce for Machine Learning on Multicore
20 0.41895682 186 nips-2006-Support Vector Machines on a Budget
topicId topicWeight
[(1, 0.142), (3, 0.02), (7, 0.083), (8, 0.032), (9, 0.058), (19, 0.077), (20, 0.046), (22, 0.071), (25, 0.011), (40, 0.071), (44, 0.068), (57, 0.089), (65, 0.044), (67, 0.012), (69, 0.05), (71, 0.01), (90, 0.013), (93, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.92964131 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
2 0.87618512 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
3 0.873456 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
4 0.86601734 167 nips-2006-Recursive ICA
Author: Honghao Shan, Lingyun Zhang, Garrison W. Cottrell
Abstract: Independent Component Analysis (ICA) is a popular method for extracting independent features from visual data. However, as a fundamentally linear technique, there is always nonlinear residual redundancy that is not captured by ICA. Hence there have been many attempts to try to create a hierarchical version of ICA, but so far none of the approaches have a natural way to apply them more than once. Here we show that there is a relatively simple technique that transforms the absolute values of the outputs of a previous application of ICA into a normal distribution, to which ICA maybe applied again. This results in a recursive ICA algorithm that may be applied any number of times in order to extract higher order structure from previous layers. 1
5 0.8650279 158 nips-2006-PG-means: learning the number of clusters in data
Author: Yu Feng, Greg Hamerly
Abstract: We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; it uses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods. 1
6 0.86465085 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
7 0.86142963 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
8 0.86120838 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
9 0.85687101 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
11 0.85641599 175 nips-2006-Simplifying Mixture Models through Function Approximation
12 0.85549808 138 nips-2006-Multi-Task Feature Learning
13 0.8553555 20 nips-2006-Active learning for misspecified generalized linear models
14 0.85474426 169 nips-2006-Relational Learning with Gaussian Processes
15 0.85460609 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
16 0.85189617 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
17 0.85009021 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
18 0.84999031 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
19 0.84966248 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
20 0.84964603 117 nips-2006-Learning on Graph with Laplacian Regularization