nips nips2003 nips2003-88 knowledge-graph by maker-knowledge-mining

88 nips-2003-Image Reconstruction by Linear Programming


Source: pdf

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by ℓ1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP, leading to a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. [sent-7, score-0.905]

2 However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. [sent-8, score-0.89]

3 We propose a new method to identify the noisy pixels by ℓ1-norm penalization and update the identified pixels only. [sent-9, score-0.765]

4 The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. [sent-10, score-0.592]

5 In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. [sent-11, score-0.369]

6 Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). [sent-12, score-0.191]

7 The basic idea is to penalize boundary points and interior points of the occluded area differently. [sent-15, score-0.306]

8 1 Introduction: Image denoising is an important subfield of computer vision which has been studied extensively (e.g. [sent-18, score-0.307]

9 The aim of image denoising is to restore an image corrupted by noise as closely as possible to the original one. [sent-21, score-0.784]

10 When one does not have any prior knowledge about the distribution of images, the image is often denoised by simple smoothing (e.g. [sent-22, score-0.285]

11 When one has a set of template images, it is preferable to project the noisy image onto the linear manifold made by PCA, which is schematically illustrated in Fig. 1. [sent-25, score-0.417]

12 The projection amounts to finding the closest point in the manifold according to some distance. [sent-28, score-0.269]

13 Instead of the Euclidean distance (i.e. the least squares projection), one can adopt a robust loss such as Huber's loss as the distance, which often gives a better result (robust projection [9]). [sent-31, score-0.52]

14 However, a major drawback of these projection approaches is that all pixels are updated by the projection. [sent-32, score-0.556]

15 However, typically only a few pixels are corrupted by noise; thus, the non-noise pixels are best left untouched. [sent-33, score-0.763]

16 This paper proposes a new denoising approach based on linear programming, where the ℓ1-norm regularizer is adopted for automatic identification of noisy pixels – only these are updated. [sent-34, score-0.816]

17 The identification and updating of noisy pixels are neatly formulated as one linear program. [sent-35, score-0.477]

18 Also, the optimality con-… [Figure 1: Difference between projection methods (left) and our LP method (right); panel labels: ||α||_1 < C, least squares projection, robust projection, off-manifold solution, on-manifold solution] [sent-38, score-0.318]

19 In particular, we can explicitly specify the fraction of noisy pixels by means of the ν-trick, originally developed for SVMs [8] and later applied to Boosting [7]. [sent-40, score-0.468]

20 In some cases the noisy pixels are not scattered over the image ("impulse noise") but form a considerably large connected region ("block noise"), e.g. sunglasses on faces. [sent-41, score-0.638]

21 By using the prior knowledge that the noisy pixels form blocks, we should be able to improve the denoising performance. [sent-44, score-0.764]

22 We will show that a very simple modification of the linear program allows us to control how block-shaped the identified and reconstructed region is. [sent-48, score-0.258]

23 In the experimental section we show impressive results on face images from the MPI face database corrupted by impulse and block noise. [sent-49, score-0.762]

24 The linear manifold of admissible images is described as T = { t | t = Σ_{j=1}^J β_j t_j, β_j ∈ ℝ }. Now we would like to denoise a noisy image x ∈ ℝ^N. [sent-51, score-0.639]

25 In order that the denoised image x̄ is similar to admissible images, x̄ should be close to the manifold: min_β d_1(x̄, Σ_{j=1}^J β_j t_j) ≤ ε_1, (1) where d_1 is a distance between two images. [sent-53, score-0.361]

26 Also, we have to constrain x̄ to be close to x, otherwise the denoised image becomes completely independent of the original image: d_2(x̄, x) ≤ ε_2, (2) where d_2 is another distance. [sent-54, score-0.29]

27 A number of denoising methods can be produced by choosing different distances and changing how to minimize the two competing objectives (1) and (2). [sent-55, score-0.307]

28 In projection methods, ε_1 is simply set to zero and ε_2 is minimized with d_2 being set to the Euclidean distance or a robust loss. [sent-56, score-0.243]

29 A Linear Programming Formulation: Our wish is that most pixels of x stay unchanged in x̄; in other words, the difference vector α = x̄ − x should be sparse. [sent-57, score-0.333]

30 It is equivalent to min_{α,β,ε} (1/N) Σ_{n=1}^N |α_n| + ν ε, subject to |x_n + α_n − Σ_{j=1}^J β_j t_{jn}| ≤ ε, n = 1, …, N. (5) [sent-73, score-0.321]

31 Then (5) is rewritten as the following linear programming problem: min_{α^±,β,ε} (1/N) Σ_{n=1}^N (α_n^+ + α_n^-) + ν ε, subject to α_n^+, α_n^- ≥ 0 and |x_n + α_n^+ − α_n^- − Σ_{j=1}^J β_j t_{jn}| ≤ ε, n = 1, …, N. (6) [sent-82, score-0.479]
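As an illustration of how compactly the LP (6) can be set up in practice, here is a minimal sketch using scipy.optimize.linprog; the function name lp_denoise, the variable layout, and the matrix orientation T[n, j] = t_{jn} are our own assumptions for illustration, not details from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def lp_denoise(x, T, nu):
    """Sketch of LP (6). x: noisy image, shape (N,); T: template/PCA basis,
    shape (N, J) with T[n, j] = t_{jn}; nu: the nu-trick parameter."""
    N, J = T.shape
    # Variable layout (assumed): [alpha+ (N), alpha- (N), beta (J), eps (1)].
    c = np.concatenate([np.full(2 * N, 1.0 / N), np.zeros(J), [nu]])
    I, ones = np.eye(N), np.ones((N, 1))
    # |x + alpha+ - alpha- - T beta| <= eps, split into two inequality rows:
    A_ub = np.vstack([np.hstack([ I, -I, -T, -ones]),
                      np.hstack([-I,  I,  T, -ones])])
    b_ub = np.concatenate([-x, x])
    bounds = [(0, None)] * (2 * N) + [(None, None)] * J + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    alpha = res.x[:N] - res.x[N:2 * N]      # sparse pixel corrections
    beta = res.x[2 * N:2 * N + J]
    return x + alpha, beta                  # off-manifold solution, coefficients
```

The on-manifold solution would then be T @ beta, while x + alpha leaves every pixel with α_n = 0 untouched.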

32 The ν-Trick: In the above optimization problem, the regularization constant ν should be determined so as to control the fraction of updated pixels. [sent-87, score-0.192]

33 If one of these pixels is modified, then it will likely lead to a different solution, while changing any of the other N − N_p − N_c pixels locally does not change the optimal solution. [sent-90, score-0.666]

34 In terms of images, one can bound the anticipated fraction of noise pixels by ν. [sent-104, score-0.482]
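If the ν-property holds, it can be sanity-checked empirically by counting the updated pixels; a tiny sketch reusing alpha and nu from the LP sketch above (tol is an assumed numerical tolerance):

```python
import numpy as np

tol = 1e-6                                    # assumed numerical tolerance
updated_fraction = np.mean(np.abs(alpha) > tol)
print(updated_fraction, "should be bounded by roughly nu =", nu)
```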

35 3 Dealing with Block Noises. Preliminaries: When noises are clustered in blocks, this prior knowledge should lead to improved denoising performance. [sent-106, score-0.552]

36 So far we could only control the number of modified pixels, which corresponds to the area of the reconstruction. [sent-107, score-0.428]

37 In the first case (left) the occlusion forms a block, in the second case the letters “lp” and in the third case the pixels are randomly distributed. [sent-111, score-0.461]

38 We will now define two measures of how much an occlusion pattern mismatches the block shape. [sent-114, score-0.347]

39 We distinguish between two types of penalties: first, the ones which occur when a reconstructed pixel is a neighbor of an untouched pixel ("boundary point"), and second, those which occur when a reconstructed pixel is a neighbor of another such pixel but the corrections are in different directions ("inversion point"). [sent-118, score-0.443]

40 Let N_i^- be the number of pixels n with α_n α_m < 0 for at least one m ∈ G(n) and α_n α_m ≤ 0 for all m ∈ G(n) (single inversion point). [sent-121, score-0.439]

41 Let N_b^+ be the number of pixels n which satisfy: (a) α_n = 0 and there exists m ∈ G(n) such that α_m ≠ 0 (outer boundary point), or (b) α_n ≠ 0 and there exists m ∈ G(n) with α_m = 0 (inner boundary point). [sent-123, score-0.493]

42 Let N_i^+ be the number of pixels n with α_n α_m < 0 for at least one m ∈ G(n) (inversion point). [sent-124, score-0.389]

43 The main difference between the two scores is that S^+ counts the length of the inner and outer boundary, while S^- only counts the outer boundary. [sent-126, score-0.223]
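To make these definitions concrete, here is a small sketch that counts boundary points (N_b^+) and inversion points (N_i^+) on a 2D correction map; the 4-neighborhood choice for G(n) and all names are our assumptions:

```python
import numpy as np

def boundary_and_inversion_counts(alpha, tol=1e-9):
    """alpha: 2D array of pixel corrections alpha_n; returns (N_b+, N_i+)."""
    nz = np.abs(alpha) > tol                  # pixels with nonzero correction
    H, W = alpha.shape
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # assumed G(n): 4-neighborhood
    n_boundary = n_inversion = 0
    for i in range(H):
        for j in range(W):
            neigh = [(i + di, j + dj) for di, dj in shifts
                     if 0 <= i + di < H and 0 <= j + dj < W]
            # Outer boundary: alpha_n = 0 with a nonzero neighbor;
            # inner boundary: alpha_n != 0 with a zero neighbor.
            if (not nz[i, j] and any(nz[p] for p in neigh)) or \
               (nz[i, j] and any(not nz[p] for p in neigh)):
                n_boundary += 1
            # Inversion point: neighboring corrections of opposite sign.
            if any(alpha[i, j] * alpha[p] < 0 for p in neigh):
                n_inversion += 1
    return n_boundary, n_inversion
```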

44 We control the contribution of the γ's relative to that of the α's by introducing a new parameter λ ∈ (0, 1) – if λ = 0, then the original LP is recovered: min_{γ≥0, α, ε≥0, β} (λ/N) Σ_{n=1}^N γ_n + ((1−λ)/N) Σ_{n=1}^N |α_n| + ν ε, (8) subject to |x_n + α_n − Σ_{j=1}^J β_j t_{j,n}| ≤ ε for all n = 1, …, N, and |α_n − α_m| ≤ γ_n for all m ∈ G(n). [sent-130, score-0.185]

45 We will show in the experimental part that these novel constraints lead to substantial improvements for block noises. [sent-133, score-0.27]
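Continuing the linprog sketch from above, the block constraints of (8) only add the variables γ_n and two inequality rows per neighbor pair, encoding |α_n − α_m| ≤ γ_n. A hedged sketch of the extra rows, with the variable layout extended (by assumption) to [α+, α−, β, ε, γ]:

```python
import numpy as np

def block_constraint_rows(N, J, neighbors):
    """neighbors: dict n -> list of pixel indices m in G(n) (assumed to be
    precomputed from the 2D grid). Returns extra (A_ub, b_ub) rows."""
    dim = 2 * N + J + 1 + N   # [alpha+ (N), alpha- (N), beta (J), eps, gamma (N)]
    rows, rhs = [], []
    for n, ms in neighbors.items():
        for m in ms:
            for sign in (1.0, -1.0):
                row = np.zeros(dim)
                # sign * (alpha_n - alpha_m) - gamma_n <= 0,
                # where alpha_k = alpha+_k - alpha-_k:
                row[n], row[N + n] = sign, -sign
                row[m], row[N + m] = -sign, sign
                row[2 * N + J + 1 + n] = -1.0     # -gamma_n
                rows.append(row)
                rhs.append(0.0)
    return np.array(rows), np.array(rhs)
```

These rows would be stacked under the earlier A_ub and b_ub, with the objective weights changed to (1−λ)/N on the α's and λ/N on the γ's, and γ_n ≥ 0 as bounds, matching (8).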

46 The analysis of this linear program is considerably more difficult than that of the previous one. [sent-134, score-0.157]

47 Let N_c be the number of crucial pixels and N_p the number of updated pixels (as before). [sent-137, score-0.747]

48 The λ-weighted average between the area of the occlusion and the score S^- is not greater than νN, i.e. [sent-140, score-0.27]

49 If λ < 2/(2+|G|), then the λ-weighted average between the area of the occlusion and the score S^+ is not smaller than νN minus 2N_c, i.e. [sent-143, score-0.236]

50 For example, when the squared loss is adopted as d_1, the optimization problem (3) is rewritten as min_{α,β} (1/N) Σ_{n=1}^N [ (x_n + α_n − Σ_{j=1}^J β_j t_{jn})² + ν|α_n| ]. (11) [sent-157, score-0.437]

51 This is a quadratic program (QP), which can also be solved by standard algorithms. [sent-158, score-0.166]

52 In our experience, the QP takes longer to solve than the LP, while the denoising performance is more or less the same. [sent-159, score-0.307]

53 The QP can partially be solved analytically with respect to α: min_β Σ_{n=1}^N ρ(x_n − Σ_{j=1}^J β_j t_{jn}), (12) where ρ is Huber's loss: ρ(t) = t² for −Nν/2 ≤ t ≤ Nν/2, and ρ(t) = Nν|t| − N²ν²/4 otherwise. [sent-162, score-0.405]
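For reference, the ρ above is the classical Huber function with threshold Nν/2; since the formula was reconstructed from a garbled source, treat the exact constants as an assumption:

```python
import numpy as np

def huber_rho(t, N, nu):
    """rho(t) = t^2 for |t| <= N*nu/2, and N*nu*|t| - (N*nu)^2/4 otherwise."""
    c = N * nu / 2.0
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= c, t ** 2, 2 * c * np.abs(t) - c ** 2)
```

The two branches meet continuously at |t| = Nν/2, where both evaluate to (Nν/2)².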

54 Thus, the on-manifold solution of (11) corresponds to the robust projection with Huber's loss. [sent-163, score-0.307]

55 In other words, α can be regarded as a set of slack variables in the robust projection. [sent-164, score-0.211]

56 It is worthwhile to notice another choice of slack variables proposed in [2]: min_{β,z} (1/N) Σ_{n=1}^N [ z_n (x_n − Σ_{j=1}^J β_j t_{jn})² + γ/z_n ]. (13) [sent-165, score-0.431]

57 Here the slack variables are denoted by z and are called the outlier process [2]. [sent-170, score-0.172]

58 Then the inside problem with respect to z_n can be analytically solved, and we have the reduced problem as min_β Σ_{n=1}^N h_γ(x_n − Σ_{j=1}^J β_j t_{jn}), (14) where h_γ(t) is again Huber's loss function: h_γ(t) = t²/(2γ) + γ/2 if |t| < γ, and |t| if |t| ≥ γ. [sent-173, score-0.422]

59 The outlier process tells one which pixels are ignored, but it does not directly represent the denoised image. [sent-174, score-0.501]

60 This dataset has 200 face images (100 males and 100 females) and each image is rescaled to 44×64. [sent-177, score-0.344]

61 The images are artificially corrupted by impulse and block noises. [sent-178, score-0.608]

62 As impulse noise, 20% of the pixels are chosen randomly and set to 0. [sent-179, score-0.508]

63 For block noises, a rectangular region (10% of the pixels) is set to zero to hide the eyes. [sent-180, score-0.296]
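Both corruption models are easy to reproduce; a sketch (the RNG seed and the block location are assumptions, chosen to mimic the description above):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_impulse_noise(img, frac=0.2):
    # Choose `frac` of the pixels at random and set them to 0.
    noisy = img.copy()
    idx = rng.choice(noisy.size, size=int(frac * noisy.size), replace=False)
    noisy.flat[idx] = 0
    return noisy

def add_block_noise(img, top, left, height, width):
    # Zero out a rectangular region, e.g. one covering the eyes.
    noisy = img.copy()
    noisy[top:top + height, left:left + width] = 0
    return noisy
```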

64 The task is to recover the original image based on the remaining 199 images (i.e. in a leave-one-out setting). [sent-182, score-0.297]

65 Our linear program is compared against the least squares projection and the robust projection using Huber's loss (i.e. the on-manifold solution of the QP). [sent-188, score-0.742]

66 However, we will not trade convexity for denoising performance here, because local minima often put practitioners into trouble. [sent-195, score-0.346]

67 As a reference, we also consider an idealistic denoising method to which we give the true positions of the noise. [sent-196, score-0.417]

68 Here, the pixel values at noisy positions are estimated by the least squares projection with respect to the non-noise pixels only. [sent-197, score-0.497]

69 The linear manifold is made by PCA from the remaining 199 images. [sent-199, score-0.168]

70 For impulse and block noise images, the chosen number of principal components turned out to be 110 and 30, respectively. [sent-201, score-0.444]
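The admissible-image manifold itself comes from standard PCA over the training images; a minimal sketch via the SVD (the mean-centering detail and all names are our assumptions):

```python
import numpy as np

def pca_basis(train_images, k):
    """train_images: (M, N) array, one vectorized image per row (here M = 199).
    Returns the mean image and the top-k principal components as columns,
    playing the role of the templates t_j spanning the linear manifold."""
    mean = train_images.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_images - mean, full_matrices=False)
    return mean, Vt[:k].T                     # shapes (N,) and (N, k)
```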

71 The reconstruction errors of LP and QP for impulse noise are shown in Fig. 4. [sent-202, score-0.536]

72 Both in LP and QP, the off-manifold solution outperforms… [Figure 3 panel labels: a: original image, c: least squares projection] [sent-207, score-0.413]

73 Figure 3: A typical result of denoising impulse noise. [sent-209, score-0.482]

74 (c) Reconstruction by the least squares projection to the PCA basis. [sent-212, score-0.315]

75 the on-manifold one, which confirms our intuition that it is effective to keep most pixels unchanged. [sent-217, score-0.333]

76 Compared with the least squares projection, the difference is so large that one can easily see it in the reconstructed images (Fig. 3). [sent-218, score-0.357]

77 …(Fig. 4, left and right) performed significantly better than the on-manifold solution of QP, which corresponds to the robust projection using Huber's loss (cf. (12)). [sent-222, score-0.361]

78 Figure 4: Reconstruction errors of LP and QP methods for impulse noise. [sent-241, score-0.175]

79 The flat lines correspond to the least squares projection and the unrealistic setting where the correct positions of noises are given. [sent-243, score-0.535]

80 [Figure 5 plot residue: y-axis "average reconstruction error"; PCA reference level] [sent-246, score-0.282]

81 [Figure 5 legend residue: on-manifold solution, upper bound (10), off-manifold solution] [sent-247, score-0.194]

82 Figure 5: Reconstruction errors of the LP method for block noises. [sent-258, score-0.219]

83 (Left) the reconstruction error of the “plain” LP, where the block constraints are not taken into account (λ = 0). [sent-259, score-0.411]

84 As in the case with impulse noise, the error is smaller than that of the least squares regression (PCA projection), and the minimum is attained around ν = 1/2. [sent-275, score-0.344]

85 Actually, there is not much room for improvement, since even the idealistic case, where the position of the occlusion is known, is not much better. [sent-278, score-0.238]

86 An example of the reconstructed images is shown in Fig. 7. [sent-279, score-0.188]

87 When λ = 0, nonzero α's appear not only in the occluded part but also, for instance, along the face edge (Fig. 7). [sent-282, score-0.268]

88 When λ = 1/2, nonzero α's are more concentrated in the occluded part, because the block constraints suppress isolated nonzero values (Fig. 7). [sent-284, score-0.496]

89 In Fig. 7(i), one can see high γ's at the edge pixels of the occluded region, which indicates that the block constraints are active for those pixels. [sent-287, score-0.695]

90 6 Concluding Remarks In summary, we have presented a new image denoising method based on linear programming. [sent-292, score-0.502]

91 The on-manifold solution of our method is related to existing robust statistical approaches. [sent-294, score-0.161]

92 Remarkably, our method can deal with block noises while retaining the convexity of the optimization problem (every linear program is convex). [sent-295, score-0.659]

93 …(e.g. [9]) tend to rely on non-convex optimization to include the prior knowledge that the noises form blocks. [sent-298, score-0.296]

94 We look forward to applying linear programming to other computer vision problems which involve combinatorial optimization, e.g. [sent-300, score-0.158]

95 [Figure 7 panel labels: a: original image, b: noisy image, f: off-manifold, λ = 0.5] [sent-313, score-0.429]

96 Figure 7: A typical result of denoising block noises (ν = 0. [sent-318, score-0.746]

97 The image (d) shows the denoising result when the block constraints are not taken into account (λ = 0, ν = 1/2). [sent-321, score-0.727]

98 The result improves when the block constraints are imposed (λ = 1/2, ν = 1/4), as shown in (f) and (g), which are the off- and on-manifold solutions, respectively. [sent-322, score-0.27]

99 The images (e),(h) and (i) show the parameter values obtained as the result of linear programming (see the text for details). [sent-323, score-0.245]

100 On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. [sent-334, score-0.155]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pixels', 0.333), ('denoising', 0.307), ('lp', 0.283), ('qp', 0.245), ('noises', 0.22), ('block', 0.219), ('tjn', 0.194), ('impulse', 0.175), ('image', 0.15), ('projection', 0.146), ('reconstruction', 0.141), ('occlusion', 0.128), ('huber', 0.123), ('manifold', 0.123), ('images', 0.117), ('pca', 0.115), ('squares', 0.113), ('denoised', 0.11), ('noisy', 0.099), ('corrupted', 0.097), ('robust', 0.097), ('occluded', 0.092), ('np', 0.088), ('slack', 0.088), ('proposition', 0.086), ('program', 0.085), ('nc', 0.084), ('programming', 0.083), ('pixel', 0.083), ('idealistic', 0.083), ('nb', 0.082), ('boundary', 0.08), ('face', 0.077), ('xn', 0.076), ('reconstructed', 0.071), ('nonzero', 0.067), ('area', 0.067), ('solution', 0.064), ('outer', 0.061), ('outlier', 0.058), ('least', 0.056), ('tj', 0.055), ('loss', 0.054), ('identi', 0.053), ('lkopf', 0.052), ('min', 0.051), ('optimization', 0.051), ('constraints', 0.051), ('admissible', 0.05), ('sch', 0.05), ('inversion', 0.05), ('noise', 0.05), ('graf', 0.048), ('hide', 0.048), ('lncs', 0.048), ('mpi', 0.048), ('updated', 0.048), ('zn', 0.047), ('linear', 0.045), ('scores', 0.045), ('takahashi', 0.044), ('smola', 0.042), ('score', 0.041), ('bound', 0.039), ('convexity', 0.039), ('convex', 0.036), ('blocks', 0.036), ('fraction', 0.036), ('interior', 0.035), ('greater', 0.034), ('tsch', 0.034), ('mika', 0.034), ('crucial', 0.033), ('adopted', 0.032), ('penalize', 0.032), ('instance', 0.032), ('let', 0.031), ('rewritten', 0.03), ('vision', 0.03), ('solved', 0.03), ('original', 0.03), ('regularization', 0.029), ('region', 0.029), ('limitations', 0.029), ('drawback', 0.029), ('control', 0.028), ('sparsity', 0.028), ('counts', 0.028), ('considerably', 0.027), ('nitions', 0.027), ('position', 0.027), ('upper', 0.027), ('neighbor', 0.026), ('optimality', 0.026), ('variables', 0.026), ('germany', 0.025), ('knowledge', 0.025), ('anticipated', 0.024), ('functioning', 0.024), ('acknowledgment', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000012 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by ℓ1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP, leading to a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

2 0.22362526 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

Author: Lyndsey C. Pickup, Stephen J. Roberts, Andrew Zisserman

Abstract: Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several lowresolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques. 1

3 0.13019727 149 nips-2003-Optimal Manifold Representation of Data: An Information Theoretic Approach

Author: Denis V. Chigirev, William Bialek

Abstract: We introduce an information theoretic method for nonparametric, nonlinear dimensionality reduction, based on the infinite cluster limit of rate distortion theory. By constraining the information available to manifold coordinates, a natural probabilistic map emerges that assigns original data to corresponding points on a lower dimensional manifold. With only the information-distortion trade off as a parameter, our method determines the shape of the manifold, its dimensionality, the probabilistic map and the prior that provide optimal description of the data. 1

4 0.11931448 124 nips-2003-Max-Margin Markov Networks

Author: Ben Taskar, Carlos Guestrin, Daphne Koller

Abstract: In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3 ) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1

5 0.11887709 112 nips-2003-Learning to Find Pre-Images

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

6 0.10885353 122 nips-2003-Margin Maximizing Loss Functions

7 0.096562542 12 nips-2003-A Model for Learning the Semantics of Pictures

8 0.093275994 73 nips-2003-Feature Selection in Clustering Problems

9 0.088629514 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

10 0.085023597 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

11 0.084710397 100 nips-2003-Laplace Propagation

12 0.082670093 120 nips-2003-Locality Preserving Projections

13 0.071482882 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification

14 0.070731714 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

15 0.070689775 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

16 0.070084743 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

17 0.067075372 126 nips-2003-Measure Based Regularization

18 0.06583 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

19 0.064127885 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

20 0.063514441 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.235), (1, -0.089), (2, -0.003), (3, -0.081), (4, -0.111), (5, -0.001), (6, -0.031), (7, -0.106), (8, 0.031), (9, -0.133), (10, -0.027), (11, -0.041), (12, -0.08), (13, 0.211), (14, -0.11), (15, -0.08), (16, 0.01), (17, 0.071), (18, -0.081), (19, 0.152), (20, 0.002), (21, 0.131), (22, -0.035), (23, -0.068), (24, 0.042), (25, 0.072), (26, 0.009), (27, -0.096), (28, -0.061), (29, -0.044), (30, -0.013), (31, 0.044), (32, 0.223), (33, 0.04), (34, 0.027), (35, -0.087), (36, -0.043), (37, -0.056), (38, -0.044), (39, -0.007), (40, -0.066), (41, 0.1), (42, -0.017), (43, -0.008), (44, -0.062), (45, -0.121), (46, 0.143), (47, -0.047), (48, 0.052), (49, 0.128)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97612709 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by ℓ1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP, leading to a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

2 0.74362475 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

Author: Lyndsey C. Pickup, Stephen J. Roberts, Andrew Zisserman

Abstract: Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several lowresolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques. 1

3 0.57160282 12 nips-2003-A Model for Learning the Semantics of Pictures

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1

4 0.5676958 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

Author: Charles Rosenberg, Alok Ladsariya, Tom Minka

Abstract: We present a Bayesian approach to color constancy which utilizes a nonGaussian probabilistic model of the image formation process. The parameters of this model are estimated directly from an uncalibrated image set and a small number of additional algorithmic parameters are chosen using cross validation. The algorithm is empirically shown to exhibit RMS error lower than other color constancy algorithms based on the Lambertian surface reflectance model when estimating the illuminants of a set of test images. This is demonstrated via a direct performance comparison utilizing a publicly available set of real world test images and code base.

5 0.54616249 120 nips-2003-Locality Preserving Projections

Author: Xiaofei He, Partha Niyogi

Abstract: Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) – a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.

6 0.46715322 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

7 0.4366754 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

8 0.4320555 149 nips-2003-Optimal Manifold Representation of Data: An Information Theoretic Approach

9 0.41154319 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

10 0.41081151 119 nips-2003-Local Phase Coherence and the Perception of Blur

11 0.40162942 124 nips-2003-Max-Margin Markov Networks

12 0.39199233 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples

13 0.38142145 168 nips-2003-Salient Boundary Detection using Ratio Contour

14 0.37750328 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

15 0.36309645 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

16 0.34840372 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

17 0.34662011 73 nips-2003-Feature Selection in Clustering Problems

18 0.34312114 112 nips-2003-Learning to Find Pre-Images

19 0.34156758 155 nips-2003-Perspectives on Sparse Bayesian Learning

20 0.33488759 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (11, 0.439), (29, 0.011), (30, 0.013), (35, 0.051), (53, 0.102), (68, 0.012), (69, 0.012), (71, 0.057), (76, 0.034), (85, 0.068), (91, 0.073), (99, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98151201 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

Author: Masakazu Yagi, Hideo Yamasaki, Tadashi Shibata

Abstract: A mixed-signal image filtering VLSI has been developed aiming at real-time generation of edge-based image vectors for robust image recognition. A four-stage asynchronous median detection architecture based on analog digital mixed-signal circuits has been introduced to determine the threshold value of edge detection, the key processing parameter in vector generation. As a result, a fully seamless pipeline processing from threshold detection to edge feature map generation has been established. A prototype chip was designed in a 0.35-µm double-polysilicon three-metal-layer CMOS technology and the concept was verified by the fabricated chip. The chip generates a 64-dimension feature vector from a 64x64-pixel gray scale image every 80µsec. This is about 104 times faster than the software computation, making a real-time image recognition system feasible. 1 In tro du c ti o n The development of human-like image recognition systems is a key issue in information technology. However, a number of algorithms developed for robust image recognition so far [1]-[3] are mostly implemented as software systems running on general-purpose computers. Since the algorithms are generally complex and include a lot of floating point operations, they are computationally too expensive to build real-time systems. Development of hardware-friendly algorithms and their direct VLSI implementation would be a promising solution for real-time response systems. Being inspired by the biological principle that edge information is firstly detected in the visual cortex, we have developed an edge-based image representation algorithm compatible to hardware processing. In this algorithm, multiple-direction edges extracted from an original gray scale image is utilized to form a feature vector. Since the spatial distribution of principal edges is represented by a vector, it was named Projected Principal-Edge Distribution (PPED) [4],[5], or formerly called Principal Axis Projection (PAP) [6],[7]. (The algorithm is explained later.) Since the PPED vectors very well represent the human perception of similarity among images, robust image recognition systems have been developed using PPED vectors in conjunction with the analog soft pattern classifier [4],[8], the digital VQ (Vector Quantization) processor [9], and support vector machines [10] . The robust nature of PPED representation is demonstrated in Fig. 1, where the system was applied to cephalometric landmark identification (identifying specific anatomical landmarks on medical radiographs) as an example, one of the most important clinical practices of expert dentists in orthodontics [6],[7]. Typical X-ray images to be experienced by apprentice doctors were converted to PPED vectors and utilized as templates for vector matching. The system performance has been proven for 250 head film samples regarding the fundamental 26 landmarks [11]. Important to note is the successful detection of the landmark on the soft tissue boundary (the tip of the lower lip) shown in Fig. 1(c). Landmarks on soft tissues are very difficult to detect as compared to landmarks on hard tissues (solid bones) because only faint images are captured on radiographs. The successful detection is due to the median algorithm that determines the threshold value for edge detection. Sella Nasion Orbitale By our system (a) By expert dentists Landmark on soft tissue (b) (c) Fig. 1: Image recognition using PPED vectors: (a,b) cephalometric landmark identification; (c) successful landmark detection on soft tissue. 
We have adopted the median value of the spatial variance of luminance within the filtering kernel (5x5 pixels), which allows us to extract all essential features in a delicate gray-scale image. The problem, however, is the high computational cost of determining the median value. It takes about 0.6 sec to generate one PPED vector from a 64x64-pixel image (a standard image size for recognition in our system) on a SUN workstation, making real-time processing unrealistic. About 90% of the computation time goes to edge detection from the input image, and most of that is spent on median detection. The purpose of this work is therefore to develop a new-architecture median-filter VLSI subsystem for real-time PPED-vector generation. Special attention has been paid to realizing a fully seamless pipeline from threshold detection to edge feature map generation by employing the four-stage asynchronous median detection architecture.

2 Projected Principal Edge Distribution (PPED)

The Projected Principal Edge Distribution (PPED) algorithm [5],[6] is briefly explained using Fig. 2(a). A 5x5-pixel block taken from a 64x64-pixel target image is subjected to edge detection filtering in four principal directions: horizontal, vertical, and ±45 degrees. In the figure, horizontal edge filtering is shown as an example. (The filtering kernels used for edge detection are given in Fig. 2(b).) To determine the threshold value for edge detection, all absolute-value differences between two neighboring pixels are calculated in both the vertical and horizontal directions, and the median value is taken as the threshold. By scanning the 5x5-pixel filtering kernels over the target image, four 64x64 edge-flag maps are generated, called feature maps. In the horizontal feature map, for example, the edge flags in every four rows are accumulated, and the spatial distribution of edge flags is represented by a histogram with 16 elements. Similar procedures are applied to the other three directions to form respective 16-element histograms. Finally, a 64-dimension vector is formed by concatenating the four histograms in the order horizontal, +45-degree, vertical, and -45-degree.

Fig. 2: PPED algorithm (a) and filtering kernels for edge detection (b). (The processing diagram and kernel matrices are not reproduced here; the horizontal kernel consists of rows of 0, +1, 0, -1, 0.)

3 System Organization

The system organization of the feature map generation VLSI is illustrated in Fig. 3. The system receives one column of data (8-b x 5 pixels) at each clock and stores it in the last column of the 5x6 image buffer. The image buffer shifts all stored data to the right at every clock. Before the edge filtering circuit (EFC) starts detecting the four direction edges with respect to the center pixel of the 5x5 block, the threshold value calculated from all the pixel data in the 5x5 block must be ready in time for the processing.
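As a point of reference for the hardware pipeline just introduced, the following minimal NumPy sketch traces the vector-generation steps of Section 2 in software. To be clear about what is assumed: only the horizontal kernel is spelled out in the text, so the vertical kernel is taken here as its transpose and the two diagonal kernels are schematic stand-ins rather than the Fig. 2(b) coefficients; the projection axis used for the non-horizontal maps and all function and variable names are ours; flagging only the largest-gradient direction follows the EFC description given later.

```python
import numpy as np

# Horizontal kernel per Fig. 2(b): rows of 0, +1, 0, -1, 0.
H = np.array([[ 0,  0,  0,  0,  0],
              [ 1,  1,  1,  1,  1],
              [ 0,  0,  0,  0,  0],
              [-1, -1, -1, -1, -1],
              [ 0,  0,  0,  0,  0]], dtype=float)
# Diagonal kernels below are schematic stand-ins, not the paper's.
P45 = np.eye(5, k=1) - np.eye(5, k=-1)
KERNELS = {"horizontal": H, "+45": P45, "vertical": H.T, "-45": np.fliplr(P45)}

def pped_vector(img):
    """64-d PPED vector from a 64x64 gray-scale image (Section 2 sketch)."""
    img = img.astype(float)
    n = img.shape[0]                      # 64 in the paper's setting
    fmaps = {k: np.zeros((n, n), dtype=bool) for k in KERNELS}
    for r in range(2, n - 2):
        for c in range(2, n - 2):
            blk = img[r - 2:r + 3, c - 2:c + 3]
            # Threshold: median of the 40 neighboring-pixel absolute
            # differences (20 horizontal + 20 vertical) in the block.
            diffs = np.concatenate([np.abs(np.diff(blk, axis=1)).ravel(),
                                    np.abs(np.diff(blk, axis=0)).ravel()])
            thr = np.median(diffs)
            grads = {k: abs((K * blk).sum()) for k, K in KERNELS.items()}
            best = max(grads, key=grads.get)
            if grads[best] > thr:         # flag the strongest direction only
                fmaps[best][r, c] = True
    # Project each flag map onto 16 bins by accumulating every four rows
    # (the axis choice for the non-horizontal maps is our assumption),
    # then concatenate in the order horizontal, +45, vertical, -45.
    hists = [m.sum(axis=1).reshape(-1, 4).sum(axis=1) for m in fmaps.values()]
    return np.concatenate(hists)
```

The double loop makes the cost structure plain: the 40-element median in the inner body is the hot spot, which is exactly what the median-filter architecture below removes from the critical path.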
In order to keep the threshold detection coherent with the edge filtering, the two most recently arrived columns of data (columns 5 and 6) are fed to the median filter circuit (MFC) in advance via the absolute value circuit (AVC). The AVC calculates all luminance differences between neighboring pixels in columns 5 and 6. In this manner, a fully seamless pipeline from threshold detection to edge feature map generation is established. The key requirement is that the MFC must determine the median of the 40 luminance-difference data from the 5x5-pixel block fast enough to sustain the seamless pipeline. For this purpose, a four-stage asynchronous median detection architecture has been developed, which is explained in the following.

Fig. 3: System organization of the feature map generation VLSI (5x6 image buffer receiving 8-b x 5 pixels per column, absolute value circuit (AVC), median filter circuit (MFC) producing the threshold value, and edge filtering circuit (EFC) producing the four feature maps).

The well-known binary search algorithm was adopted for fast execution of median detection. The median search for five 4-b data is illustrated in Fig. 4 for the purpose of explanation. First, majority voting is carried out on the MSBs of all data: the number of 1's is compared with the number of 0's, and the majority group wins. The majority bit ("0" in this example) is stored as the MSB of the median value. In addition, each member of the loser group is withdrawn from the following votes by changing all of its remaining bits to its losing MSB ("1" in this example). Repeating this processing bit by bit leaves the median value in the median-value register.

Fig. 4: Hardware algorithm for median detection by binary search (majority voting circuit, MVC; the step-by-step bit-pattern example is not reproduced).

How the median is detected among all 40 8-b data (20 horizontal and 20 vertical luminance-difference data) is illustrated in Fig. 5. All the data are stored in an array of median detection units (MDUs). At each clock, the array receives four vertical and five horizontal luminance-difference data calculated from the data in columns 5 and 6 of Fig. 3, and the entire contents are shifted downward. The median search is carried out on the upper four bits and the lower four bits separately in order to enhance throughput by pipelining. For this purpose, the chip is equipped with eight majority voting circuits (MVC0-7). The upper four bits of all the data are processed by MVC4-7 in a single clock cycle to yield their part of the median value. In the next clock cycle, the loser information is transferred to the lower four bits within each MDU, and MVC0-3 carry out the median search on the lower four bits of all the data in the array.

Fig. 5: Median detection architecture for all 40 luminance-difference data.

The majority voting circuit (MVC) is shown in Fig. 6.
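The binary-search median lends itself to a compact software model. The sketch below is a minimal Python rendering of the Fig. 4 procedure under our own naming: bits are processed MSB to LSB, the per-bit majority becomes the median's bit, and each losing word has its remaining bits frozen at its losing value so it stays on the same side of all later votes; a tie votes "0", mirroring the fixed "0" input of the MVC described below. This models only the algorithm, not the chip's two-stage pipelining of upper and lower nibbles.

```python
def bitwise_median(values, n_bits=8):
    """Median by binary search over bit planes, as in Fig. 4.

    At each bit position a majority vote over all words yields the
    median's bit; every word that voted with the minority has its
    remaining (lower) bits overwritten with its losing bit, so it
    keeps voting on the losing side in all subsequent rounds.
    """
    work = list(values)
    median = 0
    for shift in range(n_bits - 1, -1, -1):
        bits = [(v >> shift) & 1 for v in work]
        maj = 1 if sum(bits) * 2 > len(bits) else 0   # tie -> 0
        median = (median << 1) | maj
        rest = (1 << shift) - 1                       # mask of lower bits
        for i, b in enumerate(bits):
            if b != maj:                              # loser: freeze lower bits
                work[i] = (work[i] & ~rest) | (rest if b else 0)
    return median

print(bitwise_median([5, 2, 9, 4, 7], n_bits=4))      # -> 5
```

For an odd number of inputs this returns the exact median; with the chip's 40 data inputs plus the constant-0 terminal it settles, on our reading, on the lower median of the 40 differences.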
Output-connected CMOS inverters are employed as preamplifiers for majority detection, a technique first proposed in Ref. [12]. In the present implementation, however, two preamps receiving the input data and the inverted input data are connected to a two-stage differential amplifier. Although this doubles the area penalty, the instability of the majority-detection threshold under process and temperature variations is remarkably improved compared to the single-inverter thresholding of Ref. [12]. The MVC in Fig. 6 has 41 input terminals, although only 40 bits of data are inputted to the circuit at one time. Bit "0" is always given to terminal IN40 so that "0" wins as the majority when the voting is tied.

Fig. 6: Majority voting circuit (MVC); the transistor-level schematic with preamps and inputs IN0-IN40 is not reproduced.

The edge filtering circuit (EFC) in Fig. 3 is composed as a four-stage pipeline of regular CMOS digital logic. In the first two stages, the four-direction edge gradients are computed; in the succeeding two stages, the largest gradient is detected and thresholded to generate the four edge flags.

4 Experimental Results

The feature map generation VLSI was fabricated in a 0.35-µm double-poly three-metal-layer CMOS technology. A photomicrograph of the proof-of-concept chip is shown in Fig. 7. The measured waveforms of the MVC at operating frequencies of 10 MHz and 90 MHz are demonstrated in Fig. 8, under the worst-case input condition: 21 "1" bits and 20 "0" bits fed to the inputs. The observed computation time is about 12 nsec, larger than the simulation result of 2.5 nsec; the difference is caused by the capacitive loading due to the probing of the test circuit. In the real circuit, without external probing, we confirmed an average computation time of 4-5 nsec.

Fig. 7: Photomicrograph and specification of the fabricated proof-of-concept chip (0.35-µm CMOS 2-poly 3-metal process; chip size 4.5 mm x 4.5 mm; eight MVCs; supply voltage 3.3 V; operation frequency 50 MHz; edge-detection filtering circuit, median filter, control unit, and vector generator).

Fig. 8: Measured waveforms of the majority voting circuit (MVC) at operation frequencies of 10 MHz (a) and 90 MHz (b) for the worst-case input data.

The feature maps generated by the chip at an operation frequency of 25 MHz are demonstrated in Fig. 9; the power dissipation was 224 mW. The differences between the flag bits detected by the chip and those obtained by computer simulation are also shown in the figure. The number of error flags ranged from 80 to 120 out of 16,384, only about 0.6% of the total. Such error bits are anticipated, since we employed analog circuits for median detection. However, they do not cause any serious problems in the PPED algorithm, as demonstrated in Figs. 10 and 11. The template matching results with the top five PPED vector candidates in Sella identification are demonstrated in Fig. 11, where the Manhattan distance was adopted as the dissimilarity measure. The error in the feature map generation yields only a constant bias to the dissimilarity and does not affect the result of the maximum-likelihood (best-match) search.
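The matching stage that consumes these vectors is compact enough to state directly. Below is a hedged sketch of the template matching behind Fig. 11: Manhattan (L1) dissimilarity between the query PPED vector and each stored template, best candidate first. The dict layout and names are ours, not the paper's. It also makes the bias argument above concrete: adding a constant to every dissimilarity leaves the sorted order unchanged.

```python
import numpy as np

def rank_templates(query, templates):
    """Rank stored PPED templates by Manhattan distance to `query`.

    `templates` is a hypothetical dict mapping candidate labels to
    64-d PPED vectors; returns (label, dissimilarity) pairs, best first.
    """
    scored = [(label, float(np.abs(query - vec).sum()))
              for label, vec in templates.items()]
    return sorted(scored, key=lambda lv: lv[1])
```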
Fig. 9: Feature maps for the Sella pattern generated by the chip (horizontal, +45-degree, vertical, and -45-degree panels, with the differences from computer simulation shown alongside).

Fig. 10: PPED vector for the Sella pattern generated by the chip. The difference in the vector components between the PPED vector generated by the chip and that obtained by computer simulation is also shown.

Fig. 11: Comparison of template matching results (measured data vs. computer simulation; dissimilarity by Manhattan distance for the top five candidates in Sella recognition, the first candidate being correct).

5 Conclusion

A mixed-signal median-filter VLSI circuit for PPED vector generation has been presented. A four-stage asynchronous median detection architecture based on analog-digital mixed-signal circuits has been introduced. As a result, a fully seamless pipeline from threshold detection to edge feature map generation has been established. A prototype chip was designed in a 0.35-µm CMOS technology, and the fabricated chip generates an edge-based image vector every 80 µsec, which is about 10^4 times faster than the software computation.

Acknowledgments

The VLSI chip in this study was fabricated through the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation. The work is partially supported by the Ministry of Education, Science, Sports, and Culture under a Grant-in-Aid for Scientific Research (No. 14205043) and by JST under the CREST program.

References

[1] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Transactions on Image Processing, Vol. 11, No. 4, Apr. 2002.
[2] C. Yen-ting, C. Kuo-sheng, and L. Ja-kuang, "Improving cephalogram analysis through feature subimage extraction," IEEE Engineering in Medicine and Biology Magazine, Vol. 18, No. 1, 1999, pp. 25-31.
[3] H. Potlapalli and R. C. Luo, "Fractal-based classification of natural textures," IEEE Transactions on Industrial Electronics, Vol. 45, No. 1, Feb. 1998.
[4] T. Yamasaki and T. Shibata, "Analog soft-pattern-matching classifier using floating-gate MOS technology," Advances in Neural Information Processing Systems 14, Vol. II, pp. 1131-1138.
[5] M. Yagi and T. Shibata, "An image representation algorithm compatible to neural-associative-processor-based hardware recognition systems," IEEE Trans. Neural Networks, Vol. 14, No. 5, pp. 1144-1161, September 2003.
[6] M. Yagi, M. Adachi, and T. Shibata,

2 0.9334411 37 nips-2003-Automatic Annotation of Everyday Movements

Author: Deva Ramanan, David A. Forsyth

Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.

same-paper 3 0.91379797 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

4 0.82704264 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie

Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency.

5 0.67166555 12 nips-2003-A Model for Learning the Semantics of Pictures

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.

6 0.65374768 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

7 0.56857169 168 nips-2003-Salient Boundary Detection using Ratio Contour

8 0.51425761 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

9 0.48761767 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

10 0.48456278 119 nips-2003-Local Phase Coherence and the Perception of Blur

11 0.48294881 164 nips-2003-Ranking on Data Manifolds

12 0.47782674 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

13 0.47733715 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

14 0.47608814 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

15 0.4724279 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

16 0.46799693 112 nips-2003-Learning to Find Pre-Images

17 0.46551299 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

18 0.46547079 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

19 0.45866761 120 nips-2003-Locality Preserving Projections

20 0.45354414 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples