nips nips2011 nips2011-235 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carsten Rother, Martin Kiefel, Lumin Zhang, Bernhard Schölkopf, Peter V. Gehler
Abstract: We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.
Reference: text
sentIndex sentText sentNum sentScore
1 1 Introduction The task of recovering intrinsic images is to separate a given input image into its material-dependent properties, known as reflectance or albedo, and its light-dependent properties, such as shading, shadows, specular highlights, and inter-reflectance. [sent-12, score-0.44]
2 For example, an image which solely depends on material-dependent properties is helpful for image segmentation and object recognition [11], while a clean image of shading is a valuable input to shape-from-shading algorithms. [sent-14, score-0.855]
3 As in most previous work in this field, we cast the intrinsic image recovery problem into the following simplified form, where each image pixel is the product of two components: I = sR . [sent-15, score-0.504]
4 The fact that shading is only a 1D entity imposes some limitations. [sent-20, score-0.37]
5 For example, shading effects stemming from multiple light sources can only be modeled if all light sources have the same color. [sent-21, score-0.45]
6 The main focus of this paper is on exploring sensible priors for both shading and reflectance. [sent-25, score-0.37]
7 Figure 1: An image I, “paper1” (a), its color in RGB space (b), the reflectance image R (c), its distribution in RGB space (d), and the shading image s (e). [sent-30, score-0.949]
8 Omer and Werman [12] have shown that an image of a natural scene often contains only a few different “basis colorlines”. [sent-31, score-0.156]
9 Figure (b) shows a dominant gray-scale color-line and other color lines corresponding to the scribbles on the paper (a). [sent-32, score-0.143]
10 The basis colors are clearly visible in (d), where the cluster for white (top, right) is the dominant one. [sent-34, score-0.14]
11 shading or shadows), and/or properties of the camera, e. [sent-37, score-0.37]
12 The main motivation of our work is to develop a simple, yet powerful probabilistic model for shading and reflectance estimation. [sent-44, score-0.387]
13 The idea is to extract those image edges which are (potentially) true reflectance edges and then to recover a new reflectance image that contains only these edges, using a set of Poisson equations. [sent-47, score-0.412]
14 The next factor is a simple smoothness prior on shading between neighboring image pixels, and has been used by some previous work e. [sent-51, score-0.593]
15 If we use image optimal parameter settings we perform on par with methods that use multiple images as input. [sent-61, score-0.279]
16 2 Related Work There is a vast amount of literature on the problem of recovering intrinsic images. [sent-63, score-0.184]
17 After that the Retinex algorithm was extended to two dimensions by Blake [3] and Horn [8], and later applied to color images [6]. [sent-67, score-0.211]
18 The basic Retinex algorithm is a 2-step procedure: 1) detect all image gradients which are caused by changes in reflectance; 2) recover a reflectance image which preserves the detected reflectance gradients. [sent-68, score-0.395]
19 The basic assumption of this approach is that small image gradients are more likely caused by a shading effect and strong gradients by a change in reflectance. [sent-69, score-0.617]
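The two-step procedure and the small-versus-strong gradient assumption can be illustrated on a toy 1D signal. This is a minimal sketch, not the implementation of any cited Retinex variant: the function name `retinex_1d`, the threshold value, and the synthetic signal are all assumptions made for illustration.

```python
import numpy as np

def retinex_1d(intensity, threshold):
    """Toy 1D Retinex: attribute large log-gradients to reflectance."""
    log_i = np.log(intensity)
    grad = np.diff(log_i)  # log-gradient between neighboring samples
    # Step 1: keep only gradients classified as reflectance changes.
    refl_grad = np.where(np.abs(grad) > threshold, grad, 0.0)
    # Step 2: integrate the retained gradients; the first sample is fixed
    # to log-reflectance 0 because the global scale is arbitrary.
    log_r = np.concatenate([[0.0], np.cumsum(refl_grad)])
    log_s = log_i - log_r
    return np.exp(log_r), np.exp(log_s)

# Synthetic signal: a smooth shading ramp times a single reflectance step.
shading = np.linspace(1.0, 2.0, 8)
reflectance = np.array([0.2] * 4 + [0.8] * 4)
r_est, s_est = retinex_1d(shading * reflectance, threshold=0.5)
```

In this example the recovered reflectance is piecewise constant with one step, but the step height also absorbs the small shading gradient at the edge location, which is one way the thresholding assumption can bias the result.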
20 For color images this rule can be extended by treating changes in the 1D brightness domain differently to changes in the 2D chromaticity space. [sent-70, score-0.303]
21 Note, a gradient in chromaticity can only be caused by differently colored light sources, or inter-reflectance. [sent-72, score-0.178]
22 By doing so we directly extend previous intrinsic image models which makes evident the gains that can be attributed to a global sparse reflectance term alone. [sent-86, score-0.386]
23 The key idea in their work is to perform a pre-processing step where the (normalized) reflectance image is partitioned into a few clusters. [sent-89, score-0.176]
24 However, the differences are that they do not formulate this idea as a joint probabilistic model over latent reflectance “basis colors” and shading variables. [sent-93, score-0.407]
25 Also, they need a Retinex type of edge term to avoid the trivial solution of s = 1. [sent-95, score-0.103]
26 on learning low-level vision [5] formulates a probabilistic model for intrinsic images. [sent-98, score-0.202]
27 In essence, they build a patch-based prior jointly over shading and reflectance. [sent-99, score-0.398]
28 In a new test image the best explanation for reflectance and shading is determined. [sent-100, score-0.526]
29 Since no large-scale ground truth database was available at that time, they only train and test on computer generated images of blob-like textures. [sent-102, score-0.163]
30 This idea is derived from the Laplacian matrix used for image matting [10]. [sent-107, score-0.176]
31 3 A Probabilistic Model for Intrinsic Images The model outlined here falls into the class of Conditional Random Fields, specifying a conditional probability distribution over reflectance R and shading s components for a given image I: $p(s, R \mid I) \propto \exp(-E(s, R \mid I))$. [sent-110, score-0.549]
32 Thus Ii is an image pixel (vector of dimension 3), Ri a reflectance vector (a 3-vector), si the shading (a scalar). [sent-113, score-0.592]
33 There are two ways to use the relationship (1) to formulate a model for shading and reflectance, corresponding to two different image likelihoods p(I | s, R). [sent-121, score-0.526]
34 One possible way is to relax the relation (1) and for example assume a Gaussian likelihood $p(I \mid s, R) \propto \exp(-\|I - sR\|^2)$ to account for some noise in the image formation process. [sent-122, score-0.177]
35 Since $I_i = s_i R_i$ has to hold for all color channels c ∈ {R, G, B}, the unknown variables are specified up to scalar multipliers, in other words the direction of $R_i$ is already known. [sent-125, score-0.166]
36 We rewrite $R_i = r_i \vec{R}_i$, with $\vec{R}_i = I_i / \|I_i\|$, leaving r = (r1 , . [sent-126, score-0.149]
37 The shading components can be computed using $s_i = \|I_i\| / r_i$. [sent-132, score-0.411]
38 The latter reduction is commonly exploited by intrinsic image algorithms in order to simplify the model [7, 14, 4] and in the remainder we will also make use of it. [sent-134, score-0.3]
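The reduction to one scalar unknown per pixel can be sketched as follows. The snippet assumes that the direction of each reflectance vector is the normalized image pixel, so that shading follows as the pixel norm divided by the reflectance magnitude; the function name `decompose` and the random test data are hypothetical.

```python
import numpy as np

def decompose(image, r):
    """Recover reflectance vectors and shading from magnitudes r.

    image: (H, W, 3) array of positive RGB values.
    r:     (H, W) array of positive reflectance magnitudes.
    """
    norm = np.linalg.norm(image, axis=-1)    # per-pixel norm of I_i
    direction = image / norm[..., None]      # unit direction I_i / ||I_i||
    reflectance = r[..., None] * direction   # R_i = r_i * direction_i
    shading = norm / r                       # s_i = ||I_i|| / r_i
    return reflectance, shading

rng = np.random.default_rng(0)
image = rng.uniform(0.1, 1.0, size=(4, 5, 3))
r = rng.uniform(0.5, 1.5, size=(4, 5))
R, s = decompose(image, r)
```

By construction the product of shading and reflectance reproduces the input image exactly for any positive r, which is why additional priors are needed to pick a meaningful decomposition.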
39 We will describe the three components and their influence in greater detail below; first we write the optimization problem that corresponds to a MAP solution in its most general form $\min_{r, \alpha} \; w_s E_s(r) + w_r E_{ret}(r) + w_{cl} E_{cl}(r, \alpha)$. [sent-143, score-0.333]
40 Note, the global scale of the energy is not important, hence we can always fix one non-zero weight $w_s$, $w_r$, $w_{cl}$ to 1. [sent-147, score-0.392]
41 Shading Prior (Es ) We expect the shading of an image to vary smoothly over the image and we encode this in the following pairwise factors $E_s(r) = \sum_{i \sim j} \left( r_i^{-1} \|I_i\| - r_j^{-1} \|I_j\| \right)^2$, (4) where we use a 4-connected pixel graph to encode the neighborhood relation which we denote with i ∼ j. [sent-148, score-0.933]
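A minimal sketch of the shading smoothness energy on a 4-connected grid is given below. It assumes that each unordered neighbor pair is counted once and that shading is the pixel norm divided by the reflectance magnitude; the function name `shading_smoothness` and the tiny test image are illustrative only.

```python
import numpy as np

def shading_smoothness(image, r):
    """Sum of squared shading differences over 4-connected neighbors."""
    s = np.linalg.norm(image, axis=-1) / r   # s_i = ||I_i|| / r_i
    dv = s[1:, :] - s[:-1, :]                # vertical neighbor differences
    dh = s[:, 1:] - s[:, :-1]                # horizontal neighbor differences
    return np.sum(dv ** 2) + np.sum(dh ** 2)

# 2x2 image whose pixel norms are [[1, 2], [1, 2]]; with r = 1 everywhere
# the shading equals the norm, so only the two horizontal pairs contribute.
norms = np.array([[1.0, 2.0], [1.0, 2.0]])
image = norms[..., None] * np.array([1.0, 0.0, 0.0])
e_rough = shading_smoothness(image, np.ones((2, 2)))
# Explaining the intensity change by reflectance (r = norm) gives flat shading.
e_flat = shading_smoothness(image, norms)
```

The second call shows that this term alone prefers to push all variation into reflectance, so it has to be balanced by the other terms.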
42 Gradient Consistency (Eret ) As discussed in the introduction, the main idea of the Retinex algorithm is to disambiguate between edges that are due to shading variations from those that are caused by material reflectance changes. [sent-154, score-0.472]
43 Assume that we already know, or have classified, that an edge at location i, j in the input image is caused by a change in reflectance. [sent-156, score-0.233]
44 Using the fact $\log \|I_i\| = \log(I_i^c) - \log(\vec{R}_i^c)$ (for all channels c) and assuming a squared deviation around the log gradient magnitude, this translates into the following Gaussian MRF term on the reflectances $E_{ret}(r) = \sum_{i \sim j} \left( \log(r_i) - \log(r_j) - g_{ij}(I) \left( \log \|I_i\| - \log \|I_j\| \right) \right)^2$. (5) [sent-158, score-0.143]
45 It remains to specify the classification function g(I) for the image edges. [sent-159, score-0.156]
46 For each pixel i and a neighbor j we compute the gradient of the intensity image and the gradient of the chromaticity change. [sent-161, score-0.359]
47 The two parameters which are the thresholds θg , θc for the intensity and the chromaticity change are then estimated using leave-one-out-cross validation. [sent-164, score-0.117]
48 It is worth noting that this term is qualitatively different from the smoothness prior on shading (4) even for pixels where gij (I) = 0. [sent-165, score-0.544]
49 Here, the log-difference is penalized whereas the shading smoothness does also depend on the intensity values Ii , Ij . [sent-166, score-0.441]
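The gradient consistency term can be sketched in the same style. The edge classification is taken as given here: the binary masks `g_v` and `g_h` (one entry per vertical and horizontal neighbor pair) are hypothetical inputs standing in for the thresholded classifier, whose exact rule is not reproduced in this sketch.

```python
import numpy as np

def retinex_energy(image, r, g_v, g_h):
    """Squared deviation of log-reflectance gradients from the log-intensity
    gradients on edges flagged as reflectance edges (g = 1), and from zero
    elsewhere (g = 0).

    g_v: (H-1, W) mask for vertical pairs, g_h: (H, W-1) for horizontal pairs.
    """
    log_r = np.log(r)
    log_n = np.log(np.linalg.norm(image, axis=-1))
    dv = (log_r[1:, :] - log_r[:-1, :]) - g_v * (log_n[1:, :] - log_n[:-1, :])
    dh = (log_r[:, 1:] - log_r[:, :-1]) - g_h * (log_n[:, 1:] - log_n[:, :-1])
    return np.sum(dv ** 2) + np.sum(dh ** 2)

# 2x2 image with pixel norms [[1, e], [1, e]] and all edges flagged.
norms = np.array([[1.0, np.e], [1.0, np.e]])
image = norms[..., None] * np.array([1.0, 0.0, 0.0])
g_v, g_h = np.ones((1, 2)), np.ones((2, 1))
e_const = retinex_energy(image, np.ones((2, 2)), g_v, g_h)  # flat reflectance
e_match = retinex_energy(image, norms, g_v, g_h)            # reflectance follows edges
```

With all edges flagged, a constant reflectance is penalized by the two horizontal log-gradients, while a reflectance that reproduces the image gradients has zero energy.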
50 Global Sparse Reflectance Prior (Ecl ) Motivated by the findings of [12] we include a term that acts as a global potential on the reflectances and favors the decomposition into some few reflectance clusters. [sent-169, score-0.105]
51 Every reflectance component ri belongs to one of the clusters and we denote its cluster membership with the variable αi ∈ {1, . [sent-174, score-0.237]
52 This is summarized in the following energy term Figure 2: A crop from the image “panther”. [sent-178, score-0.228]
53 Left: input image I and true decomposition (R, s). [sent-179, score-0.175]
54 Note, the colors in reflectance image (True R) have been modified on purpose such that there are exactly 4 different colors. [sent-180, score-0.21]
55 The second column shows a clustering (here from the solution with ws = 0), where each cluster has an arbitrary color. [sent-181, score-0.27]
56 The remaining columns show results with various settings for C and ws (left reflectance image, right shading image). [sent-182, score-0.537]
57 Top row is the result for C = 4 and bottom row for C = 50 clusters, columns are results for $w_s = 0$, $10^{-5}$, and 0. [sent-183, score-0.211]
58 Below the images is the corresponding LMSE score (described in Section 4. [sent-185, score-0.119]
59 This represents a global potential, since the cluster means depend on the assignment of all pixels in the image. [sent-190, score-0.155]
60 The cluster means $\tilde{R}_c$ are optimally determined given r and α: $\tilde{R}_c = \frac{1}{|\{i : \alpha_i = c\}|} \sum_{i : \alpha_i = c} r_i \vec{R}_i$. [sent-192, score-0.211]
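The cluster mean update can be sketched as a per-cluster average of the reflectance vectors. The energy function below assumes a sum of squared distances to the assigned cluster mean, which is an assumption made for this illustration; the names `cluster_means` and `cluster_energy` and the three-pixel example are hypothetical.

```python
import numpy as np

def cluster_means(reflectance, alpha, n_clusters):
    """Mean reflectance vector of each cluster.

    reflectance: (N, 3) reflectance vectors, alpha: (N,) labels in [0, C).
    """
    means = np.zeros((n_clusters, 3))
    for c in range(n_clusters):
        members = reflectance[alpha == c]
        if len(members) > 0:          # leave empty clusters at zero
            means[c] = members.mean(axis=0)
    return means

def cluster_energy(reflectance, alpha, means):
    # Assumed form: squared distance of every pixel to its cluster mean.
    diff = reflectance - means[alpha]
    return np.sum(diff ** 2)

reflectance = np.array([[1.0, 0.0, 0.0],
                        [3.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0]])
alpha = np.array([0, 0, 1])
means = cluster_means(reflectance, alpha, n_clusters=2)
energy = cluster_energy(reflectance, alpha, means)
```

Because the means are recomputed from all assigned pixels, changing a single pixel's reflectance or label affects the energy of every pixel in that cluster, which is what makes the term global.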
61 We use a simplified model (2), namely Ecl + ws Es , and vary ws as well as the number of clusters. [sent-194, score-0.352]
62 Let us first consider the case where ws = 0 (third column). [sent-195, score-0.167]
63 Hence the shading within one cluster looks reasonable, but is not aligned across clusters. [sent-198, score-0.432]
64 If we were to give each pixel its own cluster this would no longer be true and we would get the trivial solution of s = 1. [sent-202, score-0.133]
65 Finally, results deteriorate when the smoothing term is too strong (last column ws = 0. [sent-203, score-0.225]
66 Note, that for this simple toy example the smoothness prior was not important, however for real images the best results are achieved by using a non-zero ws . [sent-205, score-0.334]
67 de/mkiefel/projects/intrinsic 5 comment Color Retinex no edge information Col-Ret+ global term full model Es - Ecl - Eret - - LOO-CV 29. [sent-218, score-0.145]
68 The column “best-single” is the parameter set that works best on all 16 images jointly, “image opt. [sent-232, score-0.12]
69 ” is the result when choosing the parameters optimal for each image individually, based on ground truth information. [sent-233, score-0.224]
70 For instance, we virtually always achieve a lower energy compared to using the ground truth r as initial start point. [sent-235, score-0.138]
71 From these constraints we can derive that Ii ≥ ri ≥ 3. [sent-240, score-0.149]
72 Experiments For the empirical evaluation we use the intrinsic image database that has been introduced in [7]. [sent-256, score-0.325]
73 This dataset consists of 16 different images for all of which the ground truth shading and reflectance components are available. [sent-257, score-0.561]
74 In all experiments we compare against Color Retinex which was found to be the best performing method among those that take a single image as input. [sent-260, score-0.156]
75 The method from [19] yields better results but requires multiple input images from different light variations. [sent-261, score-0.121]
76 The first metric is the average of the localized mean squared error (LMSE) between the predicted and true shading and predicted and true reflectance image. [sent-264, score-0.37]
77 the weights wcl , ws , wr and the gradient thresholds θc , θg have been chosen using a leave-one-out estimate (LOO-CV). [sent-269, score-0.362]
78 Due to the high variance of the scores for the images we used the median error to score the parameters. [sent-270, score-0.119]
79 Thus for image i the parameter was chosen that leads to the lowest median error on all images except i. [sent-271, score-0.281]
80 Additionally we record the best single parameter set that works well on all images, and the score that is obtained when using the optimal parameters on each image individually. [sent-272, score-0.175]
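The three parameter selection protocols can be sketched on a matrix of errors. The error values below are made up for illustration, the function name `loo_select` is hypothetical, and scoring the best single parameter set by its median error over all images is an assumption.

```python
import numpy as np

def loo_select(errors):
    """Leave-one-out choice of parameter setting per image.

    errors: (P, N) array, error of parameter setting p on image n.
    Returns an (N,) array with the index of the setting chosen for each image.
    """
    n_images = errors.shape[1]
    chosen = np.empty(n_images, dtype=int)
    for i in range(n_images):
        others = np.delete(errors, i, axis=1)            # hold out image i
        chosen[i] = np.argmin(np.median(others, axis=1)) # lowest median error
    return chosen

# Made-up errors: setting 0 is usually better but fails badly on image 2.
errors = np.array([[1.0, 1.0, 9.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0]])
chosen = loo_select(errors)
loo_scores = errors[chosen, np.arange(errors.shape[1])]
best_single = np.argmin(np.median(errors, axis=1))  # one setting for all images
image_opt = errors.min(axis=0)                      # per-image oracle choice
```

In this toy case the held-out choice for image 2 is the setting that fails on it, so its leave-one-out score is much worse than its image-optimal score; gaps of this kind are what the comparison between the protocols is meant to expose.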
81 For models using both the cluster and shading smoothness terms, we select from ws ∈ {0. [sent-276, score-0.638]
82 1}, for models that use the cluster and Color Retinex term wr ∈ {0. [sent-279, score-0.164]
83 When all three terms are non-zero, we vary ws as above paired with wr ∈ ×{0. [sent-283, score-0.249]
84 The first observation is that the Color Retinex algorithm (1st row) performs about as well as the system using a shading smoothness prior together with the global factor Ecl (2nd row). [sent-293, score-0.485]
85 The lower value for the image optimal setting of 18. [sent-296, score-0.156]
86 With knowledge about the optimal image parameter it yields a lower LMSE score (16. [sent-304, score-0.175]
87 0 we believe that they are informative, given the parameter estimation problems due to the diverse and Table 2: Method comparison with other intrinsic image algorithms also compared in [7]. [sent-338, score-0.318]
88 For entries ’-’ we had no individual results (and no code), the two numbers marked ∗ are estimated from Fig4. proves over all the compared methods that use only a single image as [sent-343, score-0.156]
89 The full model is even better on 6/16 images than the Weiss algorithm [19] that uses multiple images. [sent-350, score-0.117]
90 For example note that the method BAS that either attributes all variations to shading (r = 1) or to reflectance alone (s = 1) already yields a LMSE of 36. [sent-360, score-0.387]
91 6, if for every image the optimal choice between the two is made. [sent-361, score-0.156]
92 We have also tested our method on various other real-world images and results are visually similar to [15, 4]. [sent-364, score-0.155]
93 Without the global term (Color Retinex with LOO-CV and image optimal) the result is imperfect. [sent-368, score-0.242]
94 With a global term (remaining three results) the images look visually much better. [sent-372, score-0.241]
95 Figure 3: Various results obtained with different methods and settings (more in supplementary material). For each result: left reflectance image, right shading image. Note that the third row shows an extreme variation for the full model when switching from image optimal setting to LOO-CV setting. [sent-373, score-0.721]
96 Note, in this case we chose for both methods the image optimal settings to illustrate the potential of each model. [sent-382, score-0.156]
97 5 Discussion and Conclusion We have introduced a new probabilistic model for intrinsic images that explicitly models the reflectance formation process. [sent-383, score-0.282]
98 Another refinement would be to replace the Gaussian cluster term with a color line term [12]. [sent-387, score-0.249]
99 Ground-truth dataset and baseline evaluations for intrinsic image algorithms. [sent-439, score-0.3]
100 Intrinsic images decomposition using a local and global sparse representation of reflectance. [sent-483, score-0.167]
wordName wordTfidf (topN-words)
[('ectance', 0.589), ('retinex', 0.444), ('shading', 0.37), ('ws', 0.167), ('image', 0.156), ('ri', 0.149), ('intrinsic', 0.144), ('ecl', 0.143), ('re', 0.136), ('lmse', 0.127), ('color', 0.111), ('images', 0.1), ('ectances', 0.079), ('wcl', 0.079), ('rgb', 0.072), ('wr', 0.064), ('chromaticity', 0.063), ('eret', 0.063), ('cluster', 0.062), ('tappen', 0.056), ('visually', 0.055), ('colors', 0.054), ('sr', 0.051), ('pixel', 0.048), ('global', 0.048), ('adelson', 0.048), ('rc', 0.047), ('ii', 0.044), ('edge', 0.042), ('gij', 0.042), ('vision', 0.041), ('recovering', 0.04), ('smoothness', 0.039), ('ground', 0.038), ('term', 0.038), ('rj', 0.036), ('caused', 0.035), ('energy', 0.034), ('intensity', 0.032), ('barrow', 0.032), ('bas', 0.032), ('colorlines', 0.032), ('lightness', 0.032), ('ret', 0.032), ('scribbles', 0.032), ('gradient', 0.03), ('mrf', 0.03), ('truth', 0.03), ('edges', 0.03), ('differently', 0.029), ('shen', 0.028), ('prior', 0.028), ('omer', 0.028), ('gradients', 0.028), ('pixels', 0.027), ('clusters', 0.026), ('land', 0.026), ('shadows', 0.026), ('field', 0.025), ('rt', 0.025), ('database', 0.025), ('lowest', 0.025), ('basis', 0.024), ('weiss', 0.024), ('trivial', 0.023), ('fields', 0.023), ('components', 0.023), ('cvpr', 0.023), ('par', 0.023), ('freeman', 0.022), ('decades', 0.022), ('row', 0.022), ('thresholds', 0.022), ('scalar', 0.021), ('light', 0.021), ('starting', 0.021), ('clustering', 0.021), ('formation', 0.021), ('recover', 0.02), ('column', 0.02), ('idea', 0.02), ('score', 0.019), ('virtually', 0.019), ('highlights', 0.019), ('decomposition', 0.019), ('sources', 0.019), ('si', 0.018), ('vary', 0.018), ('believe', 0.018), ('assignment', 0.018), ('pattern', 0.018), ('ij', 0.018), ('recognition', 0.017), ('planck', 0.017), ('variations', 0.017), ('initial', 0.017), ('full', 0.017), ('probabilistic', 0.017), ('log', 0.017), ('channels', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
2 0.082255967 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
Author: Vicente Ordonez, Girish Kulkarni, Tamara L. Berg
Abstract: We develop and demonstrate automatic image description methods using a large captioned photo collection. One contribution is our technique for the automatic collection of this new dataset – performing a huge number of Flickr queries and then filtering the noisy results down to 1 million images with associated visually relevant captions. Such a collection allows us to approach the extremely challenging problem of description generation using relatively simple non-parametric methods and produces surprisingly effective results. We also develop methods incorporating many state of the art, but fairly noisy, estimates of image content to produce even more pleasing results. Finally we introduce a new objective performance measure for image captioning. 1
3 0.07650236 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
4 0.062037662 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1
5 0.057928868 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
Author: Fahad S. Khan, Joost Weijer, Andrew D. Bagdanov, Maria Vanrell
Abstract: We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use Information theoretic vocabulary compression to find discriminative combinations of cues and the resulting vocabulary of portmanteau1 words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-theart results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation. 1
6 0.056433346 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
7 0.055855602 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
8 0.054221001 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
9 0.053151671 293 nips-2011-Understanding the Intrinsic Memorability of Images
10 0.050963577 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
11 0.050112233 165 nips-2011-Matrix Completion for Multi-label Image Classification
12 0.049765654 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
13 0.049145803 168 nips-2011-Maximum Margin Multi-Instance Learning
14 0.049077898 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
15 0.048425216 227 nips-2011-Pylon Model for Semantic Segmentation
16 0.04776768 180 nips-2011-Multiple Instance Filtering
17 0.047486637 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
18 0.045798726 276 nips-2011-Structured sparse coding via lateral inhibition
19 0.045726575 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
20 0.045492623 66 nips-2011-Crowdclustering
topicId topicWeight
[(0, 0.138), (1, 0.066), (2, -0.037), (3, 0.054), (4, 0.012), (5, 0.026), (6, -0.008), (7, 0.026), (8, 0.049), (9, 0.035), (10, 0.014), (11, 0.034), (12, -0.008), (13, -0.059), (14, -0.012), (15, 0.016), (16, -0.022), (17, -0.005), (18, -0.012), (19, 0.04), (20, 0.005), (21, -0.003), (22, 0.031), (23, 0.008), (24, -0.015), (25, 0.027), (26, 0.042), (27, -0.015), (28, 0.028), (29, 0.065), (30, -0.035), (31, 0.013), (32, -0.009), (33, -0.005), (34, 0.013), (35, 0.003), (36, 0.072), (37, 0.023), (38, -0.042), (39, 0.011), (40, 0.007), (41, 0.089), (42, -0.047), (43, 0.084), (44, -0.025), (45, -0.04), (46, 0.001), (47, -0.021), (48, 0.024), (49, -0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.92403972 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
2 0.72797883 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
Author: Fahad S. Khan, Joost Weijer, Andrew D. Bagdanov, Maria Vanrell
Abstract: We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use Information theoretic vocabulary compression to find discriminative combinations of cues and the resulting vocabulary of portmanteau1 words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-theart results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation. 1
3 0.70060998 293 nips-2011-Understanding the Intrinsic Memorability of Images
Author: Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva
Abstract: Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember. While it may seem like image memorability is purely subjective, recent work shows that it is not an inexplicable phenomenon: variation in memorability of images is consistent across subjects, suggesting that some images are intrinsically more memorable than others, independent of a subjects’ contexts and biases. In this paper, we used the publicly available memorability dataset of Isola et al. [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. Contrary to popular belief, unusual or aesthetically pleasing scenes do not tend to be highly memorable. This work represents one of the first attempts at understanding intrinsic image memorability, and opens a new domain of investigation at the interface between human cognition and computer vision. 1
4 0.69115007 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
Author: Vicente Ordonez, Girish Kulkarni, Tamara L. Berg
Abstract: We develop and demonstrate automatic image description methods using a large captioned photo collection. One contribution is our technique for the automatic collection of this new dataset – performing a huge number of Flickr queries and then filtering the noisy results down to 1 million images with associated visually relevant captions. Such a collection allows us to approach the extremely challenging problem of description generation using relatively simple non-parametric methods and produces surprisingly effective results. We also develop methods incorporating many state of the art, but fairly noisy, estimates of image content to produce even more pleasing results. Finally we introduce a new objective performance measure for image captioning. 1
5 0.68434638 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1
6 0.64280343 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
7 0.60643256 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
8 0.5894134 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
9 0.58066177 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
10 0.54378456 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
11 0.54347044 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
12 0.53265756 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
13 0.53242666 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
14 0.53128392 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout
15 0.52665067 165 nips-2011-Matrix Completion for Multi-label Image Classification
16 0.51664758 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
17 0.5085113 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
18 0.50196087 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
19 0.50139624 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
20 0.49429935 168 nips-2011-Maximum Margin Multi-Instance Learning
topicId topicWeight
[(0, 0.017), (4, 0.051), (20, 0.044), (26, 0.02), (31, 0.068), (33, 0.042), (43, 0.05), (45, 0.106), (57, 0.037), (74, 0.053), (83, 0.037), (84, 0.328), (99, 0.043)]
simIndex simValue paperId paperTitle
1 0.79520613 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
Author: Bo Chen, David E. Carlson, Lawrence Carin
Abstract: Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractory-period violations; it allows the “appearance” and “disappearance” of neurons over time, and it models smooth variation in the spike statistics.
same-paper 2 0.77103043 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
Author: Carsten Rother, Martin Kiefel, Lumin Zhang, Bernhard Schölkopf, Peter V. Gehler
Abstract: We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.
3 0.75425452 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
4 0.71525174 131 nips-2011-Inference in continuous-time change-point models
Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor
Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights.
5 0.52351695 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei
Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
6 0.50961351 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
7 0.50674051 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
8 0.50568885 285 nips-2011-The Kernel Beta Process
9 0.50175303 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
10 0.49754694 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
11 0.49699906 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
12 0.49362451 156 nips-2011-Learning to Learn with Compound HD Models
13 0.49276853 227 nips-2011-Pylon Model for Semantic Segmentation
14 0.49174479 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
15 0.49109569 219 nips-2011-Predicting response time and error rates in visual search
16 0.48957625 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
17 0.48593378 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
18 0.48536915 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
19 0.48128486 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
20 0.4801259 35 nips-2011-An ideal observer model for identifying the reference frame of objects