nips nips2013 nips2013-195 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling a mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. [sent-9, score-0.459]
2 Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. [sent-10, score-0.972]
3 We introduce a novel parametric method of clustering superpixels by modeling a mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. [sent-11, score-1.032]
4 We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0. [sent-12, score-0.807]
5 001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. [sent-14, score-1.067]
6 We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. [sent-15, score-1.014]
7 1 Introduction Visual clutter, defined colloquially as a “confused collection” or a “crowded disorderly state”, is a dimension of image understanding that has implications for applications ranging from visualization and interface design to marketing and image aesthetics. [sent-16, score-0.306]
8 In this study we apply methods from computer vision to quantify and predict human visual clutter perception. [sent-17, score-0.816]
9 The effects of visual clutter have been studied most extensively in the context of an object detection task, where models attempt to describe how increasing clutter negatively impacts the time taken to find a target object in an image [19][25][29][18][6]. [sent-18, score-1.602]
10 Visual clutter has even been suggested as a surrogate measure for the set size effect, the finding that search performance often degrades with the number of objects in a scene [32]. [sent-19, score-0.72]
11 One of the earliest attempts to model visual clutter used edge density, i.e., the ratio of the number of edge pixels in an image to image size [19]. [sent-21, score-0.749] [sent-23, score-0.345]
13 The subsequent feature congestion model ignited interest in clutter perception by estimating image complexity in terms of the density of intensity, color, and texture features in an image [25]. [sent-24, score-1.334] [sent-25, score-0.403]
Figure 1: How can we quantify set size or the number of objects in these scenes, and would this object count capture the perception of scene clutter?
15 However, recent work has pointed out limitations of the feature congestion model [13][21], leading to the development of alternative approaches to quantifying visual clutter [25][5][29][18]. [sent-26, score-0.868]
16 Our approach is to model visual clutter in terms of proto-objects: regions of locally similar features that are believed to exist at an early stage of human visual processing [24]. [sent-27, score-0.93]
17 Previous work used blob detectors to segment proto-objects from saliency maps for the purpose of quantifying shifts of visual attention [31], but this method is limited in that it results in elliptical proto-objects that do not capture the complexity or variability of shapes in natural scenes. [sent-30, score-0.217]
18 Alternatively, it may be possible to apply standard image segmentation methods to the task of proto-object discovery. [sent-31, score-0.249]
19 While this is possible (see Section 4.3), it is also limited in that the goal of these methods is to approximate a human-segmented ground truth, where each segment generally corresponds to a complete and recognizable object. [sent-33, score-0.207]
20 For example, in the Berkeley Segmentation Dataset [20] people were asked to segment each image into 2 to 20 equally important and distinguishable things, which results in many segments being actual objects. [sent-34, score-0.224]
21 However, one rarely knows the number of objects in a scene, and ambiguity in what constitutes an object has even led some researchers to suggest that obtaining an object ground truth for natural scenes is an ill-posed problem [21]. [sent-35, score-0.338]
22 Our clutter perception model uses a parametric method of proto-object partitioning that clusters superpixels, and requires no object ground truth. [sent-36, score-1.035]
23 In summary, we create a graph having superpixels as nodes, then compute feature similarity distances between adjacent nodes. [sent-37, score-0.54]
24 We refer to these merged image fragments as proto-objects. [sent-39, score-0.212]
25 The EMD similarity statistics follow a Weibull distribution (see Section 2.2), and this allows us to model such similarity distance statistics with a mixture of Weibull distributions, resulting in extremely efficient and robust superpixel clustering in the context of our model. [sent-41, score-0.388]
26 Our method runs in linear time with respect to the number of adjacent superpixel pairs, and has an end-to-end run time of 15-20 seconds for a typical 0.48 megapixel (800×600) image. [sent-42, score-0.267]
27 2 Proto-object partitioning 2.1 Superpixel pre-processing and feature similarity To merge similar fragments into a coherent proto-object region, the term fragment and the measure of coherence (similarity) must first be defined. [sent-45, score-0.345]
28 We define an image fragment as a group of pixels that share similar low-level image features: intensity, color, and orientation. [sent-46, score-0.341]
29 This conforms with processing in the human visual system, and also makes a fragment analogous to an image superpixel, which is a perceptually meaningful atomic region that contains pixels similar in color and texture [30]. [sent-47, score-0.505]
30 However, superpixel segmentation methods in general produce a fixed number of superpixels from an image, and groups of nearby superpixels may belong to the same proto-object due to the intended over-segmentation. [sent-48, score-0.857]
31 Therefore, we extract superpixels as image fragments for pre-processing, and subsequently merge similar superpixels into proto-objects. [sent-49, score-0.754]
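As a concrete illustration of this pre-processing step, here is a minimal sketch assuming scikit-image's SLIC implementation as a stand-in for the superpixel method of [1]; the input path and parameter settings are illustrative, not the paper's.

```python
# Superpixel pre-processing sketch; skimage's SLIC stands in for [1],
# and n_segments/compactness are illustrative values.
import numpy as np
from skimage.io import imread
from skimage.segmentation import slic

image = imread("scene.jpg")  # hypothetical input image
labels = slic(image, n_segments=1000, compactness=10, start_label=0)
print("initial superpixels:", labels.max() + 1)  # cf. 977 in Figure 2b
```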
32 We say that a pair of adjacent superpixels belongs to a coherent proto-object if the two are similar in all three low-level image features. [sent-50, score-0.555]
33 Thus we need to determine, for each of the three features, a similarity threshold that separates the similarity distance values into “similar” and “dissimilar” populations, as detailed in Section 2.2. [sent-51, score-0.222]
34 In this work, the similarity statistics are based on comparing histograms of intensity, color, and orientation features from an image fragment. [sent-53, score-0.376]
35 The intensity feature is a 1D 256-bin histogram, the color feature is a 76×76 (8-bit color) 2D histogram using hue and saturation from the HSV colorspace, and the orientation feature is a symmetrical 1D 360-bin histogram using gradient orientations, similar to the HOG feature [10]. [sent-54, score-0.652]
36 All three feature histograms are normalized to have the same total mass, such that bin counts sum to one. [sent-55, score-0.18]
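A sketch of how these three normalized histograms might be computed per superpixel, assuming NumPy and scikit-image; only the bin counts come from the text, while the helper name and gradient operator are our own choices.

```python
# Per-superpixel feature histograms: 256-bin intensity, 2D hue-saturation,
# and 360-bin gradient orientation, each normalized to unit mass.
import numpy as np
from skimage.color import rgb2gray, rgb2hsv

def superpixel_histograms(image, labels, sp_id, n_hs_bins=76):
    mask = labels == sp_id
    gray = rgb2gray(image)          # intensities in [0, 1]
    hsv = rgb2hsv(image)            # hue/saturation in [0, 1]

    h_int, _ = np.histogram(gray[mask], bins=256, range=(0, 1))
    h_col, _, _ = np.histogram2d(hsv[..., 0][mask], hsv[..., 1][mask],
                                 bins=n_hs_bins, range=[[0, 1], [0, 1]])

    gy, gx = np.gradient(gray)      # a simple stand-in gradient operator
    ang = np.degrees(np.arctan2(gy, gx))[mask] % 360.0
    h_ori, _ = np.histogram(ang, bins=360, range=(0, 360))

    # normalize so that bin counts sum to one, as in the text
    return [h / max(h.sum(), 1) for h in (h_int, h_col.ravel(), h_ori)]
```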
37 We use Earth Mover’s Distance (EMD) to compute the similarity distance between feature histograms [26], which is known to be robust to partially matching histograms. [sent-56, score-0.252]
38 We denote x_{n;f} = EMD_f(v_a, v_b) (n = 1, . . . , N) as the similarity distance of the n-th edge between nodes v_a and v_b under feature f ∈ {i, c, o}, denoting intensity, color, and orientation. [sent-60, score-0.212]
39 In the subsequent sections, we explain our proposed method for finding the adaptive similarity threshold from x_f, the set of EMDs over all pairs of adjacent nodes. [sent-63, score-0.426]
40 2.2 EMD statistics and the Weibull distribution Any pair of adjacent superpixels is either similar enough to belong to the same proto-object, or belongs to different proto-objects, as separated by the adaptive similarity threshold γ_f that is different for every image. [sent-65, score-0.488]
41 We formulate this as an edge labeling problem: given a graph G = (V, E), where v_a ∈ V and v_b ∈ V are two adjacent nodes (superpixels) having edge e_{a,b} ∈ E, a ≠ b, between them, which is also the n-th edge of G. [sent-66, score-0.336]
42 Once γ_f is computed, removing the edges labeled y_{n;f} = 0 results in isolated clusters of locally similar image patches, which are the desired groups of proto-objects. [sent-68, score-0.211]
43 Intuitively, any pair of adjacent nodes is either within the same proto-object cluster or between different clusters (y_{n;f} ∈ {1, 0}); therefore we consider two populations (the within-cluster edges and the between-cluster edges) to be modeled from the density of x_f in a given image. [sent-69, score-0.367]
44 In theory, this would mean that the density of x_f is a distribution exhibiting bimodality, such that the left mode corresponds to the set of x_f that are considered similar and coherent, while the right mode contains the set of x_f that represent dissimilarity. [sent-70, score-0.752]
45 In the following, we argue that the similarity distances x_f computed by EMD follow a Weibull distribution, a skewed distribution of the Exponential family. [sent-73, score-0.366]
46 We define EMD(P, Q) = (Σ_i Σ_j f_ij d_ij) / (Σ_i Σ_j f_ij), with an optimal flow f_ij such that Σ_j f_ij ≤ p_i, Σ_i f_ij ≤ q_j, Σ_{i,j} f_ij = min(Σ_i p_i, Σ_j q_j), and f_ij ≥ 0, where P = {(x_1, p_1), . . . , (x_m, p_m)} and Q = {(y_1, q_1), . . . , (y_n, q_n)} are the two histograms (signatures) being compared, with i = 1, . . . , m and j = 1, . . . , n. [sent-74, score-0.402]
47 When P and Q are normalized to have the same total mass, EMD becomes identical to the Mallows distance [17], defined as M_p(X, Y) = ((1/n) Σ_{i=1}^{n} |x_i − y_i|^p)^{1/p}, where X and Y are sorted vectors of the same size; the Mallows distance is an L_p-norm based distance measurement. [sent-83, score-0.156]
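Because the histograms are mass-normalized, the 1D cases (intensity, orientation) reduce to cumulative sums; the sketch below checks this identity against SciPy. The 2D color histograms would still need a general transport solver as in [26].

```python
# EMD for unit-mass 1D histograms via the Mallows/Wasserstein identity:
# the L1 distance between the two CDFs, scaled by the bin width (p = 1).
import numpy as np
from scipy.stats import wasserstein_distance

def emd_1d(h1, h2, bin_width=1.0):
    return np.sum(np.abs(np.cumsum(h1) - np.cumsum(h2))) * bin_width

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.4, 0.4, 0.2])
centers = np.arange(3)
assert np.isclose(emd_1d(h1, h2),
                  wasserstein_distance(centers, centers, h1, h2))
```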
48 Hence, we can model x_f for each feature separately as a mixture of two Weibull distributions, and compute the corresponding γ_f as the boundary location between the two components of the mixture. [sent-87, score-0.371]
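Given a fitted two-component mixture, one way to realize this boundary is the point between the two component modes where the weighted densities cross; a sketch with illustrative parameter names, assuming both shape parameters exceed 1 and a single crossing.

```python
# gamma_f as the crossing of the two weighted Weibull component pdfs.
import numpy as np
from scipy.stats import weibull_min
from scipy.optimize import brentq

def gamma_threshold(w1, k1, s1, w2, k2, s2):
    diff = lambda x: (w1 * weibull_min.pdf(x, k1, scale=s1)
                      - w2 * weibull_min.pdf(x, k2, scale=s2))
    # modes of the two components (valid for shape k > 1)
    m1 = s1 * ((k1 - 1) / k1) ** (1 / k1)
    m2 = s2 * ((k2 - 1) / k2) ** (1 / k2)
    return brentq(diff, m1, m2)  # assumes one sign change between the modes
```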
49 Although the Weibull distribution has been used in modeling actual image features such as texture and edges [12][35], it has not been used to model EMD similarity distance statistics until now. [sent-88, score-0.386]
50 Therefore, the initial guess for the first mixture component φ_{1;f} is the MLE of φ_{1;f}(θ_{1;f}; x̃_f), such that x̃_f = {min_{v_k ∈ N(v_j)} EMD(v_{j;f}, v_{k;f}) | j = 1, . . . , z, f ∈ {i, c, o}} (the minimum EMD from each superpixel v_j to its adjacent neighbors N(v_j)), where z is the total number of superpixels and x̃_f ⊂ x_f. [sent-94, score-0.57] [sent-97, score-0.484]
52 For the non-linear least squares (NLS) optimization method, x_f is first approximated with a histogram, much as a box filter smoothes a curve. [sent-106, score-0.288]
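A simplified sketch of this NLS step: approximate the density of x_f with a histogram, then minimize the squared residuals of a two-component Weibull mixture with Nelder-Mead (cf. Figure 2d-f). The initialization is our own and makes no claim to match the paper's.

```python
# Histogram-based NLS fit of a two-component Weibull mixture (WMM).
import numpy as np
from scipy.stats import weibull_min
from scipy.optimize import minimize

def wmm_pdf(x, params):
    w, k1, s1, k2, s2 = params
    return (w * weibull_min.pdf(x, k1, scale=s1)
            + (1 - w) * weibull_min.pdf(x, k2, scale=s2))

def fit_wmm(xf, n_bins=50):
    dens, edges = np.histogram(xf, bins=n_bins, density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])

    def resid(p):
        w, k1, s1, k2, s2 = p
        if not (0.0 <= w <= 1.0 and min(k1, s1, k2, s2) > 0):
            return np.inf  # keep Nelder-Mead inside the valid region
        return np.sum((wmm_pdf(mids, p) - dens) ** 2)

    # crude initial guess: equal weights, shapes ~2, scales from quantiles
    p0 = [0.5, 2.0, np.percentile(xf, 25), 2.0, np.percentile(xf, 75)]
    return minimize(resid, p0, method="Nelder-Mead").x
```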
53 Figure 2: (a) original image, (b) after superpixel pre-processing [1] (977 initial segments), (c) final proto-object partitioning result (150 segments). [sent-113, score-0.261]
54 (d) W_2(x_f; θ_f) optimized using the Nelder-Mead algorithm for intensity, (e) color, and (f) orientation, based on the image in (b). [sent-115, score-0.213]
55 2.4 Visual clutter model with model selection At times, the dissimilar population can be highly mixed in with the similar population, in which case the density of x_f resembles a single Weibull in shape, as in Figure 2d. [sent-118, score-0.66]
56 Therefore, we fit both a single Weibull and a two-component Weibull mixture model (WMM) over x_f, and apply the Akaike Information Criterion (AIC) to prevent possible over-fitting by the two-component WMM. [sent-119, score-0.242]
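A sketch of this model-selection step, reusing wmm_pdf from the NLS sketch above; the parameter counts (2 for the single Weibull, 5 for the WMM) are our reading of the two models.

```python
# AIC = 2k - 2 ln L; keep the mixture only if it scores lower.
import numpy as np
from scipy.stats import weibull_min

def select_model(xf, wmm_params):
    k, loc, s = weibull_min.fit(xf, floc=0)  # single-Weibull MLE
    ll_single = np.sum(weibull_min.logpdf(xf, k, scale=s))
    aic_single = 2 * 2 - 2 * ll_single

    ll_mix = np.sum(np.log(wmm_pdf(xf, wmm_params) + 1e-12))
    aic_mix = 2 * 5 - 2 * ll_mix
    return "mixture" if aic_mix < aic_single else "single"
```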
57 This projected distance feature is used to construct a minimum spanning tree over the superpixels to form the structure of graph G, which weakens the inter-cluster connectivity by removing cycles and other excessive graph connections. [sent-127, score-0.392]
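A sketch of this construction with SciPy, assuming a dense matrix of projected distances that is zero wherever two superpixels are not adjacent.

```python
# Minimum spanning tree over the superpixel adjacency graph.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(weights):
    # weights: (z x z) projected-distance matrix, 0 for non-adjacent pairs
    tree = minimum_spanning_tree(csr_matrix(weights))
    a, b = tree.nonzero()
    return list(zip(a, b))  # the retained edges of graph G
```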
58 Edge labeling then proceeds as in Section 2.2 given the computed γ_f, such that an edge is labeled 1 (similar) only if the pair of superpixels is similar in all three features. [sent-129, score-0.31]
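Putting the pieces together, a sketch of the final labeling and the resulting proto-object count; the edge list and per-edge EMD lookup are our own data structures, and the count is normalized into the clutter score downstream.

```python
# Keep an edge only if the superpixel pair is below gamma_f for all three
# features, then count connected components as proto-objects.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def proto_object_count(edges, emd, gamma, n_nodes):
    # emd[(a, b)] maps feature "i"/"c"/"o" to the pair's EMD value
    keep = [(a, b) for (a, b) in edges
            if all(emd[(a, b)][f] < gamma[f] for f in ("i", "c", "o"))]
    rows = [a for a, b in keep]
    cols = [b for a, b in keep]
    adj = csr_matrix((np.ones(len(keep)), (rows, cols)),
                     shape=(n_nodes, n_nodes))
    n_proto, _ = connected_components(adj, directed=False)
    return n_proto
```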
59 3 Dataset and ground truth Various in-house image datasets have been used in previous work to evaluate models of visual clutter. [sent-131, score-0.363]
60 In each of these datasets, each image must be rank ordered for visual clutter with respect to every other image in the set by the same human subject, which is a tiring and time-consuming process. [sent-133, score-1.113]
61 This rank ordering is essential for a clutter perception experiment, as it establishes a stable clutter metric that is meaningful across participants; alas, it limits the dataset size to the number of images each individual observer can handle. [sent-134, score-1.553]
62 Absolute clutter scales are undesirable as different raters might use different ranges on this scale. [sent-135, score-0.645]
63 We created a comparatively large clutter perception dataset consisting of 90 real-world images (800×600 pixels) sampled from the SUN Dataset [33], for which there exist human segmentations of objects and object counts. [sent-136, score-1.191]
64 The high resolution of these images is also important for the accurate perception and assessment of clutter. [sent-138, score-0.287]
65 The 90 images were selected to constitute six groups based on their ground truth object counts, with 15 images in each group. [sent-139, score-0.327]
66 Specifically, group 1 had images with object counts in the 1-10 range, group 2 had counts in the 11-20 range, up to group 6 with counts in the 51-60 range. [sent-140, score-0.251]
67 These 90 images were rated in the laboratory by 15 college-aged participants whose task was to order the images from least to most perceived visual clutter. [sent-141, score-0.365]
68 This was done by displaying each image one at a time and asking participants to insert it into an expanding set of previously rated images. [sent-142, score-0.266]
69 Participants were encouraged to take as much time as they needed, and were allowed to freely scroll through the existing set of clutter rated images when deciding where to insert the new image. [sent-143, score-0.739]
70 We used the median ranked position of each image as the ground truth for clutter perception in our experiments. [sent-148, score-1.05]
71 4 Experiment and results 4.1 Image feature assumptions In their demonstration that similarity distances adhere to a Weibull distribution, Burghouts et al. [7] assume that the feature values being compared are finite, upper bounded, and correlated. [sent-150, score-0.193]
72 The three image features used in this model are finite and upper bounded, and we follow the procedure from [7] with L2 distance to determine whether they are correlated. [sent-152, score-0.237]
73 We consider distances from one reference superpixel feature vector s to 100 other randomly selected superpixel feature vectors T (of the same feature type), and compute the differences at index i, obtaining the random variable X_i = |s_i − T_i|^p. [sent-153, score-0.559]
74 This procedure is repeated 500 times per image for all three feature types over all 90 images. [sent-156, score-0.222]
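A sketch of this check; the exact sampling protocol is our reading of the text and may differ from [7], and since the Weibull parameters are fit on the same sample being tested, the KS p-value is only a rough diagnostic.

```python
# Check that |s_i - T_i|^p differences are Weibull-distributed.
import numpy as np
from scipy.stats import weibull_min, kstest

def weibull_check(features, p=2, n_refs=100, rng=None):
    rng = rng or np.random.default_rng(0)
    s = features[rng.integers(len(features))]        # reference vector s
    T = features[rng.choice(len(features), size=n_refs, replace=False)]
    i = rng.integers(features.shape[1])              # a random index i
    X = np.abs(s[i] - T[:, i]) ** p + 1e-12          # avoid exact zeros
    c, loc, scale = weibull_min.fit(X, floc=0)
    return kstest(X, "weibull_min", args=(c, loc, scale)).pvalue
```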
75 This confirms that the low-level image features used in this study follow a Weibull distribution. [sent-160, score-0.185]
76 Table 1: Correlations between human clutter perception and all the evaluated methods. [sent-170, score-0.275]
77 We then correlated the number of proto-objects formed after superpixel merging with the ground truth behavioral clutter perception estimates by computing the Spearman’s Rank Correlation (Spearman’s ρ) following the convention of [25][5][29][18]. [sent-186, score-1.16]
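The correlation itself is a one-liner with SciPy; the file names below are hypothetical placeholders for the per-image model scores and the median human ranks.

```python
# Spearman's rank correlation between model scores and human clutter ranks.
import numpy as np
from scipy.stats import spearmanr

model_scores = np.load("proto_object_counts.npy")  # hypothetical file
human_ranks = np.load("median_clutter_ranks.npy")  # hypothetical file
rho, pval = spearmanr(model_scores, human_ranks)
print(f"Spearman's rho = {rho:.4f}, p = {pval:.3g}")  # paper: rho = 0.8038
```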
78 To the extent that this is meaningful and extends to people, it suggests that visual clutter perception may ignore feature dissimilarity on the order of 14% when deciding whether two adjacent regions are similar and should be merged. [sent-205, score-1.089]
79 We compared our model to four other state-of-the-art models of clutter perception: the feature congestion model [25], the edge density method [19], the power-law model [6], and the C3 model [18]. [sent-206, score-0.788]
80 Although we did not record run-time statistics for the other models, our model, implemented in Matlab, had an end-to-end (excluding superpixel pre-processing) run-time of 15-20 seconds on 800×600 images, running on a Win7 Intel Core i7 computer with 8 GB RAM. [sent-210, score-0.271]
81 A similar limitation was found for image segmentation methods that utilize gPb contour detection as pre-processing, such as [8][14], while [23][34] took 10 hours on a single image and did not converge. [sent-213, score-0.402]
82 Therefore, we limit our evaluation to mean-shift [9] and the graph-based method [11], as they are able to produce variable numbers of segments based on the unsupervised partitioning of the 90 images from our dataset. [sent-214, score-0.188]
83 We also correlated the number of objects segmented by humans (as provided in the SUN Dataset) with the clutter perception ground truth, denoted as # obj in Table 1. [sent-216, score-1.017]
84 Figure 3: Top: Four images from our dataset, rank ordered for clutter perception by human raters; median clutter rank order from left to right: 6, 47, 70, 87. [sent-220, score-1.598]
85 Bottom: Corresponding images after parametric proto-object partitioning, median clutter rank order from left to right: 7, 40, 81, 83. [sent-221, score-0.735]
86 Despite the object count being a human-derived estimate, it produced among the lowest correlations with clutter perception. [sent-222, score-0.595]
87 This suggests that clutter perception is not determined simply by the number of objects in a scene; it is the proto-object composition of these objects that is important. [sent-223, score-0.92]
88 5 Conclusion We proposed a model of visual clutter perception based on a parametric image partitioning method that is fast and able to work on large images. [sent-224, score-1.176]
89 This method of segmenting proto-objects from an image using a mixture of Weibull distributions is also novel in that it models similarity distance statistics rather than feature statistics obtained directly from pixels. [sent-225, score-0.419]
90 Our work also contributes to the behavioral understanding of clutter perception. [sent-226, score-0.637]
91 We showed that our model is an excellent predictor of human clutter perception, outperforming all existing clutter models, and predicts clutter perception better than even a behavioral segmentation of objects. [sent-227, score-2.203]
92 This suggests that clutter perception is best described at the proto-object level, a level intermediate to that of objects and features. [sent-228, score-0.861]
93 Moreover, our work suggests a means of objectively quantifying a behaviorally meaningful set size for scenes, at least with respect to clutter perception. [sent-229, score-0.685]
94 We also introduced a new and validated clutter perception dataset consisting of a variety of scene types and object categories. [sent-230, score-0.965]
95 In future work we plan to extend our parametric partitioning method to general image segmentation and data clustering problems, and to use our model to predict human visual search behavior and other behaviors that might be affected by visual clutter. [sent-232, score-0.658]
96 The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements. [sent-327, score-0.661]
97 A model of clutter for complex, multivariate geospatial displays. [sent-365, score-0.595]
98 A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. [sent-378, score-0.295]
99 Natural image segmentation with adaptive texture and boundary encoding. [sent-401, score-0.288]
100 Unsupervised segmentation of natural images via lossy data compression. [sent-475, score-0.176]
wordName wordTfidf (topN-words)
[('clutter', 0.595), ('weibull', 0.28), ('superpixels', 0.271), ('xf', 0.242), ('perception', 0.207), ('superpixel', 0.191), ('emd', 0.19), ('image', 0.153), ('nls', 0.117), ('wmm', 0.117), ('visual', 0.115), ('mle', 0.097), ('segmentation', 0.096), ('similarity', 0.085), ('intensity', 0.085), ('images', 0.08), ('aic', 0.076), ('adjacent', 0.076), ('human', 0.073), ('object', 0.072), ('partitioning', 0.07), ('feature', 0.069), ('fij', 0.067), ('scene', 0.066), ('va', 0.065), ('color', 0.063), ('orientation', 0.06), ('mixture', 0.06), ('objects', 0.059), ('congestion', 0.059), ('fragments', 0.059), ('vj', 0.055), ('vb', 0.055), ('ground', 0.055), ('distance', 0.052), ('mover', 0.051), ('brook', 0.05), ('raters', 0.05), ('stony', 0.05), ('participants', 0.049), ('spearman', 0.048), ('earth', 0.046), ('segmented', 0.046), ('histograms', 0.046), ('correlation', 0.046), ('burghouts', 0.044), ('behavioral', 0.042), ('rated', 0.041), ('truth', 0.04), ('scenes', 0.04), ('texture', 0.039), ('dissimilar', 0.039), ('distances', 0.039), ('maps', 0.039), ('edge', 0.039), ('mallows', 0.038), ('segments', 0.038), ('gb', 0.036), ('parametric', 0.036), ('fragment', 0.035), ('xn', 0.034), ('isolated', 0.033), ('aicc', 0.033), ('emdf', 0.033), ('fowlkes', 0.033), ('iqr', 0.033), ('objectively', 0.033), ('slic', 0.033), ('weibulls', 0.033), ('segment', 0.033), ('vision', 0.033), ('counts', 0.033), ('features', 0.032), ('bin', 0.032), ('iccv', 0.031), ('correlated', 0.03), ('tpami', 0.03), ('quantifying', 0.03), ('samaras', 0.029), ('bravo', 0.029), ('carreira', 0.029), ('hue', 0.029), ('charts', 0.029), ('belong', 0.028), ('yn', 0.028), ('crowded', 0.027), ('symmetrical', 0.027), ('meaningful', 0.027), ('coherent', 0.027), ('guess', 0.026), ('density', 0.026), ('weather', 0.025), ('obj', 0.025), ('edges', 0.025), ('dataset', 0.025), ('rank', 0.024), ('histogram', 0.024), ('insert', 0.023), ('nodes', 0.023), ('lp', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling a mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features.
2 0.12762573 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
Author: Yangqing Jia, Joshua T. Abbott, Joseph Austerweil, Thomas Griffiths, Trevor Darrell
Abstract: Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization. 1
3 0.099962004 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements, that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset. 1
4 0.094209477 166 nips-2013-Learning invariant representations and applications to face verification
Author: Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Abstract: One approach to computer object recognition and modeling the brain’s ventral stream involves unsupervised learning of representations that are invariant to common transformations. However, applications of these ideas have usually been limited to 2D affine transformations, e.g., translation and scaling, since they are easiest to solve via convolution. In accord with a recent theory of transformationinvariance [1], we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identitypreserving transformations. The model’s wiring can be learned from videos of transforming objects—or any other grouping of images into sets by their depicted object. Through a series of successively more complex empirical tests, we study the invariance/discriminability properties of this model with respect to different transformations. First, we empirically confirm theoretical predictions (from [1]) for the case of 2D affine transformations. Next, we apply the model to non-affine transformations; as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction. Surprisingly, it can also tolerate clutter “transformations” which map an image of a face on one background to an image of the same face on a different background. Motivated by these empirical findings, we tested the same model on face verification benchmark tasks from the computer vision literature: Labeled Faces in the Wild, PubFig [2, 3, 4] and a new dataset we gathered—achieving strong performance in these highly unconstrained cases as well. 1
5 0.090444475 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
Author: Matthew Lawlor, Steven W. Zucker
Abstract: Association field models have attempted to explain human contour grouping performance, and to explain the mean frequency of long-range horizontal connections across cortical columns in V1. However, association fields only depend on the pairwise statistics of edges in natural scenes. We develop a spectral test of the sufficiency of pairwise statistics and show there is significant higher order structure. An analysis using a probabilistic spectral embedding reveals curvature-dependent components. 1
6 0.082649469 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
7 0.079741322 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
8 0.07911969 226 nips-2013-One-shot learning by inverting a compositional causal process
9 0.077607155 84 nips-2013-Deep Neural Networks for Object Detection
11 0.074728608 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
12 0.073396601 63 nips-2013-Cluster Trees on Manifolds
13 0.073309451 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
14 0.073247328 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
15 0.07200928 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
16 0.069046363 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
17 0.068976305 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
18 0.064323775 183 nips-2013-Mapping paradigm ontologies to and from the brain
19 0.063993305 167 nips-2013-Learning the Local Statistics of Optical Flow
20 0.063458063 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking
topicId topicWeight
[(0, 0.147), (1, 0.073), (2, -0.101), (3, -0.042), (4, 0.089), (5, -0.017), (6, 0.003), (7, -0.012), (8, -0.048), (9, 0.062), (10, -0.149), (11, 0.006), (12, 0.009), (13, 0.029), (14, -0.072), (15, 0.019), (16, -0.065), (17, -0.135), (18, -0.07), (19, 0.017), (20, 0.034), (21, 0.039), (22, -0.026), (23, -0.01), (24, -0.133), (25, -0.005), (26, 0.073), (27, 0.024), (28, -0.001), (29, -0.011), (30, -0.001), (31, 0.028), (32, -0.002), (33, 0.028), (34, -0.066), (35, 0.078), (36, -0.008), (37, 0.077), (38, 0.019), (39, 0.059), (40, 0.047), (41, 0.085), (42, -0.049), (43, -0.03), (44, -0.006), (45, 0.03), (46, 0.025), (47, 0.063), (48, 0.009), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.95095819 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
2 0.75492632 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
Author: Vibhav Vineet, Carsten Rother, Philip Torr
Abstract: Many methods have been proposed to solve the problems of recovering intrinsic scene properties such as shape, reflectance and illumination from a single image, and object class segmentation separately. While these two problems are mutually informative, in the past not many papers have addressed this topic. In this work we explore such joint estimation of intrinsic scene properties recovered from an image, together with the estimation of the objects and attributes present in the scene. In this way, our unified framework is able to capture the correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene. For example, our model is able to enforce the condition that if a set of pixels take same object label, e.g. table, most likely those pixels would receive similar reflectance values. We cast the problem in an energy minimization framework and demonstrate the qualitative and quantitative improvement in the overall accuracy on the NYU and Pascal datasets. 1
3 0.75171006 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on generalpurpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images. 1
4 0.73923326 166 nips-2013-Learning invariant representations and applications to face verification
Author: Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Abstract: One approach to computer object recognition and modeling the brain’s ventral stream involves unsupervised learning of representations that are invariant to common transformations. However, applications of these ideas have usually been limited to 2D affine transformations, e.g., translation and scaling, since they are easiest to solve via convolution. In accord with a recent theory of transformationinvariance [1], we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identitypreserving transformations. The model’s wiring can be learned from videos of transforming objects—or any other grouping of images into sets by their depicted object. Through a series of successively more complex empirical tests, we study the invariance/discriminability properties of this model with respect to different transformations. First, we empirically confirm theoretical predictions (from [1]) for the case of 2D affine transformations. Next, we apply the model to non-affine transformations; as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction. Surprisingly, it can also tolerate clutter “transformations” which map an image of a face on one background to an image of the same face on a different background. Motivated by these empirical findings, we tested the same model on face verification benchmark tasks from the computer vision literature: Labeled Faces in the Wild, PubFig [2, 3, 4] and a new dataset we gathered—achieving strong performance in these highly unconstrained cases as well. 1
5 0.73108876 226 nips-2013-One-shot learning by inverting a compositional causal process
Author: Brenden M. Lake, Ruslan Salakhutdinov, Josh Tenenbaum
Abstract: People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also tested the model on another conceptual task, generating new examples, by using a “visual Turing test” to show that our model produces human-like performance. 1
6 0.70862871 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
8 0.67983013 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
9 0.67097241 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
10 0.64697576 84 nips-2013-Deep Neural Networks for Object Detection
11 0.64544511 119 nips-2013-Fast Template Evaluation with Vector Quantization
12 0.6348269 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
13 0.63343745 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
14 0.61475074 167 nips-2013-Learning the Local Statistics of Optical Flow
15 0.58850813 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
16 0.54981381 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars
18 0.52007031 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
19 0.51505303 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
20 0.50230861 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
topicId topicWeight
[(2, 0.014), (14, 0.011), (16, 0.044), (33, 0.163), (34, 0.09), (41, 0.022), (49, 0.05), (56, 0.07), (65, 0.28), (70, 0.046), (85, 0.026), (89, 0.024), (93, 0.057), (95, 0.016)]
simIndex simValue paperId paperTitle
1 0.81256837 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations
Author: Botond Cseke, Manfred Opper, Guido Sanguinetti
Abstract: We propose an approximate inference algorithm for continuous time Gaussian Markov process models with both discrete and continuous time likelihoods. We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for discrete time terms and (2) variational updates for the continuous time term. We introduce postinference corrections methods that improve on the marginals of the approximation. This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. Experimental results on real and simulated data demonstrate high distributional accuracy and significant computational savings compared to discrete-time approaches in a neural application. 1
same-paper 2 0.7892487 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
3 0.73384637 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fix point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets. 1
4 0.68666351 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
Author: Nitish Srivastava, Ruslan Salakhutdinov
Abstract: High capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier’s parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, making deep neural networks work well on infrequent classes as well. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set. 1
5 0.61294943 331 nips-2013-Top-Down Regularization of Deep Belief Networks
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
6 0.61219823 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
7 0.61083502 64 nips-2013-Compete to Compute
8 0.60849756 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
9 0.6071592 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes
10 0.60569096 301 nips-2013-Sparse Additive Text Models with Low Rank Background
11 0.60511452 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
12 0.60476887 251 nips-2013-Predicting Parameters in Deep Learning
13 0.60455728 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
14 0.60442817 201 nips-2013-Multi-Task Bayesian Optimization
15 0.60373569 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
16 0.60369784 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
17 0.60354543 183 nips-2013-Mapping paradigm ontologies to and from the brain
18 0.60274291 30 nips-2013-Adaptive dropout for training deep neural networks
19 0.6021437 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
20 0.60163742 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks