nips2013-37: Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs (knowledge-graph by maker-knowledge-mining)
Source: pdf
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images.
Reference: text
sentIndex sentText sentNum sentScore
1 Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. [sent-7, score-0.746]
2 Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. [sent-9, score-0.594]
3 This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. [sent-10, score-0.305]
4 We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. [sent-11, score-0.416]
5 Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images. [sent-12, score-0.746]
6 1 Introduction: Computer vision has historically been formulated as the problem of producing symbolic descriptions of scenes from input images [10]. [sent-13, score-0.18]
7 This is usually done by building bottom-up processing pipelines that isolate the portions of the image associated with each scene element and extract features that signal its identity. [sent-14, score-0.449]
8 Many pattern recognition and learning techniques can then be used to build classifiers for individual scene elements, and sometimes to learn the features themselves [11, 7]. [sent-15, score-0.307]
9 Bottom-up pipelines that combine image processing and machine learning can identify written characters with high accuracy and recognize objects from large sets of possibilities. [sent-17, score-0.242]
10 Generative models for a range of image parsing tasks are also being explored [17, 4, 18, 22, 20]. [sent-21, score-0.175]
11 But like traditional bottom-up pipelines for vision, these approaches have relied on considerable problem-specific engineering, chiefly to design and/or learn custom inference strategies, such as MCMC proposals [18, 22] that incorporate bottom-up cues. [sent-28, score-0.258]
12 In this paper, we propose a novel formulation of image interpretation problems, called generative probabilistic graphics programming (GPGP). [sent-31, score-0.709]
13 Our probabilistic graphics programs are written in Venture, a probabilistic programming language descended from Church [6]. [sent-33, score-0.774]
14 Unlike typical generative models for scene parsing, inverting our probabilistic graphics programs requires no custom inference algorithm design. [sent-36, score-1.193]
15 Instead, we rely on the automatic Metropolis-Hastings (MH) transition operators provided by our probabilistic programming system. [sent-37, score-0.184]
16 The approximations and stochasticity in our renderer, scene generator and likelihood models serve to implement a variant of approximate Bayesian computation [19, 12]. [sent-38, score-0.531]
17 To the best of our knowledge, our GPGP framework is the first real-world image interpretation formulation to combine all of the following elements: probabilistic programming, automatic inference, computer graphics, and approximate Bayesian computation; this constitutes our main contribution. [sent-40, score-0.356]
18 Our second contribution is to provide demonstrations of the efficacy of this approach on two image interpretation problems: reading snippets of degraded and adversarially obscured alphanumeric characters, and inferring 3D road models from vehicle-mounted cameras. [sent-41, score-0.577]
19 GPGP defines generative models for images by combining four components. [sent-44, score-0.166]
20 The first is a stochastic scene generator written as probabilistic code that makes random choices for the location and configuration of the main elements in the scene. [sent-45, score-0.529]
21 The second is an approximate renderer based on existing graphics software that maps a scene S and control variables X to an image IR = f (S, X). [sent-46, score-1.287]
22 The third is a stochastic likelihood model for image data ID that enables scoring of rendered scenes given the control variables. [sent-47, score-0.476]
23 The fourth is a set of latent variables X that control the fidelity of the renderer and/or the tolerance in the stochastic likelihood model. [sent-48, score-0.597]
24 We first give a general description of the generative model and inference algorithm induced by our probabilistic graphics programs; in later sections, we describe specific details for each application. [sent-51, score-0.748]
25 Let S = {Si } be a decomposition of the scene S into parts Si with independent priors P (Si ). [sent-52, score-0.271]
26 Each of our models shares a common template: a stochastic scene generator which samples possible scenes S according to their prior, latent variables X that control the fidelity of the rendering and the tolerance of the model, an approximate renderer f(S, X) → I_R [sent-55, score-0.69]
27 based on existing graphics software, and a stochastic likelihood model P(I_D | I_R, X) that links the rendered image to the observed image data. [sent-56, score-0.637]
28 A scene S* sampled from the scene generator according to P(S) could be rendered onto a single image I_R*. [sent-57, score-0.916]
29 Instead of requiring exact matches, our formulation can broaden the renderer’s output P(I_R* | S*) and the image likelihood P(I_D | I_R*) via the latent control variables X. [sent-59, score-0.264]
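To make the template concrete, the following is a minimal, self-contained Python sketch of the four components on a toy 1-D image. It is an illustration only, not the authors' Venture implementation, and all names here are hypothetical; the Beta and Gamma priors on the control variables mirror the Figure 4 program shown later.

import numpy as np

rng = np.random.default_rng(0)

def sample_scene():
    # stochastic scene generator: independent priors over parts S_i
    k = int(rng.integers(1, 4))                      # number of parts
    return [(rng.uniform(0, 32), rng.uniform(0.5, 1.0)) for _ in range(k)]

def sample_controls():
    # latent control variables X: renderer fidelity and likelihood tolerance
    return {"blur": 7 * rng.beta(1, 2), "eps": rng.gamma(1.0, 1.0)}

def render(scene, controls):
    # approximate renderer f(S, X): Gaussian bumps whose width tracks "blur"
    xs = np.arange(32.0)
    image = np.zeros_like(xs)
    for pos, amp in scene:
        image += amp * np.exp(-0.5 * ((xs - pos) / (1.0 + controls["blur"])) ** 2)
    return image

def log_likelihood(observed, rendered, controls):
    # stochastic likelihood P(I_D | I_R, X): pixel-wise Gaussian with variance eps
    var = controls["eps"] + 1e-6
    resid = observed - rendered
    return float(-0.5 * np.sum(resid ** 2) / var
                 - 0.5 * resid.size * np.log(2 * np.pi * var))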
30 Our proposals modify single elements of the scene and control variables at a time, as follows: P(S) = ∏_i P(S_i) with per-part proposals q_i(S_i, S_i') = P(S_i'), and P(X) = ∏_j P(X_j) with per-variable proposals q_j(X_j, X_j') = P(X_j'). Now let K = |{S_i}| + |{X_j}| be the total number of random variables in each execution. [sent-62, score-0.345]
31 A variable k is chosen uniformly from these K. If k corresponds to a scene variable i, then we propose from q_i(S_i, S_i'); if it corresponds to a control variable j, we propose from q_j(X_j, X_j'); so our overall proposal kernel q((S, X) → (S', X')) resamples one variable from its prior. [sent-67, score-0.271]
32 In both cases we re-render the scene, I_R' = f(S', X'). Because each proposal draws the chosen variable from its prior, the prior and proposal terms cancel in the Metropolis-Hastings ratio, leaving acceptance probability min(1, P(I_D | I_R', X') / P(I_D | I_R, X)). [sent-70, score-0.271]
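Continuing the toy Python sketch above, one single-site Metropolis-Hastings step can be written as follows. For simplicity the number of scene parts is held fixed, whereas the paper's programs also propose structure changes such as the number of letters; this is a sketch of the scheme, not Venture's actual transition operator.

def mh_step(scene, controls, observed, cur_ll):
    scene2, controls2 = list(scene), dict(controls)
    k = int(rng.integers(0, len(scene2) + len(controls2)))
    if k < len(scene2):
        # resample one scene part S_i from its prior
        scene2[k] = (rng.uniform(0, 32), rng.uniform(0.5, 1.0))
    else:
        # resample one control variable X_j from its prior
        key = sorted(controls2)[k - len(scene2)]
        controls2[key] = sample_controls()[key]
    new_ll = log_likelihood(observed, render(scene2, controls2), controls2)
    if np.log(rng.uniform()) < new_ll - cur_ll:   # prior/proposal terms cancel
        return scene2, controls2, new_ll
    return scene, controls, cur_ll

# usage: recover a synthetic scene from its rendering
observed = render(sample_scene(), {"blur": 0.3, "eps": 0.5})
scene, controls = sample_scene(), sample_controls()
ll = log_likelihood(observed, render(scene, controls), controls)
for _ in range(5000):
    scene, controls, ll = mh_step(scene, controls, observed, ll)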
33 We implement our probabilistic graphics programs in the Venture probabilistic programming language. [sent-74, score-0.774]
34 The Metropolis-Hastings inference algorithm we use is provided by default in this system; no custom inference code is required. [sent-75, score-0.288]
35 ABC methods approximate Bayesian inference over complex generative processes by using an exogenous distance function to compare sampled outputs with observed data. [sent-77, score-0.267]
36 Our formulation incorporates a combination of these insights: rendered scenes are only approximately constrained to match the observed image, with the tightness of the match mediated by inference over factors such as the fidelity of the rendering and the stochasticity in the likelihood. [sent-81, score-0.438]
37 Figure 2: Four input images from our CAPTCHA corpus, along with the final results and convergence trajectory of typical inference runs. [sent-83, score-0.16]
38 Our probabilistic graphics program did not originally support rotation, which was needed for the AOL CAPTCHAs; adding it required only 1 additional line of probabilistic code. [sent-86, score-0.691]
39 We developed a probabilistic graphics program for reading short snippets of degraded text consisting of arbitrary digits and letters. [sent-89, score-0.787]
40 The prior over each letter is uniform: P(S_i^x = x) and P(S_i^y = y) are uniform over valid positions (0 otherwise), P(S_i^glyph-id = g) = 1/G over the G possible glyphs (0 otherwise), and the rotation is uniform over 0 ≤ S_i^θ < θ_max. Our renderer rasterizes each letter independently, applies a spatial blur to each image, composites the letters, and then blurs the result. [sent-92, score-0.594]
41 We also applied global blur to the original input image before applying the stochastic likelihood model on the blurred original and rendered images. [sent-93, score-0.588]
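A rough Python analogue of this rasterize/blur/composite pipeline is sketched below, using Pillow and SciPy as stand-ins for the paper's graphics software. The 7*Beta(1,2) per-letter blur follows the Figure 4 program; the canvas size, font, and max-compositing rule are illustrative assumptions.

import numpy as np
from PIL import Image, ImageDraw, ImageFont
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
GLYPHS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"   # G = 36 glyphs

def sample_letter(width=200, height=70, theta_max=30.0):
    # uniform priors over identity, position, rotation; Beta-distributed blur
    return {"glyph": GLYPHS[int(rng.integers(len(GLYPHS)))],
            "x": int(rng.integers(0, width - 20)),
            "y": int(rng.integers(0, height - 30)),
            "theta": float(rng.uniform(0, theta_max)),
            "blur": 7 * rng.beta(1, 2)}            # per-letter fidelity variable

def render_letter(letter, width=200, height=70):
    img = Image.new("L", (width, height), 0)
    ImageDraw.Draw(img).text((letter["x"], letter["y"]), letter["glyph"],
                             fill=255, font=ImageFont.load_default())
    img = img.rotate(letter["theta"], center=(letter["x"], letter["y"]))
    # per-letter spatial blur before compositing
    return gaussian_filter(np.asarray(img, dtype=float), sigma=letter["blur"])

def render_scene(letters, global_blur, width=200, height=70):
    canvas = np.zeros((height, width))
    for letter in letters:
        canvas = np.maximum(canvas, render_letter(letter, width, height))
    return gaussian_filter(canvas, sigma=global_blur)   # blur the composite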
42 To assess the accuracy of our approach on adversarially obscured text, we developed a corpus consisting of over 40 images from widely used websites such as TurboTax, E-Trade, and AOL, plus additional challenging synthetic CAPTCHAs with high degrees of letter overlap and superimposed distractors. [sent-99, score-0.286]
43 Each source of text violates the underlying assumptions of our probabilistic graphics program in different ways. [sent-100, score-0.616]
44 TurboTax CAPTCHAs incorporate occlusions that break strokes within letters. Figure 3: Inference over renderer fidelity significantly improves the reliability of inference. [sent-101, score-0.381]
45 (a) Reconstruction errors for 5 runs of two variants of our probabilistic graphics program for text. [sent-102, score-0.573]
46 Without sufficient stochasticity and approximation in the generative model — that is, with a strong prior over a purely deterministic, high-fidelity renderer — inference gets stuck in local energy minima (red lines). [sent-103, score-0.714]
47 With inference over renderer fidelity via per-letter and global blur, the tolerance of the image likelihood, and the number of letters, convergence improves substantially (blue lines). [sent-104, score-0.674]
48 Many local minima in the likelihood are escaped over the course of single-variable inference, and the blur variables are automatically adjusted to support localizing and identifying letters. [sent-105, score-0.324]
49 From left to right, we show overall log probability, pixel-wise disagreement (many local minima are escaped over the course of inference), the number of active letters in the scene, and the per-letter blur variables. [sent-108, score-0.299]
50 Inference automatically adjusts blur so that newly proposed letters are often blurred out until they are localized and identified accurately. [sent-109, score-0.251]
51 The dynamically-adjustable fidelity of our approximate renderer and the high stochasticity of our generative model appear to be necessary for inference to robustly escape local minima. [sent-112, score-0.711]
52 We have observed a kind of self-tuning annealing resulting from inference over the control variables; see Figure 3 for an illustration. [sent-113, score-0.148]
53 We observe robust character recognition given enough inference, with an overall character detection rate of 70. [sent-114, score-0.154]
54 To calibrate the difficulty of our corpus, we also ran the Tesseract optical character recognition engine [16] on our corpus; its character detection rate was 37. [sent-116, score-0.154]
55 We have also developed a generative probabilistic graphics program for localizing roads in 3D from single images. [sent-119, score-0.691]
56 As with many perception problems in robotics, there is clear scene structure to exploit, but also considerable uncertainty about the scene, as well as substantial image-to-image variability that needs to be robustly ignored. [sent-121, score-0.271]
57 The probabilistic graphics program we use for this problem is shown in Figure 7. [sent-123, score-0.573]
58 The latent scene S comprises the height of the roadway from the ground plane, the road’s width and lane size, and the 3D offset of the corner of the road from the (arbitrary) camera location. [sent-124, score-0.609]
59 The prior encodes the assumption that the lanes are small relative to the road, and that the road has two lanes and is very likely to be visible (but may not be centered). [sent-125, score-0.289]
60 This scene is then rendered to produce a surface-based segmentation image that assigns each input pixel to one of four regions r ∈ R = {left offroad, right offroad, road, lane}. [sent-126, score-0.455]
61 Rendering is done for each scene element separately, followed by compositing, as with our 2D text program. [sent-127, score-0.314]
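A much-simplified Python stand-in for this surface-based segmentation renderer is sketched below, assuming a flat ground plane and a pinhole camera; the parameter names, geometry, and exact region semantics are illustrative assumptions rather than the paper's renderer.

import numpy as np

LEFT_OFFROAD, ROAD, LANE, RIGHT_OFFROAD = 0, 1, 2, 3

def render_road(scene, H=96, W=160, f=120.0, cam_height=1.5):
    # scene: road center offset "road_x", "road_width", "lane_width" (ground
    # units), and the image row of the "horizon"; rows above the horizon keep
    # the LEFT_OFFROAD label in this toy version
    seg = np.full((H, W), LEFT_OFFROAD, dtype=np.int64)
    u0 = W / 2.0
    for v in range(int(scene["horizon"]) + 1, H):
        Z = f * cam_height / (v - scene["horizon"])   # flat-ground depth at row v
        for u in range(W):
            X = (u - u0) * Z / f                      # lateral ground coordinate
            dx = X - scene["road_x"]
            if abs(dx) <= scene["lane_width"] / 2.0:
                seg[v, u] = LANE
            elif abs(dx) <= scene["road_width"] / 2.0:
                seg[v, u] = ROAD
            elif dx > 0:
                seg[v, u] = RIGHT_OFFROAD             # dx < 0 stays LEFT_OFFROAD
    return seg

# example: seg = render_road({"road_x": 0.0, "road_width": 6.0,
#                             "lane_width": 3.0, "horizon": 40})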
62 Extensions to richer road and ground geometries are an interesting direction for future work. [sent-129, score-0.221]
63 From the program in Figure 4:
ASSUME blur (mem (lambda (id) (* 7 (beta 1 2))))
ASSUME global_blur (* 7 (beta 1 2))
ASSUME data_blur (* 7 (beta 1 2))
ASSUME epsilon (gamma 1 1)
ASSUME data (load_image "captcha_1. [sent-134, score-0.225]
64 ...
(is_present 10))
OBSERVE (incorporate_stochastic_likelihood data image epsilon) True
Figure 4: A generative probabilistic graphics program for reading degraded text. [sent-138, score-0.933]
65 The scene generator chooses letter identity (A-Z and digits 0-9), position, size and rotation at random. [sent-139, score-0.518]
66 These random variables are fed into the renderer, along with the bandwidths of a series of spatial blur kernels (one per letter, one for the overall rendered image from the generative model, and one for the original input image). [sent-140, score-0.633]
67 These blur kernels control the fidelity of the rendered image. [sent-141, score-0.364]
68 The image returned by the renderer is compared to the data via a pixel-wise Gaussian likelihood model, whose variance is also an unknown variable. [sent-142, score-0.574]
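Continuing the CAPTCHA sketch above, the scoring step might look as follows; applying data_blur to the observed image follows the Figure 4 program and the sentence above, while treating epsilon directly as the Gaussian variance is a simplifying assumption.

def captcha_log_likelihood(data_img, letters, global_blur, data_blur, epsilon):
    rendered = render_scene(letters, global_blur)
    blurred_data = gaussian_filter(np.asarray(data_img, dtype=float),
                                   sigma=data_blur)
    resid = blurred_data - rendered
    # pixel-wise Gaussian with unknown variance epsilon (Gamma(1,1) prior)
    return float(-0.5 * np.sum(resid ** 2) / epsilon
                 - 0.5 * resid.size * np.log(2 * np.pi * epsilon))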
69 A key difference is that our framework relies on automatic inference techniques, is representationally richer due to its compact model description, and goes beyond point estimates to report posterior uncertainty. [sent-143, score-0.213]
70 We used these clusters to build a compact appearance model based on cluster-center histograms, by assigning test image pixels to their nearest cluster. [sent-145, score-0.279]
71 Our stochastic likelihood incorporates these histograms, by multiplying together the appearance probabilities for each image region r ∈ R. [sent-147, score-0.334]
72 P(I_D | I_R) = ∏_{r ∈ R} ∏_{(x,y): I_R(x,y)=r} (θ_r[I_D(x,y)] + ε) / Z_r, where θ_r is region r's appearance histogram and Z_r is the corresponding normalizer. Figure 5f shows appearance model histograms from one random training frame. [sent-152, score-0.135]
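In code, this histogram likelihood might be computed as below; here quantized_image holds per-pixel cluster indices, seg is a rendered region labeling (e.g., from the road renderer sketched earlier), histograms maps each region code to its cluster-center histogram, and the exact smoothing and normalization scheme is an assumption.

import numpy as np

def appearance_log_likelihood(quantized_image, seg, histograms, eps=1e-3):
    # multiply per-pixel appearance probabilities theta_r[c] (in log space),
    # smoothed by eps and renormalized by Z_r
    ll = 0.0
    for r, hist in histograms.items():
        pixels = quantized_image[seg == r]        # cluster indices in region r
        Z_r = hist.sum() + eps * hist.size        # per-region normalizer
        ll += float(np.sum(np.log((hist[pixels] + eps) / Z_r)))
    return ll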
73 Figure 5c shows the extremely noisy lane/non-lane classifications that result from the appearance model on its own, without our scene prior; accuracy is extremely low. [sent-153, score-0.401]
74 Other, richer appearance models, such as Gaussian mixtures over RGB values (which could be either hand-specified or learned), are compatible with our formulation; our simple, quantized model was chosen primarily for simplicity. [sent-154, score-0.168]
75 We use the same generic Metropolis-Hastings strategy for inference in this problem as in our text application. [sent-155, score-0.155]
76 Although deterministic search strategies for MAP inference could be developed for this particular program, it is less clear how to build a single deterministic search algorithm that could work on both of the generative probabilistic graphics programs we present. [sent-156, score-0.858]
77 In Table 1, we report the accuracy of our approach on one road dataset from the KITTI Vision Benchmark Suite [5]. [sent-157, score-0.221]
78 We report lane/non-lane accuracy results for maximum likelihood classification over 10 appearance models (from 10 randomly chosen training images), as well as for the single best appearance model from this set. [sent-160, score-0.291]
79 This baseline system requires significant a priori 3D knowledge. Figure 5: An illustration of generative probabilistic graphics for 3D road finding. [sent-163, score-0.829]
80 (a) Renderings of random samples from our scene prior, showing the surface-based image segmentation induced by each sample. [sent-164, score-0.45]
81 (c) Maximum likelihood lane/non-lane classification of the images from (b) based solely on the best-performing single-training-frame appearance model (ignoring latent geometry). [sent-166, score-0.244]
82 Geometric constraints are clearly needed for reliable road finding. [sent-167, score-0.193]
83 (e) Typical inference results from the proposed generative probabilistic graphics approach on the images from (b). [sent-169, score-0.796]
84 (f) Appearance model histograms (over quantized RGB values) from the best-performing single-training-frame appearance model for all four region types: lane, left offroad, right offroad and road. [sent-170, score-0.209]
85 In contrast, our approach has to infer these aspects of the scene from the image data. [sent-173, score-0.405]
86 We also show some uncertainty estimates that result from approximate Bayesian inference in Figure 6. [sent-174, score-0.149]
87 Our probabilistic graphics program for this problem requires under 20 lines of probabilistic code. [sent-175, score-0.691]
88 5 Discussion: We have shown that it is possible to write short probabilistic graphics programs that use simple 2D and 3D computer graphics techniques as the backbone for highly approximate generative models. [sent-176, score-1.183]
89 Approximate Bayesian inference over the execution histories of these probabilistic graphics programs — automatically implemented via generic, single-variable Metropolis-Hastings transitions, using existing rendering libraries and simple likelihoods — then implements a new variation on analysis by synthesis [21]. [sent-177, score-0.63] [sent-212, score-0.192]
From the program in Figure 7:
ASSUME road_width (uniform_discrete 5 8) // arbitrary units
ASSUME road_height (uniform_discrete 70 150)
ASSUME lane_pos_x (uniform_continuous -1.
...
90 Table 1: Quantitative results for lane detection accuracy on one of the road datasets in the KITTI Vision Benchmark Suite [5]. [sent-210, score-0.3]
92 We have also shown that this approach can yield accurate, globally consistent interpretations of real-world images, and can coherently report posterior uncertainty over latent scenes when appropriate. [sent-213, score-0.139]
93 To scale our inference approach to handle more complex scenes, it will likely be important to consider more complex forms of automatic inference, beyond the single-variable Metropolis-Hastings proposals we currently use. [sent-215, score-0.188]
94 For example, discriminatively trained proposals could help, and in fact could be trained based on forward executions of the probabilistic graphics program. [sent-216, score-0.556]
95 Appearance models derived from modern image features and texture descriptors [14, 7, 11] — going beyond the simple quantizations we currently use — could also reduce the burden on inference and improve the generalizability of individual programs. [sent-217, score-0.246]
96 It is important to note that the high dimensionality involved in probabilistic graphics programming does not necessarily mean inference (and even automatic inference) is impossible. [sent-218, score-0.696]
97 For example, approximate inference in models with probabilities bounded away from 0 and 1 can sometimes be provably tractable via sampling techniques, with runtimes that depend on factors other than dimensionality [3]. [sent-219, score-0.149]
98 The most interesting potential of GPGP lies in bringing graphics representations and algorithms to bear on the hard modeling and inference problems in vision. [sent-221, score-0.512]
99 For example, to avoid global rerendering after each inference step, we need to represent and exploit the conditional independencies between latent scene elements and image regions. [sent-222, score-0.552]
100 We hope the GPGP framework facilitates image analysis by Bayesian inversion of rich graphics algorithms for scene generation and image synthesis. [sent-224, score-0.939]
wordName wordTfidf (topN-words)
[('graphics', 0.4), ('renderer', 0.381), ('scene', 0.271), ('gpgp', 0.217), ('road', 0.193), ('blur', 0.189), ('id', 0.174), ('ir', 0.164), ('captcha', 0.145), ('mem', 0.145), ('rendered', 0.139), ('image', 0.134), ('glyph', 0.127), ('fidelity', 0.121), ('probabilistic', 0.118), ('generative', 0.118), ('inference', 0.112), ('lambda', 0.111), ('programs', 0.11), ('appearance', 0.102), ('generator', 0.101), ('si', 0.097), ('captchas', 0.091), ('lane', 0.079), ('aol', 0.072), ('scenes', 0.069), ('frame', 0.069), ('letter', 0.068), ('custom', 0.064), ('kitti', 0.064), ('vision', 0.063), ('stochasticity', 0.063), ('degraded', 0.061), ('likelihood', 0.059), ('character', 0.059), ('program', 0.055), ('abc', 0.055), ('rendering', 0.055), ('tesseract', 0.054), ('turbotax', 0.054), ('xj', 0.054), ('bandwidths', 0.053), ('images', 0.048), ('lanes', 0.048), ('obscured', 0.048), ('suite', 0.048), ('bayesian', 0.048), ('reading', 0.047), ('tolerance', 0.047), ('beta', 0.047), ('segmentation', 0.045), ('pipelines', 0.044), ('rotation', 0.044), ('text', 0.043), ('parsing', 0.041), ('minima', 0.04), ('stochastic', 0.039), ('automatic', 0.038), ('proposals', 0.038), ('rgb', 0.038), ('quantized', 0.038), ('approximate', 0.037), ('adversarially', 0.036), ('characters', 0.036), ('eps', 0.036), ('epsilon', 0.036), ('escaped', 0.036), ('offroad', 0.036), ('vikash', 0.036), ('xblur', 0.036), ('zhuowen', 0.036), ('control', 0.036), ('recognition', 0.036), ('latent', 0.035), ('posterior', 0.035), ('letters', 0.034), ('digits', 0.034), ('histograms', 0.033), ('alexei', 0.032), ('venture', 0.032), ('zr', 0.032), ('breaking', 0.031), ('camera', 0.031), ('gamma', 0.03), ('corpus', 0.03), ('bonawitz', 0.029), ('derek', 0.029), ('martial', 0.029), ('snippets', 0.029), ('interpretation', 0.029), ('list', 0.028), ('accuracy', 0.028), ('richer', 0.028), ('software', 0.028), ('superimposed', 0.028), ('blurred', 0.028), ('church', 0.028), ('mansinghka', 0.028), ('programming', 0.028), ('likelihoods', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images.
2 0.15998206 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
Author: Haichao Zhang, David Wipf
Abstract: Typical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatially-varying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a simple non-uniform blind deblurring algorithm with a spatially-adaptive image penalty. Through an implicit normalization process, this penalty automatically adjusts its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate on average without explicitly incorporating structure-selection heuristics. The algorithm can be implemented using an optimization strategy that is virtually tuning-parameter free and simpler than existing methods, and likely can be applied in other settings such as dictionary learning. Detailed theoretical analysis and empirical comparisons on real images serve as validation.
3 0.10532411 226 nips-2013-One-shot learning by inverting a compositional causal process
Author: Brenden M. Lake, Ruslan Salakhutdinov, Josh Tenenbaum
Abstract: People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also tested the model on another conceptual task, generating new examples, by using a “visual Turing test” to show that our model produces human-like performance. 1
4 0.090340376 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements, that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset.
5 0.079741322 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
6 0.070520513 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
7 0.070258178 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
8 0.067163505 166 nips-2013-Learning invariant representations and applications to face verification
9 0.064943343 299 nips-2013-Solving inverse problem of Markov chain with partial observations
10 0.063917719 119 nips-2013-Fast Template Evaluation with Vector Quantization
11 0.061249893 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
12 0.060958169 84 nips-2013-Deep Neural Networks for Object Detection
13 0.060485203 161 nips-2013-Learning Stochastic Inverses
14 0.060184006 318 nips-2013-Structured Learning via Logistic Regression
15 0.059579298 149 nips-2013-Latent Structured Active Learning
16 0.05953769 211 nips-2013-Non-Linear Domain Adaptation with Boosting
17 0.059087489 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
18 0.058566593 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
19 0.05821339 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
20 0.056817189 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
topicId topicWeight
[(0, 0.146), (1, 0.058), (2, -0.096), (3, -0.039), (4, 0.062), (5, -0.014), (6, 0.028), (7, 0.009), (8, -0.001), (9, -0.003), (10, -0.065), (11, -0.001), (12, 0.013), (13, 0.015), (14, -0.051), (15, 0.035), (16, -0.052), (17, -0.113), (18, -0.045), (19, 0.025), (20, -0.034), (21, 0.019), (22, -0.003), (23, -0.008), (24, -0.053), (25, 0.045), (26, 0.064), (27, -0.026), (28, 0.011), (29, 0.005), (30, 0.077), (31, 0.069), (32, -0.008), (33, 0.07), (34, -0.059), (35, 0.018), (36, 0.038), (37, 0.046), (38, 0.064), (39, -0.008), (40, 0.026), (41, 0.107), (42, -0.046), (43, -0.093), (44, -0.053), (45, -0.025), (46, 0.043), (47, 0.065), (48, 0.058), (49, -0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.93707371 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images.
2 0.7817536 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
Author: Haichao Zhang, David Wipf
Abstract: Typical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatially-varying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a simple non-uniform blind deblurring algorithm with a spatially-adaptive image penalty. Through an implicit normalization process, this penalty automatically adjusts its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate on average without explicitly incorporating structure-selection heuristics. The algorithm can be implemented using an optimization strategy that is virtually tuning-parameter free and simpler than existing methods, and likely can be applied in other settings such as dictionary learning. Detailed theoretical analysis and empirical comparisons on real images serve as validation.
3 0.76938355 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
4 0.71933609 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
Author: Zhenwen Dai, Georgios Exarchakis, Jörg Lücke
Abstract: We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. All probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding for patches. By avoiding linear superpositions, the studied model represents a closer match to component occlusions which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively very different from linear models by using component representations separated into mask and feature parameters. We first investigated encodings learned by the model using artificial data with mutually occluding components. We find that the model extracts the components, and that it can correctly identify the occlusive components with the hidden variables of the model. On natural image patches, the model learns component masks and features for typical image components. By using reverse correlation, we estimate the receptive fields associated with the model’s hidden units. We find many Gabor-like or globular receptive fields as well as fields sensitive to more complex structures. Our results show that probabilistic models that capture occlusions and invariances can be trained efficiently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex. 1
5 0.71802759 226 nips-2013-One-shot learning by inverting a compositional causal process
Author: Brenden M. Lake, Ruslan Salakhutdinov, Josh Tenenbaum
Abstract: People can learn a new visual class from just one example, yet machine learning algorithms typically require hundreds or thousands of examples to tackle the same problems. Here we present a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image. We evaluated performance on a challenging one-shot classification task, where our model achieved a human-level error rate while substantially outperforming two deep learning models. We also tested the model on another conceptual task, generating new examples, by using a “visual Turing test” to show that our model produces human-like performance. 1
6 0.70416772 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
7 0.64459223 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
8 0.64198196 167 nips-2013-Learning the Local Statistics of Optical Flow
9 0.61762071 166 nips-2013-Learning invariant representations and applications to face verification
10 0.61666006 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
11 0.57829148 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
12 0.57743639 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars
13 0.57499498 84 nips-2013-Deep Neural Networks for Object Detection
14 0.55737364 119 nips-2013-Fast Template Evaluation with Vector Quantization
15 0.54680532 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
16 0.51838589 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
17 0.50930929 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
18 0.4973231 160 nips-2013-Learning Stochastic Feedforward Neural Networks
19 0.49512365 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
20 0.4930205 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
topicId topicWeight
[(2, 0.013), (16, 0.032), (33, 0.162), (34, 0.115), (41, 0.018), (49, 0.025), (56, 0.056), (70, 0.041), (83, 0.285), (85, 0.049), (89, 0.021), (93, 0.073), (95, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.79550397 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on general-purpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images.
2 0.76098454 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
Author: Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena
Abstract: We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of its users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degrees of freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for whom, the preferences were not only influenced by the object being manipulated but also by the surrounding environment.1 1
3 0.72544092 5 nips-2013-A Deep Architecture for Matching Short Texts
Author: Zhengdong Lu, Hang Li
Abstract: Many machine learning problems can be interpreted as learning for matching two types of objects (e.g., images and captions, users and products, queries and documents, etc.). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to the natural language problems, and therefore greatly improves upon the state-of-the-art models. 1
4 0.6127969 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
5 0.61059028 183 nips-2013-Mapping paradigm ontologies to and from the brain
Author: Yannick Schwartz, Bertrand Thirion, Gael Varoquaux
Abstract: Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Due to the nature of the individual experiments, based on eliciting neural response from a small number of stimuli, this link is incomplete, and unidirectional from the causal point of view. To come to conclusions on the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies. 1
6 0.61054617 251 nips-2013-Predicting Parameters in Deep Learning
7 0.60936517 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
8 0.60890478 341 nips-2013-Universal models for binary spike patterns using centered Dirichlet processes
9 0.60826993 64 nips-2013-Compete to Compute
10 0.60800189 166 nips-2013-Learning invariant representations and applications to face verification
11 0.60637671 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
12 0.60572493 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
13 0.60566801 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
14 0.60480762 99 nips-2013-Dropout Training as Adaptive Regularization
15 0.60450917 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
16 0.60422903 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
17 0.60389316 173 nips-2013-Least Informative Dimensions
18 0.60324287 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
19 0.60306585 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
20 0.60297489 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising