nips nips2009 nips2009-175 knowledge-graph by maker-knowledge-mining

175 nips-2009-Occlusive Components Analysis


Source: pdf

Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges

Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. [sent-12, score-0.236]

2 This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. [sent-13, score-0.428]

3 We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. [sent-14, score-0.585]

4 However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. [sent-16, score-0.327]

5 Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. [sent-18, score-0.374]

6 Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. [sent-19, score-0.572]

7 The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods. [sent-20, score-0.319]

8 1 Introduction A long-standing goal of unsupervised learning on images is to be able to learn the shape and form of objects from unlabelled scenes. [sent-21, score-0.326]

9 Any individual “hidden cause” is rarely active, corresponding to the small number of objects present in any one image. [sent-24, score-0.208]

10 Perhaps the most crucial is that in the underlying latent variable models, objects or parts thereof, combine linearly to form the image. [sent-26, score-0.263]

11 In real images the combination of individual objects depends on their relative distance from the camera or eye. [sent-27, score-0.338]

12 If two objects occupy the same region in planar space, the nearer one occludes the other, i. [sent-28, score-0.31]

13 , the hidden causes non-linearly compete to determine the pixel values in the region of overlap. [sent-30, score-0.262]

14 The idea of using many hidden “cause” variables to control the presence or absence of objects is retained, but these variables are augmented by another set of latent variables which determine the relative depth of the objects, much as in the z-buffer employed by computer graphics. [sent-32, score-0.494]

15 In turn, this enables the simplistic linear combination rule to be replaced by one in which nearby objects occlude those that are more distant. [sent-33, score-0.36]

16 Prominent probabilistic approaches [3, 4] assign pixels in multiple images taken from the same scene to a fixed number of image layers. [sent-37, score-0.288]

17 However, they model, in contrast to our approach, data in which objects maintain a fixed position in depth relative to the other objects. [sent-40, score-0.327]

18 The first is a set of variables which controls the presence or absence of objects in a particular image (this part will be analogous, e. [sent-42, score-0.374]

19 The second is a variable which controls the relative depths of the objects that are present. [sent-45, score-0.208]

20 The third is the combination rule which describes how closer active objects occlude more distant ones. [sent-46, score-0.464]

21 To model the presence or absence of an object we use H binary hidden variables s1 , . [sent-47, score-0.297]

22 We assume that the presence of one object is independent of the presence of the others and assume, for simplicity, equal probabilities π for objects to be present: $p(s \mid \pi) = \prod_{h=1}^{H} \mathrm{Bernoulli}(s_h; \pi) = \prod_{h=1}^{H} \pi^{s_h} (1 - \pi)^{1 - s_h}$. [sent-51, score-0.613]
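The independent Bernoulli prior in equation (1) can be illustrated with a minimal sketch. The function names `sample_presence` and `log_prior`, and the use of NumPy, are assumptions made for illustration only and are not taken from the paper.

```python
import numpy as np

def sample_presence(H, pi, rng):
    """Draw H independent Bernoulli(pi) presence indicators s_h."""
    return (rng.random(H) < pi).astype(int)

def log_prior(s, pi):
    """log p(s | pi) = sum_h [ s_h log(pi) + (1 - s_h) log(1 - pi) ]."""
    s = np.asarray(s)
    return float(np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi)))

# Illustrative values: H = 8 hidden causes, each present with probability 2/8.
rng = np.random.default_rng(0)
s = sample_presence(H=8, pi=0.25, rng=rng)
```

Because every object shares the same π, the log prior depends on s only through the number of active objects |s|.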

23 (1) Objects in a real image can be ordered by their depth and it is this ordering which determines which of two overlapping objects occludes the other. [sent-52, score-0.466]

24 But then, because the depth of absent objects (sh = 0) is irrelevant, no more than |s|! [sent-64, score-0.327]

25 [Figure 1, panels A and B; labels: object, objects, permutation, image] Figure 1: (A) Illustration of how two object masks and features combine to generate an image (generation without noise). [sent-66, score-0.905]

26 The final stage of the generative model describes how to produce the image given a selection of active causes and an ordering in relative depth of these causes. [sent-68, score-0.452]

27 One approach would be to choose the closest object and to set the image equal to the feature vector associated with this object. [sent-69, score-0.293]

28 What is missing from this description is a notion of the extent of an object and the fact that it might only contribute to a local selection of pixels in an image. [sent-71, score-0.266]

29 One set of parameters, $W \in \mathbb{R}^{H \times D}$, describes what contribution an object makes to each pixel (D is the number of pixels). [sent-73, score-0.253]

30 , WhD ) is therefore described as the mask of object h. [sent-77, score-0.268]

31 1A an object h with sh = 1 occupies all image pixels with Whd = 1 and does not occupy pixels with Whd = 0. [sent-91, score-0.669]

32 The function τ maps all causes h with sh = 0 to zero while all other causes are mapped to values within the interval [1, 2] (see Fig. [sent-95, score-0.437]

33 For a given pixel d the [Figure 2, panels A to C; axes: h, s_h, τ(S, h)] Figure 2: Visualization of the mapping τ. [sent-98, score-0.582]

34 combination rule (3) simply states that of all objects with $W_{hd} = 1$, the most proximal is used to set the pixel property. [sent-100, score-0.355]
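The occlusive combination rule for binary masks can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `render` and `closeness` are hypothetical names, `closeness` plays a role analogous to τ (zero for absent objects, positive for present ones), and it is assumed that a larger value means nearer to the viewer.

```python
import numpy as np

def render(W, T, s, closeness):
    """Compose an image from binary masks under occlusion.

    W: (H, D) binary masks, T: (H, C) feature (colour) vectors,
    s: (H,) presence indicators, closeness: (H,) positive, larger = nearer.
    Returns a (D, C) image; pixels covered by no active object stay zero.
    """
    W = np.asarray(W)
    T = np.asarray(T, dtype=float)
    s = np.asarray(s)
    closeness = np.asarray(closeness, dtype=float)
    # Score is zero for absent objects or uncovered pixels, positive otherwise.
    score = (s * closeness)[:, None] * W            # shape (H, D)
    winner = np.argmax(score, axis=0)               # nearest covering object per pixel
    covered = score.max(axis=0) > 0
    image = np.zeros((W.shape[1], T.shape[1]))
    image[covered] = T[winner[covered]]
    return image

# Two overlapping objects on three pixels; object 1 is nearer than object 0.
W = [[1, 1, 0], [0, 1, 1]]
T = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
both = render(W, T, s=[1, 1], closeness=[1.0, 2.0])
only_first = render(W, T, s=[1, 0], closeness=[1.0, 2.0])
```

In the overlap pixel the nearer object's feature vector wins outright; nothing is summed, which is the difference from a linear combination rule.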

35 (5) However, as there is usually a large number of objects that can potentially be present in the training images, and as the likelihood involves summing over all combinations of objects and associated orderings, the computation of (5) is typically intractable. [sent-113, score-0.452]

36 Moreover, even if it were tractably computable, optimization of the likelihood is made problematic by an analytical intractability arising from the fact that the occlusion non-linearity is non-differentiable. [sent-114, score-0.274]

37 The free-energy can thus be written as: $\mathcal{F}(\Theta, q) = \sum_{n=1}^{N} \Big[ \sum_{S} q_n(S, \Theta') \big( \log p(Y^{(n)} \mid S, \Theta) + \log p(S \mid \Theta) \big) \Big] + H(q)$, (6) where the function $H(q) = -\sum_{n} \sum_{S} q_n(S, \Theta') \log q_n(S, \Theta')$ (the Shannon entropy) is independent of Θ. [sent-129, score-0.388]

38 Unfortunately, this standard procedure is not directly applicable because of the non-linear nature of occlusion as reflected by the combination rule (3). [sent-139, score-0.262]

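39 For large values of ρ the following holds: $\frac{\partial}{\partial W_{id}} \mathcal{F}(\Theta, q) \approx \sum_{n=1}^{N} \sum_{S} q_n(S, \Theta') \Big( \frac{\partial}{\partial W_{id}} T^{\rho}_{d}(S, \Theta) \Big)^{T} f\big(y^{(n)}_{d}, T^{\rho}_{d}(S, \Theta)\big)$, (9) $\frac{\partial}{\partial T^{c}_{i}} \mathcal{F}(\Theta, q) \approx \sum_{n=1}^{N} \sum_{S} q_n(S, \Theta') \sum_{d=1}^{D} \Big( \frac{\partial}{\partial T^{c}_{i}} T^{\rho}_{d}(S, \Theta) \Big)^{T} f\big(y^{(n)}_{d}, T^{\rho}_{d}(S, \Theta)\big)$, (10) where $f(y^{(n)}, t) = \frac{\partial}{\partial t} \log p(y^{(n)} \mid t) = -\sigma^{-2} (y^{(n)} - t)$. [sent-150, score-0.388]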

40 Equations (11), together with the exact posterior $q_n(S, \Theta') = p(S \mid y^{(n)}, \Theta')$, represent a maximum-likelihood-based learning algorithm for the generative model (1) to (4). [sent-154, score-0.237]

41 The crucial entities that have to be computed for update equations (11) are the sufficient statistics $\langle A_{id}(S, W) \rangle_{q_n}$, i. [sent-162, score-0.237]

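42 In order to derive a computationally tractable learning algorithm the expectation $\langle A_{id}(S, W) \rangle_{q_n}$ is re-written and approximated as follows, $\langle A_{id}(S, W) \rangle_{q_n} = \frac{\sum_{S} p(S, Y^{(n)} \mid \Theta') \, A_{id}(S, W)}{\sum_{\tilde{S}} p(\tilde{S}, Y^{(n)} \mid \Theta')} \approx \frac{\sum_{S, (|s| \le \chi)} p(S, Y^{(n)} \mid \Theta') \, A_{id}(S, W)}{\sum_{\tilde{S}, (|\tilde{s}| \le \chi)} p(\tilde{S}, Y^{(n)} \mid \Theta')}$. [sent-165, score-0.388]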

43 Also in the case of occlusion we will later see that in numerical experiments using approximation (13) the true generating causes are indeed recovered. [sent-173, score-0.376]

44 In all the experiments we use image pixels as input variables yd . [sent-175, score-0.311]

45 The entries of the observed variables $y_d$ are set by the pixels’ RGB color vector, $y_d \in [0, 1]^3$. [sent-176, score-0.218]

46 In all trials of all experiments the initial values of the mask parameters $W_{hd}$ and the feature parameters $T^{c}_{h}$ were independently and uniformly drawn from the interval [0, 1]. [sent-177, score-0.338]

47 For the sufficient statistics $\langle A_{id}(S, W) \rangle_{q_n}$ we used approximation (13) with $A^{\rho}_{id}(S, W)$ in (8) instead of $A_{id}(S, W)$ and with χ = 3 if not stated otherwise. [sent-185, score-0.251]

48 The component extraction capabilities of the model were tested using the colored bars test. [sent-192, score-0.479]

49 This test is a generalization of the classical bars test [11] which has become a popular benchmark task for non-linear component extraction. [sent-193, score-0.354]

50 In the standard bars test with H = 8 bars the input data are 16-dimensional vectors, representing a 4 × 4 grid of pixels, i. [sent-194, score-0.546]

51 The single bars appear at the 4 vertical and 4 horizontal positions. [sent-197, score-0.273]

52 For the colored bars test, the bars have colors $T^{\mathrm{gen}}_{h}$ which are independently and uniformly drawn from the RGB color cube $[0, 1]^3$. [sent-198, score-0.727]

53 For each image a bar appears independently with a probability $\pi = \frac{2}{8}$, which results in two bars per image on average (the standard value in the literature). [sent-200, score-0.564]

54 For the bars active in an image, a ranking in depth is randomly and uniformly chosen from the permutation group. [sent-201, score-0.471]

55 The color of each pixel is determined by the least distant bar and is black if the pixel is occupied by no bar. [sent-202, score-0.314]
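The colored bars generation described above can be sketched as follows. This is a noiseless illustration under stated assumptions: `bar_masks` and `sample_colored_bars` are hypothetical names, and painting the active bars in a uniformly random order (later bars overwrite earlier ones) stands in for drawing a uniform depth ranking.

```python
import numpy as np

def bar_masks(side=4):
    """Binary masks for `side` horizontal and `side` vertical bars on a side x side grid."""
    masks = []
    for i in range(side):
        horizontal = np.zeros((side, side), dtype=int)
        horizontal[i, :] = 1
        vertical = np.zeros((side, side), dtype=int)
        vertical[:, i] = 1
        masks += [horizontal.ravel(), vertical.ravel()]
    return np.array(masks)                       # shape (2 * side, side * side)

def sample_colored_bars(N, side=4, seed=0):
    """Generate N noiseless colored-bars images with occlusion."""
    rng = np.random.default_rng(seed)
    W = bar_masks(side)
    H, D = W.shape
    T = rng.random((H, 3))                       # generating colors, uniform in [0, 1]^3
    pi = 2.0 / H                                 # two bars per image on average
    images = np.zeros((N, D, 3))                 # uncovered pixels stay black
    for n in range(N):
        active = np.flatnonzero(rng.random(H) < pi)
        # Paint from farthest to nearest so nearer bars overwrite farther ones.
        for h in rng.permutation(active):
            images[n, W[h] == 1] = T[h]
    return images, W, T

images, W, T = sample_colored_bars(N=20)
```

Each pixel therefore carries either the color of the nearest active bar covering it or black, never a mixture of colors.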

56 The learning algorithms were applied to the colored bars test with H = 8 hidden units and D = 16 input units. [sent-205, score-0.405]

57 The observation noise was set [Figure 3, panels A to C; rows W and T at iterations 1, 20, 40, 100] Figure 3: Application to the colored bars test. [sent-206, score-0.338]

58 C Feature vectors at the iterations in B displayed as points in color space (for visualization we used the 2-D hue and saturation plane of the HSV color space). [sent-210, score-0.218]

59 As can be observed, the mask value W and the feature values T converged to values close to the generating ones. [sent-219, score-0.25]

60 05) the algorithm converges to values representing all causes in 48 of 50 trials (96% reliability). [sent-221, score-0.22]

61 A maximum of three causes (on average) were used for the noiseless bars test. [sent-223, score-0.441]

62 This is considered a difficult task in the standard bars test. [sent-224, score-0.273]

63 In particular, if all bar colors are white, T = (1, 1, 1)T , the classical version of the bars test is recovered. [sent-233, score-0.414]

64 When the generating parameters were as above (eight bars, probability of a bar to be present $\frac{2}{8}$, N = 500), all bars were successfully extracted in 42 of 50 trials (84% reliability). [sent-235, score-0.55]

65 For a bars test with ten bars, D = 5 × 5, a probability of $\frac{2}{10}$ for each bar to be present, and N = 500 data points, the algorithm with model parameters as above extracted all bars in 43 of 50 trials (86% reliability; mean number of extracted bars 9. [sent-236, score-1.11]

66 For N = 1000 instead of 500 reliability increased to 94% (50 trials; mean number of extracted bars 9. [sent-239, score-0.525]

67 The bars test with ten bars is probably the one most frequently found in the literature. [sent-241, score-0.546]

68 One possible criticism of the bars tests above is that the bars are relatively simple objects. [sent-248, score-0.546]

69 Sized objects were taken from the COIL100 dataset [15] with relatively uniform color distribution (objects 2, 4, 47, 78, 94, 97; all with zero degree rotation). [sent-250, score-0.286]

70 The images were scaled down to 15 × 15 pixels and randomly placed on a black background image of 25 × 25 pixels. [sent-251, score-0.288]

71 Downscaling introduced blurred object edges and to remove this effect dark pixels were set to black. [sent-252, score-0.266]
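The preprocessing described above (placing a small object image at a random position on a black background and setting dark pixels to black) can be sketched as follows. The function name `place_on_background` and the cutoff `dark_threshold` are hypothetical: the source does not state which threshold was used, so the value here is purely illustrative.

```python
import numpy as np

def place_on_background(patch, size=25, dark_threshold=0.1, rng=None):
    """Paste a small RGB patch at a random position on a black size x size canvas.

    dark_threshold is a hypothetical cutoff: patch pixels whose brightest
    channel falls below it are set to black before pasting.
    """
    rng = np.random.default_rng() if rng is None else rng
    patch = np.array(patch, dtype=float)         # copy so the input is not modified
    patch[patch.max(axis=2) < dark_threshold] = 0.0
    ph, pw, _ = patch.shape
    top = int(rng.integers(0, size - ph + 1))    # upper bound is exclusive
    left = int(rng.integers(0, size - pw + 1))
    canvas = np.zeros((size, size, 3))
    canvas[top:top + ph, left:left + pw] = patch
    return canvas, (top, left)

# A uniform grey 15 x 15 patch with one dark corner pixel.
patch = np.full((15, 15, 3), 0.5)
patch[0, 0] = 0.01
canvas, (top, left) = place_on_background(patch, rng=np.random.default_rng(1))
```

This sketch places a single object; a cluttered scene would additionally need several objects composed in depth order.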

72 The training images were generated with each object being [Figure 4, panels A to C; rows W and T at iterations 1, 10, 25, 50, 100] Figure 4: Application to images of cluttered objects. [sent-253, score-0.378]

73 We applied the learning algorithm with H = 6, an initial temperature for annealing of $T^{\mathrm{init}} = \frac{1}{4} D$, and parameters as above otherwise. [sent-263, score-0.215]

74 As can be observed, the mask values converged to represent the different objects, and the feature vectors converged to values representing the mean object color. [sent-266, score-0.467]

75 Note that the model is not matched to the dataset as each object has a fixed distribution of color values which is a poor match to a Gaussian distribution with a constant color mean. [sent-267, score-0.314]

76 The model reacted by assigning part of the real color distribution to the mask values which are responsible for the 3-dimensional appearance of the masks (see Fig. [sent-268, score-0.307]

77 In 42 of the trials (84%) the algorithm converged to values representing all six objects together with appropriate values for their mean colors. [sent-272, score-0.359]

78 In seven trials the algorithm converged to a local optimum (average number of extracted objects was 5. [sent-273, score-0.409]

79 In 50 trials with 8 objects (we added objects 36 and 77 of the COIL-100 database) an algorithm with the same parameters but H = 8 extracted all objects in 40 of the trials (reliability 80%, average number of extracted objects 7. [sent-275, score-1.141]

80 5 Discussion We have studied learning in the generative model of occlusion (1) to (4). [sent-277, score-0.221]

81 Parameters can be optimized given a collection of N images in which different sets of causes are present at different positions in depth. [sent-278, score-0.26]

82 Typically, the algorithms are applied to data which consist of images that have a small number of foreground objects (usually one or two) on a static or slowly changing background. [sent-284, score-0.342]

83 The additional hidden variable used for object arrangements allows our model to be applied to images of cluttered scenes. [sent-289, score-0.423]

84 , it assumes that each object has the same depth position in all training images. [sent-292, score-0.277]

85 The difficulty of the data would become obvious if all pixels in each image of the data set were permuted by a fixed permutation map. [sent-297, score-0.249]

86 , towards systems that can learn from video data in which objects change their positions in depth. [sent-300, score-0.25]

87 In the class of multiple-causes approaches our model is the first to generalize the combination rule to one that models occlusion explicitly. [sent-309, score-0.262]

88 This required an additional variable for depth and the introduction of two sets of parameters: masks and features. [sent-310, score-0.238]

89 Note that in the context of multiple-causes models, masks have recently been introduced in conjunction with ICA [17] in order to model local contrast correlation in image patches. [sent-311, score-0.213]

90 For our model, the combination of masks and vectorial feature parameters allows for applications to more general sets of data than those used for classical component extraction. [sent-312, score-0.318]

91 The reported results for the standard bars test show the competitiveness of our approach despite its larger set of parameters [compare, e. [sent-316, score-0.306]

92 Possible alternatives are, however, Gabor feature vectors which model object textures (see, e. [sent-325, score-0.271]

93 Furthermore, individual prior parameters for the frequency of object appearances could be introduced. [sent-339, score-0.223]

94 An easy alteration would be, for instance, to always map one specific hidden unit to the most distant position in depth in order to model a background. [sent-342, score-0.226]

95 Especially for images of objects, changes in planar component positions have to be addressed in general. [sent-345, score-0.21]

96 Possible approaches that have been used in the literature can, for instance, be found in [3, 4] in the context of occlusion modeling, in [20] in the context of NMF, and in [18] in the context of object recognition. [sent-346, score-0.336]

97 In summary, the studied occlusion model advances generative modeling approaches to visual data by explicitly modeling object arrangements in depth. [sent-352, score-0.443]

98 The approach complements established approaches of occlusion modeling in the literature by generalizing standard approaches to multiple-causes component extraction. [sent-353, score-0.231]

99 Learning the parts of objects by non-negative matrix factorization. [sent-368, score-0.208]

100 Greedy learning of multiple objects in images using robust statistics and factorial learning. [sent-382, score-0.294]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('whd', 0.476), ('bars', 0.273), ('objects', 0.208), ('wid', 0.204), ('qn', 0.194), ('aid', 0.192), ('occlusion', 0.178), ('sh', 0.173), ('reliability', 0.17), ('object', 0.158), ('nmf', 0.136), ('causes', 0.132), ('masks', 0.119), ('depth', 0.119), ('em', 0.115), ('frankfurt', 0.113), ('mask', 0.11), ('yd', 0.109), ('pixels', 0.108), ('th', 0.098), ('image', 0.094), ('sc', 0.092), ('trials', 0.088), ('extraction', 0.088), ('images', 0.086), ('annealing', 0.082), ('color', 0.078), ('init', 0.073), ('bar', 0.07), ('ica', 0.069), ('occlude', 0.068), ('sprites', 0.068), ('hidden', 0.067), ('colored', 0.065), ('arrangements', 0.064), ('pixel', 0.063), ('converged', 0.063), ('intractability', 0.06), ('id', 0.057), ('component', 0.053), ('extracted', 0.05), ('occluded', 0.048), ('foreground', 0.048), ('cluttered', 0.048), ('permutation', 0.047), ('generation', 0.046), ('cke', 0.045), ('henniges', 0.045), ('mca', 0.045), ('occludes', 0.045), ('tit', 0.045), ('combination', 0.044), ('equations', 0.043), ('generative', 0.043), ('colors', 0.043), ('positions', 0.042), ('feature', 0.041), ('ti', 0.04), ('distant', 0.04), ('gatsby', 0.04), ('rule', 0.04), ('gen', 0.04), ('maneesh', 0.04), ('landscape', 0.04), ('textures', 0.04), ('wersing', 0.04), ('presence', 0.037), ('wh', 0.036), ('turner', 0.036), ('generating', 0.036), ('likelihood', 0.036), ('noiseless', 0.036), ('absence', 0.035), ('tic', 0.034), ('ucl', 0.034), ('parameters', 0.033), ('independently', 0.033), ('approximations', 0.033), ('decreased', 0.033), ('active', 0.032), ('describes', 0.032), ('vectors', 0.032), ('appearances', 0.032), ('unlabelled', 0.032), ('seemed', 0.032), ('increased', 0.032), ('degeneracy', 0.031), ('later', 0.03), ('displayed', 0.03), ('cause', 0.03), ('planar', 0.029), ('ec', 0.029), ('queen', 0.029), ('sums', 0.029), ('occupy', 0.028), ('latent', 0.028), ('classical', 0.028), ('temperature', 0.027), ('combine', 0.027), ('td', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 175 nips-2009-Occlusive Components Analysis

2 0.18824963 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

3 0.14061834 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

4 0.13617913 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

Author: Ian Stevenson, Konrad Koerding

Abstract: In many domains, humans appear to combine perceptual cues in a near-optimal, probabilistic fashion: two noisy pieces of information tend to be combined linearly with weights proportional to the precision of each cue. Here we present a case where structural information plays an important role. The presence of a background cue gives rise to the possibility of occlusion, and places a soft constraint on the location of a target, in effect propelling it forward. We present an ideal observer model of depth estimation for this situation where structural or ordinal information is important and then fit the model to human data from a stereo-matching task. To test whether subjects are truly using ordinal cues in a probabilistic manner we then vary the uncertainty of the task. We find that the model accurately predicts shifts in subjects’ behavior. Our results indicate that the nervous system estimates depth ordering in a probabilistic fashion and estimates the structure of the visual scene during depth perception.

5 0.12914011 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.

6 0.10920837 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

7 0.10668937 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

8 0.093670174 236 nips-2009-Structured output regression for detection with partial truncation

9 0.087432429 151 nips-2009-Measuring Invariances in Deep Networks

10 0.080322638 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

11 0.077079557 96 nips-2009-Filtering Abstract Senses From Image Search Results

12 0.075138174 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

13 0.074491404 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

14 0.074064821 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

15 0.073905662 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

16 0.073416039 256 nips-2009-Which graphical models are difficult to learn?

17 0.072912678 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

18 0.072348677 2 nips-2009-3D Object Recognition with Deep Belief Nets

19 0.072053343 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

20 0.069163673 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.217), (1, -0.152), (2, -0.095), (3, -0.031), (4, -0.018), (5, 0.102), (6, 0.001), (7, 0.053), (8, 0.15), (9, -0.09), (10, 0.031), (11, -0.063), (12, 0.045), (13, -0.065), (14, 0.001), (15, 0.068), (16, 0.002), (17, -0.038), (18, 0.037), (19, -0.035), (20, -0.031), (21, -0.02), (22, -0.009), (23, -0.058), (24, -0.0), (25, -0.069), (26, -0.065), (27, 0.047), (28, -0.081), (29, -0.04), (30, -0.026), (31, -0.073), (32, -0.054), (33, -0.108), (34, 0.066), (35, -0.063), (36, -0.037), (37, 0.055), (38, 0.03), (39, 0.01), (40, 0.045), (41, -0.135), (42, -0.021), (43, -0.007), (44, -0.032), (45, 0.02), (46, -0.025), (47, -0.08), (48, -0.094), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93528593 175 nips-2009-Occlusive Components Analysis

2 0.75881302 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black

Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwellized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention.

3 0.74653739 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

4 0.71900833 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

Author: Ian Stevenson, Konrad Koerding

Abstract: In many domains, humans appear to combine perceptual cues in a near-optimal, probabilistic fashion: two noisy pieces of information tend to be combined linearly with weights proportional to the precision of each cue. Here we present a case where structural information plays an important role. The presence of a background cue gives rise to the possibility of occlusion, and places a soft constraint on the location of a target, in effect propelling it forward. We present an ideal observer model of depth estimation for this situation where structural or ordinal information is important and then fit the model to human data from a stereo-matching task. To test whether subjects are truly using ordinal cues in a probabilistic manner we then vary the uncertainty of the task. We find that the model accurately predicts shifts in subjects’ behavior. Our results indicate that the nervous system estimates depth ordering in a probabilistic fashion and estimates the structure of the visual scene during depth perception.

5 0.71702886 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

6 0.71415663 133 nips-2009-Learning models of object structure

7 0.6711058 211 nips-2009-Segmenting Scenes by Matching Image Composites

8 0.64835507 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

9 0.59979588 236 nips-2009-Structured output regression for detection with partial truncation

10 0.58910489 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

11 0.57752025 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

12 0.55884463 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

13 0.52516019 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

14 0.51560932 115 nips-2009-Individuation, Identification and Object Discovery

15 0.49197221 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

16 0.49181026 149 nips-2009-Maximin affinity learning of image segmentation

17 0.48927027 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

18 0.47331375 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

19 0.46592674 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

20 0.46035787 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.011), (24, 0.037), (25, 0.099), (35, 0.071), (36, 0.1), (39, 0.053), (42, 0.275), (58, 0.078), (61, 0.019), (71, 0.059), (81, 0.022), (86, 0.085), (91, 0.019)]
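
The list above is a sparse topic vector: each pair is a topic id and this paper's weight on that topic. The page does not say how the simValue scores below are derived from such vectors, and the same-paper entry scores 0.796 rather than 1, so the actual measure is evidently something other than a plain self-comparison. As a rough sketch only, one common way to compare two sparse topic vectors is cosine similarity. The function name `cosine_similarity` and the second vector `other_paper` are hypothetical and used purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Topic weights listed on this page for paper 175.
this_paper = [(7, 0.011), (24, 0.037), (25, 0.099), (35, 0.071), (36, 0.1),
              (39, 0.053), (42, 0.275), (58, 0.078), (61, 0.019), (71, 0.059),
              (81, 0.022), (86, 0.085), (91, 0.019)]

# Hypothetical vector for some other paper (invented values, not from the page).
other_paper = [(25, 0.12), (42, 0.31), (60, 0.2)]

print(round(cosine_similarity(this_paper, other_paper), 3))
```

Under this measure a vector compared with itself scores 1 and vectors sharing no topics score 0, which is why the listed scores should not be read as plain cosine values without confirmation.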

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79626369 175 nips-2009-Occlusive Components Analysis

Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges

Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods.

2 0.75163871 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz

Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets.

3 0.70239812 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms.

4 0.60369003 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths

Abstract: As the availability and importance of relational data (such as the friendships summarized on a social networking website) increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable, latent features, using a Bayesian nonparametric approach to infer the number of features at the same time as we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets.

5 0.600766 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance.

6 0.59972894 113 nips-2009-Improving Existing Fault Recovery Policies

7 0.5996232 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

8 0.59944665 97 nips-2009-Free energy score space

9 0.59888381 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

10 0.5967617 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

11 0.59591639 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

12 0.5956437 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

13 0.59545517 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

14 0.59314525 133 nips-2009-Learning models of object structure

15 0.59310091 70 nips-2009-Discriminative Network Models of Schizophrenia

16 0.59239382 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

17 0.59182066 112 nips-2009-Human Rademacher Complexity

18 0.59015471 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

19 0.58998835 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

20 0.58970869 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability