nips nips2007 nips2007-113 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
Reference: text
sentIndex sentText sentNum sentScore
1 The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. [sent-3, score-0.464]
2 These can be any combination of appearance, shape, or the layout of segments within the pattern. [sent-4, score-0.621]
3 Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. [sent-5, score-0.766]
4 We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images. [sent-8, score-0.669]
5 These visual attributes are important for understanding object appearance and for describing objects to other people. [sent-13, score-0.623]
6 Moreover, as different object categories often have attributes in common, modeling them explicitly allows part of the learning task to be shared amongst categories, or allows previously learnt knowledge about an attribute to be transferred to a novel category. [sent-19, score-0.576]
7 For example, learning the variability of zebra stripes under non-rigid deformations tells us a lot about the corresponding variability in striped shirts. [sent-21, score-0.568]
8 Both the appearance and the shape of pattern elements (e. [sent-25, score-0.398]
9 This enables our model to cover attributes defined by appearance (‘red’), by shape (‘round’), or by both (the black-and-white stripes of zebras). [sent-30, score-1.038]
10 Furthermore, the model takes into account attributes with general appearance, such as stripes which are characterized by a pattern of alternation ABAB of any two colors A and B, rather than by a specific combination of colors. [sent-31, score-0.871]
11 Figure 1: Examples of different kinds of attributes (panels: unary ‘red’ and ‘round’; binary ‘black/white stripes’ and ‘generic stripes’). [sent-36, score-1.187]
12 On the left we show two simple attributes, whose characteristic properties are captured by individual image segments (appearance for red, shape for round). [sent-37, score-0.742]
13 This enables attributes to be learnt directly from a text specification by collecting training images using a web image search engine, such as Google-images, and querying on the name of the attribute. [sent-41, score-0.621]
14 Our approach is inspired by the ideas of Jojic and Caspi [4], where patterns have constant appearance within an image, but are free to change to another appearance in other images. [sent-42, score-0.653]
15 However, they work with textures covering the entire image and focus on finding distinctive appearance descriptors. [sent-46, score-0.52]
16 In contrast, here textures are attributes of objects, and therefore appear in complex images containing many other elements. [sent-47, score-0.4]
17 2 Image segments – basic visual representation The basic units in our attribute model are image segments extracted using the algorithm of [2]. [sent-53, score-1.337]
18 Figure 2a shows a few segments from a typical image. [sent-57, score-0.489]
19 Inspired by the success of simple patches as a basis for appearance descriptors [8, 9], we randomly sample a large number of 5 × 5 pixel patches from all training images and cluster them using k-means [8]. [sent-58, score-0.443]
20 By clustering the segment histograms from the training images we obtain a codebook A of appearances (figure 2b). [sent-62, score-0.563]
21 Each entry in the codebook is a prototype segment descriptor, representing the appearance of a subset of the segments from the training set. [sent-63, score-1.113]
22 Each segment s is then assigned the appearance a ∈ A with the smallest Bhattacharyya distance to the histogram of s. [sent-64, score-0.534]
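The assignment step above can be sketched as follows. This is a minimal illustration, assuming segment descriptors and codebook entries are normalized histograms over patch types; the function names and the toy codebook are hypothetical. The distance is taken as the negative log of the Bhattacharyya coefficient, which is one common definition; other forms that are monotone in the same coefficient would select the same entry.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two normalized histograms."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to avoid log(0) when the histograms do not overlap at all.
    return -math.log(max(bc, 1e-12))

def assign_appearance(segment_hist, codebook):
    """Index of the codebook entry closest to the segment histogram."""
    distances = [bhattacharyya_distance(segment_hist, entry) for entry in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)

# Hypothetical 3-entry codebook over 3 patch types (illustrative values only).
codebook = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
segment_hist = [0.2, 0.7, 0.1]
best = assign_appearance(segment_hist, codebook)
```

Here `best` is the index of the prototype whose histogram overlaps most with the segment's histogram.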
23 In addition to appearance, various geometric properties of a segment are measured, summarizing its shape. [sent-65, score-0.421]
24 Figure 2 (panels a to d): Image segments as visual features. [sent-68, score-0.53]
25 a) An image with a few segments overlaid, including two pairs of adjacent segments on a striped region. [sent-69, score-1.319]
26 b) Each row is an entry from the appearance codebook A (i. [sent-70, score-0.366]
27 The three most frequent patch types for each appearance are displayed. [sent-73, score-0.371]
28 Two segments from the stripes are assigned to the white and black appearance respectively (arrows). [sent-74, score-1.247]
29 d) Relative geometric properties of a pair of segments: relative area and relative orientation. [sent-76, score-0.38]
30 relative area is the area of the first segment with respect to the second). [sent-79, score-0.392]
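The two relative properties of a segment pair might be computed as in the sketch below. It assumes, as a labeled guess, that relative area is a log ratio of the two areas (so equal areas give 0) and that relative orientation is the difference of the two axis orientations wrapped to a half-turn, since a segment's axis has no direction. The function names are hypothetical.

```python
import math

def relative_area(area1, area2):
    """Log ratio of the first segment's area to the second's (assumed form)."""
    return math.log(area1 / area2)

def relative_orientation(theta1, theta2):
    """Difference of axis orientations in radians, wrapped to [-pi/2, pi/2)."""
    diff = theta1 - theta2
    return (diff + math.pi / 2) % math.pi - math.pi / 2

# Two parallel stripe segments of equal area give values at (or near) zero.
same_area = relative_area(120.0, 120.0)
parallel = relative_orientation(0.3, 0.3)
# An axis rotated by a half-turn is the same axis, so it also counts as parallel.
flipped = relative_orientation(math.pi, 0.0)
```

With these conventions, a pair of parallel, equally sized segments maps to values near zero in both properties.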
31 Simple attributes are entirely characterized by properties of a single segment (unary attributes). [sent-81, score-0.502]
32 Some unary attributes are defined by their appearance, such as colors (e. [sent-82, score-0.498]
33 Other unary attributes are defined by a segment shape (e. [sent-87, score-0.682]
34 All red segments have similar appearance, regardless of shape, while all round segments have similar shape, regardless of appearance. [sent-90, score-1.109]
35 More complex attributes have a basic element composed of two segments (binary attributes). [sent-91, score-0.753]
36 One example is the black/white stripes of a zebra, which are composed of pairs of segments sharing similar appearance and shape across all images. [sent-92, score-1.332]
37 Moreover, the layout of the two segments is characteristic as well: they are adjacent, nearly parallel, and have comparable area. [sent-93, score-0.651]
38 Going yet further, a general stripe pattern can have any appearance (e. [sent-94, score-0.463]
39 However, the pairs of segments forming a stripe pattern in one particular image must have the same appearance. [sent-97, score-0.817]
40 Hence, a characteristic of general stripes is a pattern of alternation ABABAB. [sent-98, score-0.541]
41 Essentially, attributes are found as patterns of repeated segments, or pairs of segments, sharing some properties (geometric and/or appearance and/or layout). [sent-101, score-0.662]
42 An image I is represented by a set of segments {s}. [sent-105, score-0.62]
43 Foreground segments are those on the image area covered by the attribute. [sent-107, score-0.706]
44 We collect f for all segments of I into the vector F. [sent-108, score-0.489]
45 An image has a foreground appearance a, shared by all the foreground segments it contains. [sent-109, score-1.259]
46 The other parameters are used to explain segments and are discussed below. [sent-112, score-0.489]
47 Hence, the image likelihood can be expressed as a product over the probability of each segment s, counted by its area Ns (i. [sent-114, score-0.459]
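As a rough illustration of the area weighting, the sketch below computes an image log-likelihood in which each segment's probability is raised to the power of its area, i.e. a sum of area-weighted log-probabilities. The function name and the input lists are assumptions for illustration; equation (3) itself is not reproduced in this excerpt.

```python
import math

def image_log_likelihood(segment_probs, segment_areas):
    """Sum over segments of area * log(probability).

    Equivalent to the log of a product in which each segment's
    probability is counted once per unit of area.
    """
    return sum(n * math.log(p) for p, n in zip(segment_probs, segment_areas))

# Two segments: probability 0.5 with area 2, probability 0.25 with area 1.
value = image_log_likelihood([0.5, 0.25], [2, 1])
```

Working in log space avoids numerical underflow, since large segments contribute very small powers of probabilities.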
48 D is the number of images in the dataset, Si is the number of segments in image i, and G is the total number of geometric properties considered (both active and inactive). [sent-117, score-0.964]
49 Φ1,2 are the geometric distributions for each segment in a pair. [sent-120, score-0.382]
50 measure properties between two segments in a pair, such as relative orientation), and there are R of them in total (active and inactive). [sent-123, score-0.571]
51 It tells whether only adjacent pairs of segments are considered (so p(c|δ = 1) is one iff c is a pair of adjacent segments). [sent-125, score-0.75]
52 2 Unary attributes Segments are the only observed variables in the unary model. [sent-129, score-0.402]
53 A segment $s = (s_a, \{s_g^j\})$ is defined by its appearance $s_a$ and shape, captured by a set of geometric measurements $\{s_g^j\}$, such as elongation and curvedness. [sent-130, score-0.764]
54 The graphical model in figure 3a illustrates the conditional probability of image segments $p(s \mid M; f, a) = \begin{cases} p(s_a \mid a) \cdot \prod_j p(s_g^j \mid \Phi^j)^{v^j} & \text{if } f = 1 \\ \beta & \text{if } f = 0 \end{cases}$ (4) The likelihood for a segment depends on the model parameters $M = (\alpha, \beta, \{\lambda^j\})$, which specify a visual attribute. [sent-131, score-0.984]
55 For each geometric property $\lambda^j = (\Phi^j, v^j)$, the model defines its distribution $\Phi^j$ over the foreground segments and whether the property is active or not ($v^j = 1$ or $0$). [sent-132, score-0.931]
56 The factor $p(s_a \mid a) = [s_a = a]$ is 1 for segments having the foreground appearance $a$ for this image, and 0 otherwise (thus it acts as a selector). [sent-138, score-0.962]
57 The scalar value β represents a simple background model: all segments assigned to the background have likelihood β. [sent-139, score-0.595]
58 During inference and learning we want to maximize the likelihood of an image given the model over F, which is achieved by setting f to foreground when the f = 1 case of equation (4) is greater than β. [sent-140, score-0.365]
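The labeling rule can be sketched as below: a segment is set to foreground when the f = 1 case of equation (4) exceeds β. The dictionary layout, the toy step-function densities, and all names are hypothetical; the actual form of the geometric distributions is not specified in this excerpt.

```python
def foreground_score(segment, fg_appearance, geometric_models):
    """The f = 1 case: appearance selector times the active geometric factors."""
    if segment["appearance"] != fg_appearance:
        return 0.0  # the appearance factor [s_a = a] acts as a selector
    score = 1.0
    for name, (density, active) in geometric_models.items():
        if active:  # inactive properties (v = 0) contribute a factor of 1
            score *= density(segment["geometry"][name])
    return score

def label_segment(segment, fg_appearance, geometric_models, beta):
    """Return f = 1 (foreground) if the foreground score beats the background beta."""
    return 1 if foreground_score(segment, fg_appearance, geometric_models) > beta else 0

# Toy model: elongation is active, curvedness is inactive (illustrative densities).
models = {
    "elongation": (lambda x: 0.9 if x > 0.5 else 0.05, True),
    "curvedness": (lambda x: 0.0, False),
}
beta = 0.1
elongated = {"appearance": "red", "geometry": {"elongation": 0.8, "curvedness": 0.2}}
compact = {"appearance": "red", "geometry": {"elongation": 0.2, "curvedness": 0.2}}
other = {"appearance": "blue", "geometry": {"elongation": 0.8, "curvedness": 0.2}}
labels = [label_segment(s, "red", models, beta) for s in (elongated, compact, other)]
```

Because the segments are treated independently here, maximizing the image likelihood over F reduces to this per-segment comparison.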
59 β is some low value, corresponding to how likely it is for non-red segments to be assigned the red appearance. [sent-143, score-0.582]
60 3 Binary attributes The basic element of binary attributes is a pair of segments. [sent-148, score-0.549]
61 In addition to duplicating the unary appearance and geometric properties, the extended model includes pairwise properties which do not apply to individual segments. [sent-150, score-0.695]
62 In the graphical model of figure 3b, these are relative geometric properties γ (area, orientation) and adjacency δ, and together specify the layout of the attribute. [sent-151, score-0.425]
63 For example, the orientation of a segment with respect to the other can capture the parallelism of subsequent stripe segments. [sent-152, score-0.396]
64 Adjacency expresses whether the two segments in the pair are adjacent (like in stripes) or not (like the maple leaf and the stripes in the canadian flag). [sent-153, score-1.022]
65 We consider two segments adjacent if they share part of the boundary. [sent-154, score-0.569]
66 A pattern characterized by adjacent segments is more distinctive, as it is less likely to occur accidentally in a negative image. [sent-155, score-0.638]
67 An image is represented by a set of segments {s}, and the set of all possible pairs of segments {c}. [sent-157, score-1.15]
68 The image likelihood p(I|M; F, a) remains as defined in equation (3), but now a = (a1 , a2 ) specifies two foreground appearances, one for each segment in the pair. [sent-158, score-0.564]
69 The observed variables in our model are segments s and pairs of segments c. [sent-160, score-1.047]
70 A pair $c = (s_1, s_2, \{c_r^k\})$ is defined by two segments $s_1, s_2$ and their relative geometric measurements $\{c_r^k\}$ (relative orientation and relative area in our implementation). [sent-161, score-0.881]
71 The two sets of $\lambda_i^j = (\Phi_i^j, v_i^j)$ are analogous to their counterparts in the unary model, and define the geometric distributions and their associated activation states for each segment in the pair respectively. [sent-163, score-0.608]
72 The layout part of the model captures the interaction between the two segments in the pair. [sent-164, score-0.649]
73 For each relative geometric property $\gamma^k = (\Psi^k, v_r^k)$ the model gives its distribution $\Psi^k$ over pairs of foreground segments and its activation state $v_r^k$. [sent-165, score-1.033]
74 The model parameter δ determines whether the pattern is composed of pairs of adjacent segments (δ = 1) or just any pair of segments (δ = 0). [sent-166, score-1.232]
75 The factor p(c|δ) is defined as 0 iff δ = 1 and the segments in c are not adjacent, while it is 1 in all other cases (so, when δ = 1, p(c|δ) acts as a pair selector). [sent-167, score-0.528]
76 The appearance factor $p(s_{1,a}, s_{2,a} \mid a) = [s_{1,a} = a_1 \wedge s_{2,a} = a_2]$ is 1 when the two segments have the foreground appearances $a = (a_1, a_2)$ for this image. [sent-168, score-1.103]
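The two selector factors for pairs are simple indicator functions, sketched below. The function names and argument layout are hypothetical; the logic follows the definitions in the text: the adjacency factor is 0 only when δ = 1 and the pair is not adjacent, and the appearance factor is 1 only when both segments carry the image's foreground appearances in order.

```python
def pair_selector(is_adjacent, delta):
    """p(c | delta): 0 iff delta == 1 and the pair is not adjacent, else 1."""
    return 0 if (delta == 1 and not is_adjacent) else 1

def pair_appearance_factor(s1_appearance, s2_appearance, fg_appearances):
    """1 when the pair matches the foreground appearances (a1, a2), else 0."""
    a1, a2 = fg_appearances
    return 1 if (s1_appearance == a1 and s2_appearance == a2) else 0

# With delta = 1 only adjacent pairs survive; with delta = 0 every pair does.
kept = pair_selector(is_adjacent=True, delta=1)
dropped = pair_selector(is_adjacent=False, delta=1)
any_pair = pair_selector(is_adjacent=False, delta=0)
zebra = pair_appearance_factor("black", "white", ("black", "white"))
```

Both factors only switch terms on or off; the graded part of the pair likelihood comes from the geometric and layout distributions.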
77 The layout parameters are $\delta = 1$, and $\gamma^{\text{rel area}}$, $\gamma^{\text{rel orient}}$ are active and peaked at 0 (expressing that the two segments are parallel and have the same area). [sent-173, score-0.81]
78 The image likelihood defined in (3) depends on the foreground/background labels F and on the foreground appearance a. [sent-176, score-0.644]
79 While many of the positive images contain examples of the attribute to be learnt (figure 4), a considerable proportion don’t. [sent-183, score-0.404]
80 A discriminative approach instead ... Figure 4: Advantages of discriminative training (panels: positive training images, negative training images). [sent-191, score-0.377]
81 The former case covers attributes such as colors, or patterns with specific colors (such as zebra stripes). [sent-200, score-0.439]
82 The latter case covers generic patterns, as it allows each image to pick a different appearance a ∈ α, while at the same time it properly constrains all segments/pairs within an image to share the same appearance (e. [sent-201, score-0.924]
83 subsequent pairs of stripe segments have the same appearance, forming a pattern of alternation ABABAB). [sent-203, score-0.745]
84 To achieve this, we need to determine the latent variable F for each training image, as it is necessary for estimating the geometric distributions over the foreground segments. [sent-208, score-0.352]
85 Estimate β and the geometric activations v iteratively: (a) Update β as the average probability of segments from I− . [sent-219, score-0.644]
86 This is obtained using the foreground expression of (5) for all segments of I− . [sent-220, score-0.655]
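The β update in step (a) might look like the sketch below: evaluate the foreground expression on every segment of the negative images and take the average. This uses a plain unweighted mean as an assumption; whether segments should instead be weighted by area is not stated in this excerpt. The function name and the input list are hypothetical.

```python
def update_beta(negative_segment_scores):
    """Average foreground score over all segments of the negative images.

    negative_segment_scores: foreground-expression values, one per segment,
    collected across all negative training images.
    """
    if not negative_segment_scores:
        raise ValueError("need at least one negative segment")
    return sum(negative_segment_scores) / len(negative_segment_scores)

# Four negative segments; one fails the appearance selector and scores 0.
beta = update_beta([0.0, 0.5, 0.25, 0.25])
```

Intuitively, β then reflects how well the current foreground model explains segments that should not contain the attribute.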
87 4 compactness 0 1 elongation curvedness −4 relative orientation relative area Figure 5: a) color models learnt for red, green, blue, and yellow. [sent-242, score-0.548]
88 b+c) geometric properties of the learned models for stripes (b) and dots (c). [sent-245, score-0.708]
89 One last, implicit, parameter is the model complexity: is the attribute unary or binary? [sent-255, score-0.391]
90 The comparison is meaningful because image likelihood is measured in the same way in both unary and binary cases (i. [sent-257, score-0.375]
91 In all cases, the correct model is returned: unary, no active geometric property, and the correct color as a specific appearance (figure 5a). [sent-268, score-0.591]
92 Both stripes and dots are learnt as binary and with general appearance, while they differ substantially in their geometric properties. [sent-274, score-0.825]
93 Stripes are learnt as elongated, rather straight pairs of segments, with largely the same properties for the two segments in a pair. [sent-275, score-0.687]
94 The background segments have a very curved, zigzagging outline, because they circumvent several dots. [sent-279, score-0.522]
95 In contrast to stripes, the two segments that form this dotted pattern are not symmetric in their properties. [sent-280, score-0.527]
96 The learnt model is binary, with one segment for a black square and the other for an adjacent white square, demonstrating the learning algorithm correctly infers both models with specific and generic appearance, adapting to the training data. [sent-283, score-0.545]
97 Moreover, the area covered by the attribute is localized by the segments with f = 1 (figure 6). [sent-286, score-0.734]
98 Moreover, the images exhibit extreme variability: there are paintings as well as photographs, stripes appear in any orientation, scale, and appearance, and they are often deformed. Figure 6: Recognition results. [sent-291, score-0.519]
99 The two lower curves in the stripes plot correspond to a model without layout, and without either layout or geometry, respectively. [sent-300, score-0.6]
100 Performance is convincing also for stripes and dots, especially since these attributes have generic appearance, and hence must be recognized based only on geometry and layout. [sent-311, score-0.7]
wordName wordTfidf (topN-words)
[('segments', 0.489), ('stripes', 0.414), ('appearance', 0.307), ('attributes', 0.236), ('segment', 0.227), ('unary', 0.166), ('foreground', 0.166), ('attribute', 0.159), ('geometric', 0.155), ('appearances', 0.141), ('layout', 0.132), ('image', 0.131), ('stripe', 0.118), ('learnt', 0.118), ('images', 0.105), ('dots', 0.1), ('colors', 0.096), ('red', 0.093), ('striped', 0.089), ('elongation', 0.082), ('adjacent', 0.08), ('area', 0.061), ('alternation', 0.059), ('curvedness', 0.059), ('codebook', 0.059), ('textures', 0.059), ('yellow', 0.057), ('color', 0.056), ('shape', 0.053), ('orientation', 0.051), ('gure', 0.046), ('sa', 0.045), ('active', 0.045), ('zebra', 0.044), ('relative', 0.043), ('visual', 0.041), ('patch', 0.041), ('pairs', 0.041), ('likelihood', 0.04), ('object', 0.039), ('patterns', 0.039), ('pair', 0.039), ('selector', 0.039), ('encompasses', 0.039), ('properties', 0.039), ('round', 0.038), ('binary', 0.038), ('pattern', 0.038), ('sj', 0.038), ('white', 0.037), ('cvpr', 0.036), ('inactive', 0.035), ('compactness', 0.035), ('background', 0.033), ('vr', 0.033), ('zisserman', 0.033), ('texture', 0.033), ('training', 0.031), ('negative', 0.031), ('green', 0.03), ('weakly', 0.03), ('characteristic', 0.03), ('caspi', 0.03), ('elong', 0.03), ('elongated', 0.03), ('ijcv', 0.03), ('rel', 0.03), ('sand', 0.03), ('skirt', 0.03), ('spotted', 0.03), ('adjacency', 0.028), ('model', 0.028), ('composed', 0.028), ('discriminative', 0.026), ('geometry', 0.026), ('jojic', 0.026), ('locus', 0.026), ('shirt', 0.026), ('covered', 0.025), ('recognize', 0.025), ('categories', 0.024), ('generic', 0.024), ('covers', 0.024), ('property', 0.024), ('qualities', 0.024), ('curved', 0.024), ('sx', 0.024), ('localizations', 0.024), ('winn', 0.024), ('generative', 0.023), ('frequent', 0.023), ('parallel', 0.023), ('distinctive', 0.023), ('positive', 0.022), ('checkerboard', 0.022), ('schmid', 0.022), ('activation', 0.021), ('rming', 0.021), ('tells', 0.021), ('pixels', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 113 nips-2007-Learning Visual Attributes
2 0.14454699 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
3 0.12878756 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.
4 0.09417887 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.
5 0.074241288 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
Author: Leonid Sigal, Alexandru Balan, Michael J. Black
Abstract: Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. This predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g. height, weight, etc.) in both calibrated and uncalibrated camera environments.
6 0.068189725 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
7 0.059917212 115 nips-2007-Learning the 2-D Topology of Images
8 0.059388958 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
9 0.0579895 136 nips-2007-Multiple-Instance Active Learning
10 0.056139585 182 nips-2007-Sparse deep belief net model for visual area V2
11 0.05322174 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
12 0.051165666 27 nips-2007-Anytime Induction of Cost-sensitive Trees
13 0.051010195 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
14 0.050861664 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
15 0.0502619 109 nips-2007-Kernels on Attributed Pointsets with Applications
16 0.049240515 196 nips-2007-The Infinite Gamma-Poisson Feature Model
17 0.047180064 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
18 0.045084663 19 nips-2007-Active Preference Learning with Discrete Choice Data
19 0.044888061 145 nips-2007-On Sparsity and Overcompleteness in Image Models
20 0.042492837 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
topicId topicWeight
[(0, -0.129), (1, 0.078), (2, -0.024), (3, -0.083), (4, 0.02), (5, 0.107), (6, -0.0), (7, 0.116), (8, 0.049), (9, 0.027), (10, -0.028), (11, 0.028), (12, -0.034), (13, -0.01), (14, -0.143), (15, 0.104), (16, -0.049), (17, 0.016), (18, 0.013), (19, -0.044), (20, -0.022), (21, 0.073), (22, 0.12), (23, 0.008), (24, -0.135), (25, -0.057), (26, -0.036), (27, -0.106), (28, 0.004), (29, 0.03), (30, -0.115), (31, 0.019), (32, 0.028), (33, 0.059), (34, -0.002), (35, 0.028), (36, -0.096), (37, 0.021), (38, 0.143), (39, -0.002), (40, 0.008), (41, -0.015), (42, -0.019), (43, -0.001), (44, -0.038), (45, 0.111), (46, -0.005), (47, -0.044), (48, -0.003), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.9641065 113 nips-2007-Learning Visual Attributes
2 0.84895355 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
3 0.79246569 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.
4 0.57898313 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets.
5 0.52507395 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm.
6 0.47211033 183 nips-2007-Spatial Latent Dirichlet Allocation
7 0.46517214 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
8 0.46457773 115 nips-2007-Learning the 2-D Topology of Images
9 0.4382515 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
10 0.4354791 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
11 0.42344856 193 nips-2007-The Distribution Family of Similarity Distances
12 0.37587878 136 nips-2007-Multiple-Instance Active Learning
13 0.37336221 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
14 0.34956136 171 nips-2007-Scan Strategies for Meteorological Radars
15 0.34889552 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
16 0.33186552 109 nips-2007-Kernels on Attributed Pointsets with Applications
17 0.32291546 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
18 0.31836137 19 nips-2007-Active Preference Learning with Discrete Choice Data
19 0.28719962 145 nips-2007-On Sparsity and Overcompleteness in Image Models
20 0.28567278 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
topicId topicWeight
[(4, 0.011), (5, 0.073), (13, 0.045), (16, 0.029), (21, 0.055), (26, 0.012), (31, 0.024), (35, 0.031), (47, 0.068), (49, 0.014), (75, 0.297), (83, 0.098), (85, 0.011), (87, 0.065), (90, 0.073)]
simIndex simValue paperId paperTitle
same-paper 1 0.77871728 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
2 0.66033804 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
3 0.48986161 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
4 0.48981759 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
5 0.48332852 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way of studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data: the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that is not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
6 0.48039523 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
7 0.47840837 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
8 0.47821397 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
9 0.47749069 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
10 0.47701195 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
11 0.47647771 115 nips-2007-Learning the 2-D Topology of Images
12 0.47611958 63 nips-2007-Convex Relaxations of Latent Variable Training
13 0.47578773 59 nips-2007-Continuous Time Particle Filtering for fMRI
14 0.47374776 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
15 0.47334701 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
16 0.47305909 7 nips-2007-A Kernel Statistical Test of Independence
17 0.47216442 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
18 0.47119084 49 nips-2007-Colored Maximum Variance Unfolding
19 0.47042817 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
20 0.46941397 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning