nips nips2012 nips2012-185 knowledge-graph by maker-knowledge-mining

185 nips-2012-Learning about Canonical Views from Internet Image Collections


Source: pdf

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. [sent-9, score-0.576]

2 This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. [sent-10, score-0.889]

3 Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. [sent-11, score-1.156]

4 In this paper we ask: Can we use Internet image collections to learn more about canonical views? [sent-12, score-0.598]

5 We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. [sent-13, score-0.62]

6 Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. [sent-14, score-0.712]

7 We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. [sent-15, score-0.466]

8 Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. [sent-16, score-0.978]

9 Although ideally object recognition should be viewpoint invariant, much research in human perception indicates that certain views are privileged, or “canonical”. [sent-18, score-0.657]

10 Figure 1 presents different views of a horse used in their experiments and the average goodness rating given by human subjects. [sent-22, score-0.616]

11 For the horse, the canonical view is a slightly off-axis sideways view, while the least favored view is from above. [sent-23, score-0.939]

12 The preference for side views of horses is very robust and can be reliably demonstrated in simple classroom experiments [6]. [sent-25, score-0.506]

13 Figure 1: When people are asked to rate images of the same object from different views, some views consistently get better grades than others. [sent-40, score-1.2]

14 The view that gets the best grade is called the canonical view. [sent-41, score-0.673]

15 The first one, called the frequency hypothesis, argues that the canonical view is the one from which we most often see the object. [sent-45, score-0.735]

16 The second one, called the maximal information hypothesis, argues that the canonical view is the view that gives the most information about the 3D structure of the object. [sent-46, score-1.001]

17 If we have access to the statistics with which we view certain objects, we can compute the most frequent view; and given the 3D shape of an object, we can automatically compute the most stable view [7, 8, 9]. [sent-51, score-1.201]

18 Both of these formal theories have been shown to be insufficient to predict the canonical views preferred by human observers; Palmer et al. [sent-52, score-1.128]

19 One reason for the relative vagueness of theories of canonical views may be the lack of data: the number of objects for which canonical views have been tested in the lab is at most a few dozen. [sent-55, score-1.832]

20 In this paper, we seek to dramatically increase the number of examples for canonical views using Internet search engines and computer vision tools. [sent-56, score-0.984]

21 We expect that since the canonical view of an object corresponds to what people perceive as the "best" photograph, when people include a photograph of an object in their web page, they are most likely to choose a photograph from the canonical view. [sent-57, score-1.4]

22 In other words, we expect the canonical view to be the most frequent view in the set of images retrieved by a search engine when queried for the object. [sent-58, score-1.573]

23 We start by manually validating our hypothesis and showing that indeed the most frequent view in Internet image collections often corresponds to the cognitive canonical view. [sent-59, score-1.286]

24 We then present an automatic method for finding the most frequent view in a large dataset of images. [sent-60, score-0.614]

25 Rather than trying to map images to views and then finding the most frequent view, we find it by analyzing the density of global image descriptors. [sent-61, score-1.129]

26 Using images for which we have ground truth, we verify that our automatic method indeed finds the most frequent view in a large percentage of the cases. [sent-62, score-0.836]

27 We next apply this method to images retrieved by search engines and find the canonical view for hundreds of categories. [sent-63, score-1.104]

28 Finally we use the canonical views we find to present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. [sent-64, score-1.824]

29 Figure 2: The four most frequent views (frequencies specified) manually found in images returned by Google Images (second to fifth rows) often correspond to the canonical view found in psychophysical experiments (first row). [sent-65, score-2.027]

30 2 Manual experiments with Internet image collections. We first asked whether Internet image collections show the same view biases as those reported in psychophysical experiments. [sent-66, score-0.794]

31 In order to answer this question, we downloaded images of the twelve categories used by Palmer et al. [sent-67, score-0.477]

32 To download these images we simply queried Google Image search with the object and retrieved the top returned images. [sent-69, score-0.459]

33 For each category we manually sorted the images into bins corresponding to similar views (each category could have a different number of bins), counted the number of images in each bin and found the most frequent view. [sent-70, score-1.385]

34 We used 400 images for the four categories presented in Figure 2 and 100 images for the other eight categories. [sent-71, score-0.634]

35 Figure 2 shows the bins with the highest frequencies along with their frequencies and the cognitive canonical view for car, horse, shoe, and steaming iron categories. [sent-72, score-0.788]

36 The results of this manual experiment are clear cut: for 11 out of the 12 categories, the most frequent view in Google images is the canonical view found by Palmer et al. [sent-73, score-1.504]

37 The only exception is the horse category for which the most frequent view is the one that received the second best ratings in the psychophysical experiments (see figure 1). [sent-75, score-0.837]

38 This study validates our hypothesis that when humans decide which view of an object to embed in a web page, they exhibit a view bias very similar to that seen in psychophysical experiments. [sent-76, score-0.832]

39 This result makes it possible to harness the huge number of images available on the Internet to study these view biases across many categories. [sent-77, score-0.54]

40 3 Can we find the most frequent view automatically? [sent-78, score-0.569]

41 [10] showed how clustering Internet photographs of tourist sites can find several "canonical" views of the site. [sent-83, score-0.472]

42 Clustering images from the Internet is also used to find canonical views (or iconic images) in other works. [sent-84, score-1.142]

43 [13] uses a similarity measure between images to find a small subset of canonical images for a larger set of images. [sent-88, score-0.9]

44 Deselaers and Ferrari [14] present a simpler method that finds the image in the center of the GIST image descriptor [15] space to select the prototype image for categories in ImageNet [16]. [sent-92, score-0.629]
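For intuition, the prototype selection of [14] can be sketched as picking the most central image in GIST descriptor space. The snippet below is a minimal illustration of that idea, not the authors' implementation; it assumes the GIST descriptors are already computed and reads "the image in the center" as the medoid (the image minimizing summed distance to all others), which is only one plausible interpretation.

```python
import numpy as np

def gist_prototype(gists):
    """Pick a prototype image as the most central point in GIST descriptor space.

    gists: (n, d) array with one GIST descriptor per image (assumed precomputed).
    Returns the index of the medoid, i.e. the image minimizing the summed
    Euclidean distance to all other images in the category.
    """
    dists = np.linalg.norm(gists[:, None, :] - gists[None, :, :], axis=-1)
    return int(np.argmin(dists.sum(axis=1)))
```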

45 We experimented with this method and found that often the prototypical image did not correspond to the most frequent view. [sent-93, score-0.457]

46 [17] suggest a method to find a single most representative image (canonical image) for a category relying on similarities between images based on local invariant features. [sent-95, score-0.413]

47 Since they use invariant features, the view of the object in the image has no role in the selection of the canonical image. [sent-96, score-0.901]

48 Weyand and Leibe [18] use mode estimation to find iconic images for many images of a single scene using a distance measure based on calculating a homography between the images and measuring the overlap. [sent-97, score-0.841]

49 Our method to find the most frequent view is based on estimating the density of views using the Parzen window method, and simply choosing the modes of the density as the most frequent views. [sent-99, score-1.458]

50 In other words, if we have an image descriptor such that the distance between descriptors for two images approximates the similarity of views between the objects, we can calculate the Parzen density without ever computing the views. [sent-108, score-0.93]
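Concretely, with g(x) denoting the GIST descriptor of image x, a Parzen window estimate of the view density at image x from the n retrieved images x_1, ..., x_n can be written, up to normalization and assuming a Gaussian kernel with bandwidth σ (a specific choice not stated in the text), as

$$\hat{p}(x) \;\propto\; \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(-\frac{\lVert g(x)-g(x_i)\rVert^{2}}{2\sigma^{2}}\right),$$

and the most frequent view is taken to be the mode, i.e. the retrieved image that maximizes this estimate.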

51 We hypothesize that despite this sensitivity to the background, the maximum of the Parzen density when we use GIST similarity between images will serve as a useful proxy for the maximum of the Parzen density when we use view similarity. [sent-112, score-0.641]

52 Our method. In summary, given an object category, our algorithm automatically finds the modes of the GIST distribution in images of that object. [sent-114, score-0.458]

53 Our method therefore also includes a manual phase which requires a human to view the output of the algorithm and to verify whether or not this mode in the GIST distribution actually corresponds to a mode in the view distribution. [sent-116, score-0.815]

54 The first mode is simply the most frequent image in the GIST space and its k closest neighbors. [sent-120, score-0.505]

55 The second mode is the most frequent image that is not a close neighbor of the first most frequent image. [sent-121, score-0.936]
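The following sketch illustrates the automatic part of the procedure described above; it is a hedged reconstruction, not the authors' code. The Gaussian kernel, the median-distance bandwidth heuristic, and the names (find_view_modes, k, sigma) are assumptions introduced for illustration.

```python
import numpy as np

def find_view_modes(gists, k=10, sigma=None):
    """Sketch: find the first and second modes of the GIST distribution.

    gists: (n, d) array of GIST descriptors for the retrieved images.
    Returns (first mode, its k nearest neighbors, second mode, its neighbors);
    a human then verifies that each mode really corresponds to a single view.
    """
    # Squared Euclidean distances between all pairs of GIST descriptors.
    d2 = ((gists[:, None, :] - gists[None, :, :]) ** 2).sum(axis=-1)
    if sigma is None:
        sigma = np.sqrt(np.median(d2))  # assumed bandwidth heuristic

    # Parzen density estimate evaluated at every image (Gaussian kernel).
    density = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)

    # First mode: the highest-density image and its k closest neighbors.
    first = int(np.argmax(density))
    first_nbrs = np.argsort(d2[first])[: k + 1]

    # Second mode: the highest-density image that is not a close neighbor of the first.
    masked = density.copy()
    masked[first_nbrs] = -np.inf
    second = int(np.argmax(masked))
    second_nbrs = np.argsort(d2[second])[: k + 1]

    return first, first_nbrs, second, second_nbrs
```

The human inspector then looks only at these two small sets of neighbor images rather than at the full download, which is what keeps the manual verification phase cheap.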

56 We found that when a human verifies that a set of images forming a mode in GIST space are indeed of the same view, then in almost all cases these images are indeed modes in view space. [sent-129, score-0.916]

57 Figure 3: By using Parzen density estimation on GIST features, we are able to find the most frequent view without calculating the view for a given image. [sent-131, score-0.872]

58 (a) Distribution of views for 715 images of Notre Dame Cathedral in Paris, adapted from [22]. [sent-132, score-0.661]

59 The image from the most frequent view (c) is the same as the image with the most frequent GIST descriptor (d). [sent-134, score-1.183]

60 This is much less painful than requiring a human to look at all retrieved images, which can take a few hours (the automatic part of the method, which finds the modes in GIST space, takes a few seconds of computer time). [sent-136, score-0.44]

61 In the first experiment, we ran our automatic method on the same images that we manually sorted into views in the previous section: images downloaded from Google image search for the twelve categories used by Palmer et al. [sent-140, score-1.388]

62 We find that in 10 out of 12 categories our automatic method found the same most frequent view as we found manually. [sent-143, score-0.804]

63 On this dataset, we calculated the most frequent view using Parzen density estimation in two different ways: (1) using the similarity between the cameras’ rotation matrices, and (2) using the GIST similarity between images. [sent-147, score-0.704]

64 As shown in figure 3, the most frequent view calculated using the two methods was identical. [sent-148, score-0.569]
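For the ground-truth comparison, one way to realize "similarity between the cameras' rotation matrices" is a kernel on the geodesic distance between rotations; the sketch below is illustrative only (the Gaussian kernel, the bandwidth, and the function names are assumptions, not the authors' exact choices) and finds the mode of the view density directly from the camera orientations.

```python
import numpy as np

def rotation_angle(R1, R2):
    """Geodesic distance on SO(3): the angle (radians) of the relative rotation."""
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def most_frequent_view(rotations, sigma=0.3):
    """Parzen mode over camera rotation matrices; returns the index of the most frequent view."""
    d = np.array([[rotation_angle(Ri, Rj) for Rj in rotations] for Ri in rotations])
    density = np.exp(-(d ** 2) / (2 * sigma ** 2)).sum(axis=1)
    return int(np.argmax(density))
```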

65 Control. As can be seen in figure 4, the most frequent view chosen by our method often has a white or uniform background. [sent-150, score-0.569]

66 Can a method that simply chooses images with a uniform background also find canonical views? [sent-151, score-0.629]

67 The ImageNet images were collected by querying various Internet search engines with the desired object, and the resulting set of images was then “cleaned up” by humans. [sent-156, score-0.541]

68 It is important to note that the humans were not instructed to choose particular views but rather to verify that the image contained the desired object. [sent-157, score-0.611]

69 For a subset of the images, ImageNet also supplies bounding boxes around the object of interest; we cropped the objects from the images and treated the cropped set as a fourth dataset. [sent-158, score-0.494]

70 We saw that our method also finds preferred views in the other datasets and that these preferred views are usually the cognitive canonical views. [sent-160, score-1.502]

71 One example of this improvement is the horse category, for which we did not find the most frequent view using the full images but did find it when we used the cropped images. [sent-162, score-0.983]

72 Figure 4: Results on categories we downloaded from Google for which the canonical view was found in Palmer et al. (rows labeled Random, Palmer, and First mode). [sent-164, score-0.972]

73 4 What can we learn from hundreds of canonical views? [sent-168, score-0.479]

74 To summarize our validation experiments: although we use GIST similarity as a proxy for view similarity, our method often finds the canonical view. [sent-169, score-0.752]

75 We used our method to find canonical views for two groups of object categories: (1) 54 categories inspired by the work of Rosch et al. [sent-171, score-1.136]

76 (2) 552 categories of mammals (all the categories of mammals in ImageNet [16] for which there are bounding boxes around the objects); for these categories we used the cropped objects. [sent-173, score-0.772]

77 For every object category tested we downloaded all corresponding images from ImageNet (on average more than 1,200 images, of which around 300 have bounding boxes). [sent-174, score-0.452]

78 For Rosch’s categories we used full images, since for some of them bounding boxes are not supplied; for the mammals we used cropped images. [sent-178, score-0.574]

79 For most of the categories the modes found by our algorithm were indeed verified by a human observer as representing a true mode in view space. [sent-179, score-0.663]

80 Thus while our method does not succeed in finding preferred views for all categories, by focusing only on the categories for which humans verified that preferred views were found, we still have canonical views for hundreds of categories. [sent-180, score-2.21]

81 [2] raised two basic theories to explain the phenomenon of canonical views: (1) the frequency hypothesis and (2) the maximal information hypothesis. [sent-185, score-0.532]

82 We find canonical views of animals that are from the animals’ height rather than ours (fig. 5a-b). [sent-187, score-0.891]

83 Dogs, for example, are usually seen from above, while many of the canonical views we find for dogs are from their height. [sent-188, score-0.873]

84 The canonical views of vehicles are another counter-example to the frequency hypothesis: we usually see vehicles from the side (as pedestrians) or from behind (as drivers), but the canonical views we find are the “perfect” off-axis view. [sent-189, score-2.014]

85 As a third family of examples we have the tools; we usually see them when we use them, but this is not the canonical view we find. [sent-191, score-0.673]

86 While for 20% of the categories we find off-axis canonical views that give the most information about the shape of the object, for more than 60% of the categories we find canonical views that are either side-views (fig. 5f,i) [sent-194, score-2.072]

87 or frontal views (especially views of the face). [sent-195, score-0.878]

88 Not only do these views not give us the full information about the 3D structure of the object, they are also accidental. [sent-197, score-0.439]

89 That is, a small change in the view will cause a big change in the appearance of the object; for example, in some of the side-views we see only two legs out of four, and a small change in the view will reveal the other two legs. [sent-199, score-0.562]

90 Constraints for new theories. We believe that our experiments reveal several robust features of canonical views that every future theory should take into consideration. [sent-201, score-0.966]

91 The first aspect is that there are several preferred views for a given object. [sent-202, score-0.529]

92 Sometimes these several views are related to symmetry (e.g. a mirror image of the preferred view is also preferred), [sent-203, score-0.439]

93 but in other cases they are different views that are just slightly less preferred than the canonical view. [sent-205, score-1.686]

94 Finally, the view biases are most pronounced for basic and subordinate level categories and less so for superordinate categories. [sent-214, score-0.671]

95 5 Conclusion. In this work we revisited a cognitive phenomenon that was discovered over 30 years ago: a preference by human observers for particular "canonical" views of objects. [sent-219, score-0.596]

96 We showed that a nearly identical view bias can be observed in the results of Internet image search engines, suggesting that when humans decide which image to embed in a web page, they prefer the same canonical view that is assigned the highest goodness rating in laboratory experiments. [sent-220, score-1.302]

97 We presented an automatic method to discover the most likely view in an image collection and used this algorithm to obtain canonical views for hundreds of object categories. [sent-221, score-1.457]

98 Our results provide strong counter-examples for the two formal hypotheses of canonical views; we hope they will serve as a basis for a computational explanation for this fascinating effect. [sent-222, score-0.449]

99 Canonical views of scenes depend on the shape of the space. [sent-256, score-0.439]

100 Discovering favorite views of popular places with iconoid shift. [sent-347, score-0.439]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('views', 0.439), ('canonical', 0.407), ('frequent', 0.303), ('view', 0.266), ('gist', 0.255), ('images', 0.222), ('categories', 0.19), ('palmer', 0.154), ('image', 0.128), ('psychophysical', 0.121), ('internet', 0.113), ('rosch', 0.111), ('parzen', 0.108), ('object', 0.1), ('preferred', 0.09), ('theories', 0.09), ('horse', 0.084), ('imagenet', 0.077), ('mode', 0.074), ('iconic', 0.074), ('modes', 0.073), ('hundreds', 0.072), ('engines', 0.067), ('category', 0.063), ('collections', 0.063), ('photograph', 0.06), ('human', 0.06), ('viewpoint', 0.058), ('google', 0.057), ('collages', 0.055), ('descriptor', 0.055), ('objects', 0.05), ('similarity', 0.049), ('manually', 0.047), ('boxes', 0.045), ('animals', 0.045), ('automatic', 0.045), ('dame', 0.045), ('notre', 0.045), ('cropped', 0.045), ('humans', 0.044), ('formal', 0.042), ('vision', 0.041), ('mammals', 0.04), ('manual', 0.04), ('retrieved', 0.04), ('queried', 0.039), ('nd', 0.038), ('density', 0.037), ('cathedral', 0.037), ('denton', 0.037), ('edmond', 0.037), ('horses', 0.037), ('lthoff', 0.037), ('mezuman', 0.037), ('perceiver', 0.037), ('raguram', 0.037), ('safra', 0.037), ('weyand', 0.037), ('cognitive', 0.037), ('nds', 0.036), ('phase', 0.035), ('downloaded', 0.035), ('hypothesis', 0.035), ('goodness', 0.033), ('snavely', 0.033), ('blanz', 0.033), ('collage', 0.033), ('lily', 0.033), ('photographs', 0.033), ('bounding', 0.032), ('camera', 0.031), ('preference', 0.03), ('reveal', 0.03), ('azimuth', 0.03), ('workshops', 0.03), ('jing', 0.03), ('twelve', 0.03), ('observers', 0.03), ('proxy', 0.03), ('search', 0.03), ('deselaers', 0.028), ('vehicles', 0.028), ('download', 0.028), ('berg', 0.028), ('scene', 0.027), ('photo', 0.027), ('argues', 0.027), ('ehinger', 0.027), ('harness', 0.027), ('vague', 0.027), ('elevation', 0.027), ('dogs', 0.027), ('veri', 0.027), ('page', 0.027), ('bins', 0.026), ('frequencies', 0.026), ('prototypical', 0.026), ('ago', 0.026), ('biases', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

2 0.13192078 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

Author: Trung Nguyen, Tomi Silander, Tze Y. Leong

Abstract: We study how to automatically select and adapt multiple abstractions or representations of the world to support model-based reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present an efficient, online framework that, through a sequence of tasks, learns a set of relevant representations to be used in future tasks. Without predefined mapping strategies, we introduce a general approach to support transfer learning across different state spaces. We demonstrate the potential impact of our system through improved jumpstart and faster convergence to near optimum policy in two benchmark domains. 1

3 0.12777238 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

4 0.10314551 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

5 0.099839412 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Author: Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, Anil K. Jain

Abstract: One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called semi-crowdsourced clustering that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects and from the manual annotations of only a small portion of the data to be clustered. One difficulty in learning the pairwise similarity measure is that there is a significant amount of noise and inter-worker variations in the manual annotations obtained via crowdsourcing. We address this difficulty by developing a metric learning algorithm based on the matrix completion method. Our empirical study with two real-world image data sets shows that the proposed algorithm outperforms state-of-the-art distance metric learning algorithms in both clustering accuracy and computational efficiency. 1

6 0.096816286 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

7 0.088969655 210 nips-2012-Memorability of Image Regions

8 0.086341202 201 nips-2012-Localizing 3D cuboids in single-view images

9 0.08584667 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

10 0.08261203 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

11 0.080163933 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

12 0.079596244 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

13 0.075487979 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

14 0.072912186 193 nips-2012-Learning to Align from Scratch

15 0.072909527 8 nips-2012-A Generative Model for Parts-based Object Segmentation

16 0.07097023 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

17 0.06966152 86 nips-2012-Convex Multi-view Subspace Learning

18 0.069408335 303 nips-2012-Searching for objects driven by context

19 0.068703443 62 nips-2012-Burn-in, bias, and the rationality of anchoring

20 0.068703443 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.133), (1, 0.028), (2, -0.178), (3, -0.029), (4, 0.087), (5, -0.086), (6, -0.005), (7, -0.027), (8, 0.03), (9, 0.006), (10, -0.013), (11, -0.034), (12, 0.003), (13, -0.072), (14, 0.051), (15, 0.102), (16, 0.032), (17, -0.023), (18, -0.04), (19, -0.067), (20, 0.004), (21, -0.023), (22, -0.015), (23, 0.011), (24, 0.002), (25, 0.021), (26, 0.059), (27, 0.042), (28, 0.06), (29, 0.054), (30, -0.016), (31, -0.029), (32, 0.053), (33, -0.059), (34, 0.05), (35, 0.097), (36, -0.025), (37, -0.045), (38, 0.055), (39, -0.052), (40, 0.042), (41, -0.014), (42, 0.058), (43, -0.044), (44, 0.083), (45, -0.017), (46, 0.159), (47, 0.049), (48, -0.023), (49, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98035073 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

2 0.75286239 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global images features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. 1

3 0.73324674 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared cross different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the cooccurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the stateof-the-art algorithms. 1

4 0.59684008 146 nips-2012-Graphical Gaussian Vector for Image Categorization

Author: Tatsuya Harada, Yasuo Kuniyoshi

Abstract: This paper proposes a novel image representation called a Graphical Gaussian Vector (GGV), which is a counterpart of the codebook and local feature matching approaches. We model the distribution of local features as a Gaussian Markov Random Field (GMRF) which can efficiently represent the spatial relationship among local features. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Then we define a new image feature by embedding the proper metric into the parameters, which can be directly applied to scalable linear classifiers. We show that the GGV obtains better performance over the state-of-the-art methods in the standard object recognition datasets and comparable performance in the scene dataset. 1

5 0.59347731 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

6 0.58959323 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

7 0.57571608 201 nips-2012-Localizing 3D cuboids in single-view images

8 0.56094748 202 nips-2012-Locally Uniform Comparison Image Descriptor

9 0.55892444 8 nips-2012-A Generative Model for Parts-based Object Segmentation

10 0.55460238 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

11 0.54510224 2 nips-2012-3D Social Saliency from Head-mounted Cameras

12 0.5225029 344 nips-2012-Timely Object Recognition

13 0.52119666 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

14 0.5181638 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

15 0.51526397 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

16 0.50082791 193 nips-2012-Learning to Align from Scratch

17 0.50023609 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

18 0.50023198 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

19 0.49779344 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

20 0.48796573 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.025), (17, 0.016), (21, 0.018), (37, 0.223), (38, 0.109), (39, 0.032), (42, 0.022), (54, 0.014), (55, 0.043), (60, 0.015), (74, 0.164), (76, 0.158), (80, 0.033), (92, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84496921 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

2 0.7374934 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

Author: Angela Eigenstetter, Bjorn Ommer

Abstract: Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although improving discriminability, the high dimensionality becomes a critical issue due to lack of generalization ability and curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach. 1

3 0.73673701 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lov´ sz ϑ function of a graph, a fundamental tool in combinatorial optimizaa tion and approximation algorithms, is computed by solving a SDP. In this paper we establish that the Lov´ sz ϑ function is equivalent to a kernel learning problem a related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lov´ sz ϑ function a can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs e.g. identifying a √ 1 planted clique of size Θ( n) in a random graph G(n, 2 ). A classic approach for this problem involves computing the ϑ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM − ϑ graph. As a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state of the art methods. 1

4 0.73518181 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

Author: Kei Wakabayashi, Takao Miura

Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require large computations of time complexity O(T N 2D ) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N D+1 ). A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present some experiments to demonstrate that our proposed method works more efficiently to estimate HHMM parameters than do some existing methods such as the flattening method and Gibbs sampling method. 1

5 0.7334438 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection. 1

6 0.73326832 202 nips-2012-Locally Uniform Comparison Image Descriptor

7 0.73271334 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.73141533 201 nips-2012-Localizing 3D cuboids in single-view images

9 0.72672063 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

10 0.72324187 40 nips-2012-Analyzing 3D Objects in Cluttered Images

11 0.71975404 210 nips-2012-Memorability of Image Regions

12 0.71433431 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

13 0.71075863 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

14 0.7098912 75 nips-2012-Collaborative Ranking With 17 Parameters

15 0.6985873 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

16 0.69588023 303 nips-2012-Searching for objects driven by context

17 0.69396222 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

18 0.6933887 8 nips-2012-A Generative Model for Parts-based Object Segmentation

19 0.69218105 146 nips-2012-Graphical Gaussian Vector for Image Categorization

20 0.6892966 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves