
210 nips-2012-Memorability of Image Regions


Source: pdf

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. [sent-3, score-0.247]

2 Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. [sent-4, score-1.546]

3 However, the class of features and image information that is forgotten has not been explored yet. [sent-5, score-0.397]

4 In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. [sent-6, score-0.639]

5 The model automatically discovers memorability maps of individual images without any human annotation. [sent-7, score-1.049]

6 We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. [sent-8, score-1.343]

7 Introduction: Human long-term memory can store a remarkable amount of visual information and remember thousands of different pictures even after seeing each of them only once [25, 1]. [sent-9, score-0.271]

8 While most of the work in visual cognition has examined how people forget for general classes of visual or verbal stimuli [30], little work has looked at which image information is forgotten and which is retained. [sent-11, score-0.583]

9 Are there some features, image regions or objects that are forgotten more easily than others? [sent-13, score-0.54]

10 Inspired by work in visual cognition showing that humans selectively forget some objects and regions from an image while retaining others [22], we propose a novel probabilistic framework for modeling image memorability, based on the fading of local image information. [sent-14, score-1.041]

11 Recent work on image memorability [6, 7, 12] has shown that there are large differences between the memorabilities of different images, and these differences are consistent across context and observers, suggesting that memory differences are intrinsic to the images themselves. [sent-15, score-1.228]

12 Prior work [7] quantified the contribution of segmented regions to the image memorability score, creating a memorability map for each individual image that identifies objects that are correlated with high or low memorability scores. [sent-18, score-3.156]

13 However, this previous work did not attempt to discover in an automatic fashion which part of the image is memorable and which regions are forgettable. [sent-19, score-0.557]

14 In this paper, we introduce a novel framework for predicting image memorability that is able to account for how the memorability of image regions and different types of features fades over time, offering memorability maps that are more interpretable than those of [7]. [sent-20, score-3.289]

15 The conversion to an internal representation in memory can be thought of as a noisy process where some elements of the image are changed probabilistically as described by α and β (Sec. [sent-35, score-0.376]

16 The image on the right illustrates a possible internal representation: the green and blue regions remain unchanged, while the red region is forgotten and the pink region is hallucinated. [sent-38, score-0.791]

17 2 Related work Large scale visual memory experiments [26, 25, 1, 13, 14, 28] have shown that humans can remember specific images they have seen among thousands of images, hours to days later, even after being exposed to each picture only once. [sent-40, score-0.333]

18 For instance, indoor spaces, pictures containing people (particularly if their face is visible), close-up views of objects, and animals are more memorable than buildings, pictures of natural landscapes, and natural surfaces in general (like mountains, grass, and fields). [sent-44, score-0.319]

19 However, to date, there is no work which has attempted to predict which local information from an image is memorable or forgettable, in an automatic manner. [sent-45, score-0.455]

20 Modeling memorability using image regions: We propose to predict memorability using a noisy process of encoding images in memory, illustrated in Fig. [sent-46, score-2.154]

21 In our setting, an image consists of different types of image regions and features. [sent-48, score-0.635]

22 After a delay between the first and second presentation of an image, people are likely to remember some image regions and objects more than others. [sent-49, score-0.508]

23 For example, as shown in [7], people and close-up views of objects tend to be more memorable than natural objects and regions of landscapes, suggesting for instance that an image region containing a person is less likely to be forgotten than an image region containing a tree. [sent-50, score-1.294]

24 It is well established that stored visual information decays over time [30, 31, 14], which can be represented in a model by a novel image vector with missing global and local information. [sent-51, score-0.368]

25 We postulate that the farther the stored representation of the image is from its veridical representation, the less likely it is to be remembered. [sent-52, score-0.321]

26 Here, we propose to model this noisy memorability process in a probabilistic framework. [sent-53, score-0.827]

27 We assume that the representation of an image is composed of image regions, where different regions of an image correspond to different sets of objects. [sent-54, score-0.965]

28 These regions have different probabilities of being forgotten and some regions have a probability of being imagined or hallucinated. [sent-55, score-0.382]

29 We postulate that the likelihood of an image to be remembered depends on the distance between the initial image representation and its internal degraded version. [sent-56, score-0.577]

30 An image with a larger distance to the internal representation is more likely to be forgotten, and thus the image should have a lower memorability score. [sent-57, score-1.386]

31 In our algorithm, we model this probabilistic process and show its effectiveness at predicting image memorability and at producing interpretable memorability maps. [sent-58, score-1.928]

32 Formulation: Given an image $I_j$, we define $v_j$ and $\tilde{v}_j$ as the external and internal representations of the image, respectively. [sent-60, score-0.789]

33 The external representation refers to the original image which is observed, while internal representation refers to the noisy representation of the same image that is stored in the observer’s memory. [sent-61, score-0.665]

34 Assume that there are N types of regions or objects an image can contain. [sent-62, score-0.467]

35 We define $v_j \in \{0, 1\}^N$ as a binary vector of size $N$ containing a 1 at index $n$ when the corresponding region is present in image $I_j$ and 0 otherwise. [sent-63, score-0.462]

36 Similarly, the internal representation consists of the same set of region types, but has different presence and absence values as memory is noisy. [sent-64, score-0.279]

37 In this setting, one of two things can happen when the external representation of an image is observed: (1) an image region that was shown is forgotten, i.e. [sent-65, score-0.783]

38 $\tilde{v}_j(i) = 0$ when $v_j(i) = 1$, where $v_j(i)$ refers to the $i$th element of $v_j$; or (2) an image region is hallucinated, i.e. [sent-67, score-0.783]

39 an image region that did not exist in the image is believed to be present. [sent-69, score-0.581]

40 We expect this to happen with different probabilities for different types of image regions. [sent-70, score-0.283]

41 Therefore, we define two probability vectors $\alpha, \beta \in [0, 1]^N$, where $\alpha_i$ corresponds to the probability of region type $i$ being forgotten while $\beta_i$ corresponds to the probability of hallucinating a region of type $i$. [sent-71, score-0.42]

42 Using this representation, we define the distance between the internal and external representations as $D_j = D(v_j, \tilde{v}_j) = \|v_j - \tilde{v}_j\|_1$. [sent-72, score-0.337]

43 $D_j$ is inversely proportional to the memorability score $s_j$ of an image: the higher the distance of the representation of an image in the brain from its true representation, the less likely it is to be remembered. [sent-73, score-1.336]

44 Thus, we can compute the expected distance $E(D_j \mid v_j)$ of an image as:

$$E(D_j \mid v_j) = \sum_{i=1}^{N} \alpha_i^{v_j(i)} \, \beta_i^{1 - v_j(i)} = v_j^T \alpha + (\neg v_j)^T \beta \qquad (1)$$

This represents the expected number of modifications in $v_j$ from 1 to 0 ($\alpha$) or from 0 to 1 ($\beta$). [sent-76, score-0.333]
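Since Eq. (1) is just a dot product between the presence vector, its complement, and the two parameter vectors, it is cheap to evaluate in batch. A minimal numpy sketch (the function name and toy values are ours, not from the paper):

```python
import numpy as np

def expected_distance(V, alpha, beta):
    """E(D_j | v_j) = v_j^T alpha + (1 - v_j)^T beta, evaluated for a batch.

    V     : (M, N) binary matrix; row j is the presence vector v_j of image I_j.
    alpha : (N,) probability that region type i, when present, is forgotten.
    beta  : (N,) probability that region type i, when absent, is hallucinated.
    """
    return V @ alpha + (1.0 - V) @ beta

# Toy example: 2 images, 3 region types.
V = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)
alpha = np.array([0.1, 0.8, 0.3])  # e.g. a "person" type would get a low alpha
beta = np.array([0.05, 0.2, 0.1])
print(expected_distance(V, alpha, beta))  # lower expected distance => more memorable
```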

45 Stacking all images, the memorability scores satisfy

$$\begin{bmatrix} s_1 \\ \vdots \\ s_M \end{bmatrix} \propto_{\text{rank}} -\begin{bmatrix} v_1^T & \neg v_1^T \\ \vdots & \vdots \\ v_M^T & \neg v_M^T \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$$

where $\alpha_i, \beta_i \in [0, 1]$, $\propto_{\text{rank}}$ denotes that the proportionality concerns only the relative ranking of the image memorability scores, and $M$ is the total number of images. [sent-83, score-1.078]

46 We do not explicitly predict a memorability score; rather, we predict the ranking of scores between images. [sent-84, score-0.885]
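The parameters are fit from such pairwise ranking constraints (the summary later mentions SVM-Rank for this step). As a rough illustrative stand-in, not the paper's actual solver, projected gradient descent on pairwise hinge constraints with $\alpha, \beta$ clipped to $[0, 1]^N$ captures the idea:

```python
import numpy as np

def learn_alpha_beta(V, s, n_iters=200, lr=0.01, margin=0.1):
    """Toy stand-in for the SVM-Rank step: fit alpha, beta in [0, 1]^N so that
    images with higher measured memorability s_j get lower expected distance D_j.

    V : (M, N) binary region-presence vectors; s : (M,) measured memorability.
    """
    M, N = V.shape
    alpha = np.full(N, 0.5)
    beta = np.full(N, 0.5)
    # Pairwise ranking constraints: s_j > s_k should imply D_k - D_j >= margin.
    pairs = [(j, k) for j in range(M) for k in range(M) if s[j] > s[k]]
    for _ in range(n_iters):
        D = V @ alpha + (1 - V) @ beta
        grad_a = np.zeros(N)
        grad_b = np.zeros(N)
        for j, k in pairs:
            if D[k] - D[j] < margin:          # violated ranking constraint
                grad_a += V[j] - V[k]         # hinge gradient w.r.t. alpha
                grad_b += (1 - V[j]) - (1 - V[k])
        alpha = np.clip(alpha - lr * grad_a, 0.0, 1.0)  # project back into [0, 1]
        beta = np.clip(beta - lr * grad_b, 0.0, 1.0)
    return alpha, beta
```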

47 Implementation details: To generate the region types automatically, we randomly sample rectangular regions of arbitrary width and height from the training images. [sent-88, score-0.312]

48 Then we perform k-means clustering to learn the dictionary of region types as cluster centroids. [sent-93, score-0.236]
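A hedged sketch of this dictionary step, assuming numpy and scikit-learn; the size bounds and the descriptor extraction are placeholders rather than the paper's exact settings:

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_regions(image, n_samples=2000, min_size=32, max_size=128, rng=None):
    """Randomly sample rectangular regions of arbitrary width/height.

    Assumes the image is larger than max_size on both sides; the paper samples
    2000 patches per image, the size bounds here are placeholders."""
    rng = rng if rng is not None else np.random.default_rng(0)
    H, W = image.shape[:2]
    regions = []
    for _ in range(n_samples):
        w = int(rng.integers(min_size, max_size + 1))
        h = int(rng.integers(min_size, max_size + 1))
        x = int(rng.integers(0, W - w + 1))
        y = int(rng.integers(0, H - h + 1))
        regions.append(image[y:y + h, x:x + w])
    return regions

def learn_region_dictionary(descriptors, n_types=100):
    """K-means over region descriptors; the cluster centroids are the region types.

    descriptors : (num_regions, d) feature matrix for all sampled training regions.
    """
    km = KMeans(n_clusters=n_types, n_init=10, random_state=0).fit(descriptors)
    return km  # km.cluster_centers_ = region types; km.predict(...) assigns types
```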

49 There may be more than one sampled region that corresponds to a particular region type. [sent-117, score-0.258]

50 In this case, we assume the dictionary of region types is given, and we simply assign the randomly sampled image regions to region types, and use the learned parameters (α, β) to compute a score. [sent-119, score-0.717]
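Putting the pieces together, scoring a test image then reduces to assigning its sampled regions to the fixed dictionary and evaluating Eq. (1); a minimal sketch reusing the KMeans model from the previous snippet:

```python
import numpy as np

def predict_rank_score(region_descriptors, km, alpha, beta):
    """Rank-score a test image from the descriptors of its sampled regions.

    Regions are assigned to region types with the fixed dictionary (km), a binary
    presence vector v is built, and -E(D | v) is returned so that higher = more
    memorable (only the ranking of scores is meaningful)."""
    types = km.predict(region_descriptors)  # region-type index per sampled region
    v = np.zeros(len(alpha))
    v[np.unique(types)] = 1.0               # a type is present if any region maps to it
    return -(v @ alpha + (1.0 - v) @ beta)
```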

51 Multiple feature integration: We incorporate multiple attributes of each region type, such as color, texture, and gradient, in the form of image features into our algorithm. [sent-121, score-0.62]

52 An image region is encoded using each feature dictionary independently, and the α, β parameters are learned jointly in our learning algorithm. [sent-125, score-0.426]

53 Subsequently, we use each set of α and β for individual features to construct memorability maps that are later combined using weighted pooling to produce an overall memorability map as shown in Fig. [sent-126, score-1.791]
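A small sketch of that pooling step; the per-feature weights are not given in this summary, so the weighting scheme here is an assumption (e.g., weights could be set from each feature's validation performance):

```python
import numpy as np

def combine_memorability_maps(maps, weights):
    """Weighted pooling of per-feature memorability maps into an overall map.

    maps    : dict feature_name -> (H, W) array, one map per feature type.
    weights : dict feature_name -> nonnegative float (assumed, e.g. from validation).
    """
    total = sum(weights[f] for f in maps)
    return sum(weights[f] * maps[f] for f in maps) / total
```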

54 We show later that multiple feature integration helps to improve the memorability score prediction and produces visually more consistent memorability maps. [sent-129, score-1.711]

55 2) and describe the experimental results on the image memorability dataset (Sec. [sent-134, score-1.053]

56 Experimental results show that our method outperforms state-of-the-art methods on this dataset while providing automatic memorability maps of images that compare favorably to those obtained when ground truth segmentation is used. [sent-137, score-1.056]

57 The images are fully annotated with segmented object regions and randomly sampled from different scene categories. [sent-141, score-0.273]

58 The images are cropped and resized to 256×256, and a memorability score corresponding to each image is provided. [sent-142, score-1.173]

59 The memorability score is defined as the percentage of correct detections by participants in their study. [sent-143, score-0.887]

60 Algorithmic details: We sample 2000 patches per image with size 0. [sent-150, score-0.226]

61 To speed up convergence of SVM-Rank, we do not include rank constraints for memorability scores that lie within 0. [sent-155, score-0.889]

62 Image region attributes: Our goal is to choose various features as attributes that humans likely use to represent image regions. [sent-160, score-0.608]

63 The attributes are extracted for each region and assigned to a region type as described in Sec. [sent-162, score-0.351]

64 The descriptors for a given image region are max-pooled at 2 spatial pyramid levels [15] using Locality-Constrained Linear Coding (LLC) [29]. [sent-170, score-0.425]
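For concreteness, a minimal sketch of max-pooling codes over a 2-level spatial pyramid (1×1 and 2×2 grids); the LLC coding itself is omitted, and `codes` stands in for the LLC codes of the densely sampled descriptors:

```python
import numpy as np

def spatial_pyramid_max_pool(codes, positions, region_wh, levels=(1, 2)):
    """Max-pool local codes over a spatial pyramid (here 1x1 and 2x2 grids).

    codes     : (n_points, K) codes (e.g. LLC) of descriptors inside the region.
    positions : (n_points, 2) (x, y) location of each descriptor in the region.
    region_wh : (W, H) width and height of the region.
    """
    W, H = region_wh
    pooled = []
    for g in levels:  # g x g grid of cells at this pyramid level
        cx = np.minimum((positions[:, 0] * g / W).astype(int), g - 1)
        cy = np.minimum((positions[:, 1] * g / H).astype(int), g - 1)
        for i in range(g):
            for j in range(g):
                mask = (cx == i) & (cy == j)
                cell = codes[mask].max(axis=0) if mask.any() else np.zeros(codes.shape[1])
                pooled.append(cell)
    return np.concatenate(pooled)  # length K * (1 + 4) for levels (1, 2)
```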

65 [7] show that simple image color features, such as mean hue, saturation, and intensity, exhibit only a very weak correlation with memorability. [sent-174, score-0.331]

66 In contrast to this, color has been shown to yield excellent results in combination with shape features for image classification [11]. [sent-175, score-0.398]

67 Furthermore, many studies show that color names are actually linguistic labels that humans assign to the color spectrum. [sent-176, score-0.266]

68 In this paper, we use the color names feature [27] to better exploit the color information. [sent-177, score-0.255]

69 Then we learn a dictionary of size 100 and apply LLC on a 2-level spatial pyramid to obtain the color descriptor for each region. [sent-179, score-0.245]

70 To encode visual texture perception information, we make use of the popular Local Binary Patterns (LBP) texture feature [21]. [sent-181, score-0.284]

71 Saliency: Image saliency is a biologically inspired model to capture the regions that attract more visual attention and fixation focus [8]. [sent-183, score-0.281]

72 The descriptors for a given image region are max-pooled at 2 spatial pyramid levels using LLC. [sent-189, score-0.425]

73 Semantic: High-level semantic meaning contained in images has been shown to be strongly correlated with image memorability [7], where manual annotation of object labels leads to strong performance in predicting image memorability. [sent-190, score-1.484]

74 Here, our goal is to design a fully automatic approach to predict image memorability, while still exploiting the semantic information. [sent-191, score-0.318]

75 The Top 20 row reports the average measured memorability of the 20 images with the highest predicted memorability. [sent-198, score-0.827]

76 Figure 3: Visualization of region types and corresponding α learned by our algorithm for gradient and semantic features. [sent-216, score-0.258]

77 The histograms represent the distribution of memorability scores corresponding to the particular region type. [sent-217, score-0.989]

78 The color of the bounding boxes corresponds to the memorability score of the image shown (using a jet color scheme). [sent-220, score-1.299]

79 Results: In this section, we evaluate the performance of our model with single and multiple features, and later explore what the model has learned using memorability maps and the ranking of different types of image regions. [sent-222, score-1.192]

80 Further, we note that our model provides complementary information to global features as it focuses on local image regions, increasing performance by 2% when combined with our global features. [sent-228, score-0.341]

81 Despite using the same set of features, we are able to obtain a performance gain, suggesting that our algorithm is effective at capturing local information in the image that was overlooked by the global features. [sent-233, score-0.275]

82 Memorability maps: We obtain memorability maps using max-pooling of the α from different image regions. [sent-234, score-1.11]
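One plausible reading of this step, sketched below: every sampled box contributes the α of its assigned region type, and overlapping boxes are combined per pixel with a max. The box format and the pooling direction (pooling α rather than 1 − α) are our assumptions:

```python
import numpy as np

def memorability_map(image_shape, boxes, alphas):
    """Per-pixel map by max-pooling the alpha of each sampled region.

    boxes  : list of (x, y, w, h) sampled rectangles (assumed format).
    alphas : alpha of each box's assigned region type (prob. of being forgotten).
    Higher value in the returned map = pixel more likely to be forgotten."""
    H, W = image_shape
    m = np.zeros((H, W))
    for (x, y, w, h), a in zip(boxes, alphas):
        m[y:y + h, x:x + w] = np.maximum(m[y:y + h, x:x + w], a)
    return m
```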

83 Fig. 4 shows the memorability maps obtained when using different features and the overall memorability map when combining multiple features. [sent-236, score-1.771]

84 From the images shown, we observe that there is no single attribute that is always effective at producing memorability maps, but the combination of the attributes leads to a significantly improved map. [sent-238, score-0.988]

85 Figure 4: Visualization of the memorability maps obtained using different features, and the overall memorability map. [sent-262, score-1.73]

86 Additionally, we also include the memorability map obtained when using ground truth segmentation on the right. [sent-263, score-0.87]

87 Figure 5: Additional examples of memorability maps generated by our algorithm. [sent-265, score-0.884]

88 In Fig. 3, we rank the image region types by their α value and visualize the regions for the corresponding region type when α is close to 0 or 1. [sent-267, score-0.712]

89 We observe that the region types are consistent with our intuition of what is memorable from [7]. [sent-268, score-0.346]

90 People often appear in image regions with low α, i.e. [sent-269, score-0.352]

91 Further, we analyze the image region types by computing the standard deviation of the memorability scores of the image regions that correspond to the particular type. [sent-272, score-1.644]

92 This suggests that our algorithm is effective at learning the regions with high and low probability of being forgotten as proposed in our framework. [sent-281, score-0.256]

93 Images are ranked by predicted memorability and plotted against the cumulative average of measured memorability scores. [sent-283, score-1.654]

94 (a) Average memorability for top N ranked images (%). [sent-284, score-0.911]

95 (b) Standard deviation of the memorability score of all region types, averaged across the 25 splits for all features, sorted by α. [sent-298, score-1.098]

96 (c) Standard deviation of region types for the Gradient feature, averaged across the 25 splits. [sent-301, score-0.227]

97 Here, we propose a novel probabilistic framework for automatically constructing memorability maps, discovering regions in the image that are more likely to be memorable or forgettable by human observers. [sent-310, score-1.461]

98 Future development of such automatic algorithms for image memorability could have many exciting and far-reaching applications in computer science, graphics, media, design, gaming, and entertainment industries in general. [sent-312, score-1.098]

99 Object bank: A high-level image representation for scene classification & semantic feature sparsification. [sent-455, score-0.353]

100 Assessing the aesthetic quality of photographs using generic image descriptors. [sent-476, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('memorability', 0.827), ('image', 0.226), ('memorable', 0.16), ('isola', 0.133), ('forgotten', 0.13), ('region', 0.129), ('regions', 0.126), ('vj', 0.107), ('color', 0.105), ('texture', 0.085), ('images', 0.084), ('saliency', 0.082), ('attributes', 0.077), ('visual', 0.073), ('pictures', 0.071), ('memory', 0.064), ('objects', 0.058), ('types', 0.057), ('maps', 0.057), ('internal', 0.051), ('dictionary', 0.05), ('semantic', 0.047), ('remember', 0.046), ('pyramid', 0.045), ('automatic', 0.045), ('features', 0.041), ('brady', 0.04), ('forgettable', 0.04), ('konkle', 0.04), ('photo', 0.039), ('human', 0.037), ('external', 0.037), ('score', 0.036), ('alvarez', 0.035), ('representation', 0.035), ('scores', 0.033), ('spacing', 0.033), ('observers', 0.033), ('annotation', 0.032), ('humans', 0.032), ('psychology', 0.032), ('people', 0.031), ('dj', 0.031), ('cvpr', 0.031), ('xiao', 0.029), ('splits', 0.029), ('rank', 0.029), ('fade', 0.027), ('landscapes', 0.027), ('memorabilities', 0.027), ('psychonomic', 0.027), ('ssim', 0.027), ('torralba', 0.026), ('shape', 0.026), ('cognition', 0.026), ('interpretable', 0.026), ('ranking', 0.025), ('densely', 0.025), ('global', 0.025), ('gradient', 0.025), ('spatial', 0.025), ('scene', 0.024), ('vision', 0.024), ('names', 0.024), ('local', 0.024), ('automatically', 0.024), ('participants', 0.024), ('forget', 0.024), ('weijer', 0.024), ('predicting', 0.022), ('hog', 0.022), ('truth', 0.022), ('likely', 0.021), ('feature', 0.021), ('graphics', 0.021), ('ground', 0.021), ('remembered', 0.02), ('object', 0.02), ('deviation', 0.02), ('stored', 0.02), ('psychological', 0.02), ('descriptor', 0.02), ('individual', 0.02), ('khosla', 0.019), ('scenes', 0.019), ('segmented', 0.019), ('bank', 0.019), ('llc', 0.019), ('postulate', 0.019), ('overall', 0.019), ('pooling', 0.018), ('sun', 0.018), ('oliva', 0.017), ('lbp', 0.017), ('spearman', 0.017), ('constantly', 0.017), ('thousands', 0.017), ('indoor', 0.017), ('exposed', 0.017), ('type', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.

2 0.11521566 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

Author: Ryan Kiros, Csaba Szepesvári

Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1

3 0.090606719 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared cross different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the cooccurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the stateof-the-art algorithms. 1

4 0.088969655 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

5 0.07731685 256 nips-2012-On the connections between saliency and tracking

Author: Vijay Mahadevan, Nuno Vasconcelos

Abstract: A model connecting visual tracking and saliency has recently been proposed. This model is based on the saliency hypothesis for tracking which postulates that tracking is achieved by the top-down tuning, based on target features, of discriminant center-surround saliency mechanisms over time. In this work, we identify three main predictions that must hold if the hypothesis were true: 1) tracking reliability should be larger for salient than for non-salient targets, 2) tracking reliability should have a dependence on the defining variables of saliency, namely feature contrast and distractor heterogeneity, and must replicate the dependence of saliency on these variables, and 3) saliency and tracking can be implemented with common low level neural mechanisms. We confirm that the first two predictions hold by reporting results from a set of human behavior studies on the connection between saliency and tracking. We also show that the third prediction holds by constructing a common neurophysiologically plausible architecture that can computationally solve both saliency and tracking. This architecture is fully compliant with the standard physiological models of V1 and MT, and with what is known about attentional control in area LIP, while explaining the results of the human behavior experiments.

6 0.067337595 344 nips-2012-Timely Object Recognition

7 0.066611312 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

8 0.065839134 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

9 0.065037243 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

10 0.064811788 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

11 0.064026453 202 nips-2012-Locally Uniform Comparison Image Descriptor

12 0.060859568 201 nips-2012-Localizing 3D cuboids in single-view images

13 0.060834069 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

14 0.059628922 303 nips-2012-Searching for objects driven by context

15 0.057845842 8 nips-2012-A Generative Model for Parts-based Object Segmentation

16 0.056830347 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

17 0.055622194 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

18 0.055392846 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

19 0.055173188 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

20 0.055155858 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.116), (1, 0.036), (2, -0.158), (3, -0.009), (4, 0.09), (5, -0.068), (6, 0.001), (7, -0.017), (8, 0.013), (9, 0.018), (10, -0.02), (11, 0.012), (12, 0.013), (13, -0.044), (14, -0.021), (15, 0.069), (16, 0.006), (17, -0.037), (18, -0.006), (19, -0.048), (20, 0.034), (21, -0.053), (22, -0.014), (23, 0.044), (24, -0.013), (25, 0.004), (26, -0.012), (27, 0.104), (28, 0.009), (29, 0.088), (30, -0.02), (31, 0.024), (32, 0.027), (33, -0.115), (34, 0.094), (35, 0.034), (36, -0.041), (37, 0.003), (38, 0.056), (39, -0.076), (40, 0.018), (41, -0.014), (42, -0.062), (43, 0.045), (44, 0.077), (45, -0.043), (46, 0.005), (47, 0.012), (48, 0.007), (49, 0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95680517 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.

2 0.79157025 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

Author: Shulin Yang, Liefeng Bo, Jue Wang, Linda G. Shapiro

Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared cross different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the cooccurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the stateof-the-art algorithms. 1

3 0.76225132 185 nips-2012-Learning about Canonical Views from Internet Image Collections

Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1

4 0.75499022 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

Author: Ryan Kiros, Csaba Szepesvári

Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1

5 0.75083077 202 nips-2012-Locally Uniform Comparison Image Descriptor

Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie

Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation. 1

6 0.71616715 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

7 0.68752182 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

8 0.67313659 146 nips-2012-Graphical Gaussian Vector for Image Categorization

9 0.64469433 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

10 0.59923655 8 nips-2012-A Generative Model for Parts-based Object Segmentation

11 0.57346708 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

12 0.56598532 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

13 0.56254578 201 nips-2012-Localizing 3D cuboids in single-view images

14 0.55625427 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

15 0.52898973 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

16 0.52549386 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

17 0.52487946 40 nips-2012-Analyzing 3D Objects in Cluttered Images

18 0.52459556 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

19 0.5153774 303 nips-2012-Searching for objects driven by context

20 0.51355243 193 nips-2012-Learning to Align from Scratch


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.049), (3, 0.219), (17, 0.013), (21, 0.042), (38, 0.072), (39, 0.012), (42, 0.038), (54, 0.03), (55, 0.055), (60, 0.012), (74, 0.104), (76, 0.137), (80, 0.066), (92, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79503942 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.

2 0.69949603 304 nips-2012-Selecting Diverse Features via Spectral Regularization

Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar

Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and 1 -regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1

3 0.67471939 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

4 0.67184418 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

Author: James T. Kwok, Ryan P. Adams

Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model. 1

5 0.66741168 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

Author: Aaron Wilson, Alan Fern, Prasad Tadepalli

Abstract: We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection. 1

6 0.66468614 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

7 0.66422909 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

8 0.65996313 193 nips-2012-Learning to Align from Scratch

9 0.65931833 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

10 0.65901762 303 nips-2012-Searching for objects driven by context

11 0.6566965 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

12 0.65625024 168 nips-2012-Kernel Latent SVM for Visual Recognition

13 0.65416276 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

14 0.65365881 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

15 0.65279138 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

16 0.65252709 188 nips-2012-Learning from Distributions via Support Measure Machines

17 0.65104973 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

18 0.64915127 185 nips-2012-Learning about Canonical Views from Internet Image Collections

19 0.64869499 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

20 0.64750385 8 nips-2012-A Generative Model for Parts-based Object Segmentation