nips nips2011 nips2011-293 knowledge-graph by maker-knowledge-mining

293 nips-2011-Understanding the Intrinsic Memorability of Images


Source: pdf

Author: Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Abstract: Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember. While it may seem like image memorability is purely subjective, recent work shows that it is not an inexplicable phenomenon: variation in memorability of images is consistent across subjects, suggesting that some images are intrinsically more memorable than others, independent of a subjects’ contexts and biases. In this paper, we used the publicly available memorability dataset of Isola et al. [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. Contrary to popular belief, unusual or aesthetically pleasing scenes do not tend to be highly memorable. This work represents one of the first attempts at understanding intrinsic image memorability, and opens a new domain of investigation at the interface between human cognition and computer vision. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we used the publicly available memorability dataset of Isola et al. [sent-7, score-0.781]

2 [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. [sent-8, score-0.55]

3 We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. [sent-9, score-0.964]

4 We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. [sent-10, score-0.605]

5 In a recent paper [13], we quantified the memorability of 2222 photographs as the rate at which subjects detect a repeat presentation of the image a few minutes after its initial presentation. [sent-21, score-1.025]

6 The memorability of these images was found to be consistent across subjects and across a variety of contexts, making some of these images intrinsically more memorable than others, independent of the subjects’ past experiences or biases. [sent-22, score-1.38]

7 Thus, while image memorability may seem like a quality that is hard to quantify, our recent work suggests that it is not an inexplicable phenomenon. [sent-23, score-0.904]

8 Figure 2: Distribution of memorability M of photographs with respect to unusualness U (left), aesthetics A (middle), and subjects' guesses of how memorable an image is, m (right). [sent-25, score-1.268]

9 All 2222 images from the memorability dataset were rated along these three aspects by 10 subjects each. [sent-26, score-1.012]

10 Contrary to popular belief, unusual and aesthetically pleasing images are not predominantly the most memorable ones. [sent-27, score-0.444]

11 Clearly, which images are memorable is not intuitive, as seen by poor estimates from subjects (g). [sent-31, score-0.501]

12 But then again, subjective intuitions of what makes an image memorable may need to be revised. [sent-32, score-0.414]

13 Surprisingly, as shown in Figure 2, we find that these properties are weakly correlated (and, in fact, negatively correlated) with memorability as measured in [13]. [sent-37, score-0.782]

14 Further, when subjects were asked to rate how memorable they thought an image would be, their responses were weakly (negatively) correlated with true memorability (Figure 2)! [sent-38, score-1.278]

15 While our previous work aimed at predicting memorability [13], here we aim to better understand memorability. [sent-39, score-0.789]

16 Any realistic use of the memorability of images requires an understanding of the key factors that underlie memorability, be it for cognitive scientists to discover the mechanisms behind memory or for advertisement designers to create more effective visual media. [sent-40, score-0.984]

17 Thus, the goal of this paper is to identify a collection of human-understandable visual attributes that are highly informative about image memorability. [sent-41, score-0.367]

18 First, we annotate the memorability dataset [13] with interpretable and semantic attributes. [sent-42, score-0.814]

19 Second, we employ a greedy feature selection algorithm with desirable explaining-away properties that allows us to explicitly determine a compact set of characteristics that make an image memorable. [sent-43, score-0.294]

20 As most of us would expect, image memorability depends on the user context and is likely to be subject to some inter-subject variability [12]. [sent-46, score-0.877]

21 This suggests that there is something intrinsic to images that makes some more memorable than others, and in [13] we developed a computer vision algorithm to predict this intrinsic memorability. [sent-48, score-0.507]

22 Attributes are attractive because they allow for transfer learning among categories that share attributes [18]. [sent-54, score-0.233]

23 In this work, we exploit attributes to understand which properties of an image make it memorable. [sent-56, score-0.327]

24 Predicting image properties: While image memorability is largely unexplored, many other photographic properties have been studied in the literature, such as photo quality [21], saliency [14], attractiveness [20], composition [10, 24], color harmony [5], and object importance [29]. [sent-57, score-1.107]

25 Most closely related to our work is that of Dhar et al. [7], who use attributes to predict the aesthetic quality of an image. [sent-59, score-0.371]

26 Towards the goal of improved prediction, they use a list of attributes known to influence the aesthetic quality of an image. [sent-60, score-0.367]

27 In our work, it is not known in advance what makes an image memorable. (Footnote 1: Images (a,d,e) are among the most memorable images in our dataset, while (b,c,f) are among the least.) [sent-61, score-0.512]

28 Figure 3: Example images depicting varying values of a subset of attributes annotated by subjects. [sent-64, score-0.394]

29 Since this is not known, we use an exhaustive list of attributes, and a feature selection scheme to identify which attributes make an image memorable. [sent-65, score-0.453]

30 3 Attribute annotations: We investigate memorability using the memorability dataset from [13]. [sent-66, score-1.667]

31 The dataset consists of 2222 natural images of everyday scenes and events selected from the SUN dataset [32], as well as memorability scores for each image. [sent-67, score-1.004]

32 The memorability scores were obtained via 665 subjects playing a ‘memory game’ on Amazon’s Mechanical Turk. [sent-68, score-0.866]

33 The memorability score of an image corresponds to the number of subjects that correctly detected a repeat presentation of the image. [sent-71, score-0.986]
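
The following is a minimal sketch (ours, not the authors' code) of this score computation, assuming per-image counts of subjects who detected and who missed the repeat:

import numpy as np

def memorability_scores(hits, misses):
    # hits[i] / misses[i]: subjects who detected / missed image i's repeat
    hits, misses = np.asarray(hits, float), np.asarray(misses, float)
    return hits / (hits + misses)

# an image whose repeat was caught by 8 of 10 subjects scores 0.8
print(memorability_scores([8], [2]))  # -> [0.8]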

34 The images in the memorability dataset come from ∼700 scene categories [32]. [sent-75, score-1.017]

35 While the scene and object categories depicted in an image may very well influence its memorability, there are many other properties of an image that could be at play. [sent-77, score-0.399]

36 To get a handle on these, we constructed an extensive list of image properties or attributes, and had the 2222 images annotated with these properties using Amazon’s Mechanical Turk. [sent-78, score-0.267]

37 An organization of the attributes collected is shown in Table 1. [sent-79, score-0.207]

38 Binary attributes (posed as questions) are listed with a ‘?’, while multi-valued attributes (on a scale of 1-5) are listed with a ‘;’. [sent-81, score-0.253]

39 Each image was annotated by 10 subjects for each of the attributes. [sent-82, score-0.229]

40 The ‘Length of description’ attribute was computed as the average number of words subjects used to describe the image (free-form). [sent-84, score-0.3]

41 The spatial layout attributes were based on the work of Oliva and Torralba [23]. [sent-85, score-0.207]

42 Many of the aesthetic attributes are based on the work of Dhar et al. [sent-86, score-0.342]

43 However, even among images containing people, there is variation in memorability that is consistent across subjects (split-half rank correlation = 0. [sent-89, score-1.089]
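
The split-half consistency quoted above can be estimated as in this sketch (our reconstruction, assuming an images x subjects matrix of repeat-detection outcomes with every image seen in both halves): subjects are split at random, per-image scores are computed from each half, and the two score vectors are rank-correlated.

import numpy as np
from scipy.stats import spearmanr

def split_half_consistency(hit_matrix, n_trials=25, seed=0):
    # hit_matrix: images x subjects; 1 = repeat detected, NaN = image unseen
    rng = np.random.default_rng(seed)
    n_subj = hit_matrix.shape[1]
    rhos = []
    for _ in range(n_trials):
        perm = rng.permutation(n_subj)
        half1, half2 = perm[: n_subj // 2], perm[n_subj // 2:]
        s1 = np.nanmean(hit_matrix[:, half1], axis=1)
        s2 = np.nanmean(hit_matrix[:, half2], axis=1)
        rho, _ = spearmanr(s1, s2)
        rhos.append(rho)
    return float(np.mean(rhos))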

44 In an effort to better understand the memorability of images containing people, we collected several attributes that are specific to people. [sent-91, score-1.086]

45 The annotations of these attributes were collected only on images containing people (and are considered to be absent for images not containing people). [sent-93, score-0.657]

46 Some of the people-attributes referred to the entire image (‘whole image’), while others referred to each person in the image (‘per-person’). [sent-100, score-0.311]

47 The per-person attributes were aggregated across all subjects and all people in the image. [sent-101, score-0.393]

48 Table 1: General attributes. Spatial layout: enclosed space vs. open space. [sent-103, score-0.207]

49 [Figure: magnitude of correlation with memorability; the most strongly correlated features include ‘enclosed space’ (0.36), ‘person: face visible’, ‘person: eye contact’, ‘sky’, and ‘number of people in image’.] [sent-150, score-0.818]

50 For further analysis, we utilize the most frequent 106 of the ∼1300 objects present in the images (their presence, count, and area in the image, and for a subset of these objects, the area occupied in four quadrants of the image), 237 of the ∼700 scene categories, and the 127 attributes listed in Tables 1 and 2. [sent-151, score-0.741]

51 We also append the image annotations with a scene hierarchy provided with the SUN dataset [32] that groups similar categories into a meta-category. [sent-152, score-0.387]

52 The scene hierarchy resulted in 19 additional scene meta-categories, while the object hierarchy resulted in 134 additional meta-categories. [sent-155, score-0.267]
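
A simple way to realize such meta-category features, sketched below under illustrative names (the mapping shown is hypothetical, not the SUN hierarchy itself), is to let a meta-category fire whenever any of its member categories is present:

import numpy as np

def meta_category_features(presence, members):
    # presence: images x categories binary matrix
    # members: dict meta-category name -> list of member column indices
    return {meta: presence[:, cols].max(axis=1)
            for meta, cols in members.items()}

# e.g. pool hypothetical 'river', 'lake', 'ocean' columns into one feature
# feats = meta_category_features(P, {"body of water": [12, 40, 77]})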

53 We see that the attributes have a total of 923 features. [sent-158, score-0.207]

54 Many of these features are correlated with each other (e.g., face visible and eye contact), suggesting a need for our feature selection scheme to determine a compact set of features that characterizes the memorability of an image. [sent-162, score-0.835]

55 4 Feature selection: Our goal is to identify a compact set of features that characterizes the memorability of an image. [sent-166, score-0.91]

56 Hence, it becomes crucial that our feature selection algorithm has explaining-away properties, so as to determine a set of distinct characteristics that make an image memorable. [sent-171, score-0.247]

57 If a naive feature selection approach picked ‘hair color’ as an informative feature, it would be unclear whether the mere presence or absence of a person in the image is what contributes to memorability, or whether the color of the hair really matters. [sent-177, score-0.316]

58 Employing an information-theoretic approach to feature selection allows us to naturally capture both these goals: selecting a compact set of non-redundant features and calibrating features based on the information they contain. [sent-244, score-0.313]

59 4.1 Information-theoretic: We formulate our problem as selecting features that maximize mutual information with memorability, such that the total number of bits required to encode all selected features (i.e., their total cost) stays within a budget: [sent-246, score-0.356]

60 max_F I(F; M) s.t. C(F) ≤ B (1), where F is a subset of the features, I(F; M) is the mutual information between F and memorability M, B is the budget (in bits), and C(F) is the total number of bits required to encode F. [sent-251, score-0.982]

61 We assume that each feature is encoded independently, and thus C(F) = Σ_{i=1}^{n} C(f_i), f_i ∈ F (2), where C(f_i) is the number of bits required to encode feature f_i, computed as H(f_i), the entropy of feature f_i across the training images. [sent-252, score-0.423]
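
For discretized features, the quantities in Eqs. 1-2 are straightforward to compute; the helpers below are our sketch (assuming features binned to a few discrete levels, not the authors' code) of the bit cost H(f_i) and of mutual information:

import numpy as np

def entropy_bits(f):
    # C(f) = H(f): empirical entropy in bits of one discretized feature
    _, counts = np.unique(f, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_info(x, y):
    # I(X; Y) in bits for discrete arrays x and y
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi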

62 The algorithm selects features with the maximum ratio of improvement in mutual information to their cost, while the total cost of the features does not exceed the allotted budget. [sent-264, score-0.304]

63 5 However, this lazy-greedy approach still requires the computation of mutual information between memorability and subsets of features. [sent-270, score-0.818]

64 At each iteration, the additional information provided by a candidate feature f_i over an existing set of features F would be IG(f_i) = I(F ∪ f_i; M) − I(F; M) (3). This computation is not feasible given our large number of features and limited training data. [sent-271, score-0.404]

65 Hence, we greedily add features that maximize an approximation to the mutual information between a subset of features and memorability, as also employed by Ullman et al. [sent-272, score-0.315]

66 Intuitively, this ensures that the feature selected at each iteration maximizes the per-bit minimal gain in mutual information over each of the individual features already selected. [sent-275, score-0.25]

67 Given a budget B, we first greedily add features using a budget of 2B, and then greedily remove features (that reduce the mutual information the least) until we fall within the allotted budget B. [sent-277, score-0.718]
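
Putting these pieces together, the sketch below is our reading of the procedure, not the authors' implementation: candidates are scored by the per-bit minimal pairwise gain approximating Eq. 3, added under a relaxed budget of 2B, and then pruned back to B (pruning here by smallest individual mutual information, a simplification of "reduce the mutual information the least"). It reuses entropy_bits and mutual_info from the sketch after Eq. 2.

import numpy as np
# assumes entropy_bits and mutual_info from the earlier sketch are in scope

def pair(f1, f2):
    # joint discrete variable, used to evaluate I({f, f'}; M)
    return np.array([f"{a}|{b}" for a, b in zip(f1, f2)])

def greedy_select(features, M, B):
    # features: dict name -> discretized column; M: discretized memorability
    selected, spent = [], 0.0
    while True:  # phase 1: greedily add under a relaxed budget of 2B
        best, best_ratio = None, 0.0
        for name, f in features.items():
            if name in selected:
                continue
            cost = entropy_bits(f)
            if cost == 0 or spent + cost > 2 * B:
                continue
            if not selected:
                gain = mutual_info(f, M)
            else:  # per-bit minimal gain over features already selected
                gain = min(mutual_info(pair(f, features[s]), M)
                           - mutual_info(features[s], M) for s in selected)
            if gain / cost > best_ratio:
                best, best_ratio = name, gain / cost
        if best is None:
            break
        selected.append(best)
        spent += entropy_bits(features[best])
    while spent > B and selected:  # phase 2: prune back down to budget B
        drop = min(selected, key=lambda s: mutual_info(features[s], M))
        selected.remove(drop)
        spent -= entropy_bits(features[drop])
    return selected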

68 Feature selection within the realm of a predictive model allows us to better capture features that achieve a concrete and practical measure of performance: “which set of features allows us to make the best predictions about an image’s memorability?” [sent-284, score-0.325]

69 While selecting such features would be computationally expensive to do over all our 923 features, using a pruned set of features obtained via information-theoretic selection makes this feasible. [sent-285, score-0.259]

70 In the greedy feature selection approach, the budget of bits, which is interpretable, can be explicitly enforced. [sent-295, score-0.258]

71 5 Results. Attribute annotations help: We first tested the degree to which each general feature-type annotation in our feature set is effective at predicting memorability. [sent-296, score-0.215]

72 We split the dataset from [13] into 2/3 training images, scored by half the subjects, and 1/3 test images, scored by the left-out half of the subjects. [sent-297, score-0.453]

73 For the new attributes we introduced, and for the object and scene hierarchy features, we used RBF kernels, while for the rest of the features we used the same kernel functions as in [13]. [sent-299, score-0.469]

74 We found that our new attribute annotations performed quite well (ρ = 0.528): they outperform the higher-dimensional object and scene features. [sent-302, score-0.336] [sent-303, score-0.391]

75 [Table 3: Performance (rank correlation) of different types of features at predicting image memorability; recovered rows include a value of 0.494 and a truncated ‘Scene annotations 0.’ entry.]

76 We then ranked the features in our set according to the feature selection algorithm. [sent-306, score-0.494]

77 We selected reduced feature sets by both running information-theoretic selection and predictive selection on our 2/3 training splits, for budgets ranging from 1 to 100-bits. [sent-312, score-0.267]

78 For predictive selection, we further split our training set in half and trained SVRs on one half to predict memorability on the other half. [sent-316, score-0.928]

79 At each iteration of selection, we greedily selected the feature that maximized predictive performance averaged over 3 random-split trials, with predictive performance again measured as rank correlation between predictions and ground-truth memorabilities. [sent-317, score-0.343]
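
One iteration of this predictive selection might look like the sketch below (our approximation with scikit-learn defaults; all names are illustrative): the candidate whose addition yields the highest Spearman correlation between SVR predictions and ground truth, averaged over random half-splits, wins.

import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR

def predictive_gain(X_sel, x_cand, y, n_splits=3, seed=0):
    # X_sel: columns already selected (or None); x_cand: candidate column
    X = (x_cand.reshape(-1, 1) if X_sel is None
         else np.column_stack([X_sel, x_cand]))
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    rhos = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
        preds = SVR(kernel="rbf").fit(X[tr], y[tr]).predict(X[te])
        rho, _ = spearmanr(preds, y[te])
        rhos.append(rho)
    return float(np.mean(rhos))

# at each iteration, keep the candidate maximizing predictive_gain(...)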

80 Taking this union, rather than just the features selected at a 100-bit budget, ensures that candidates that are only effective at small budgets are not missed. [sent-323, score-0.245]

81 Both selection algorithms create feature sets that are similarly effective at predicting memorability (Figure 5). [sent-336, score-0.89]

82 We created a final list of features by running the above feature selection methods. [Table 4: Information-theoretic and predictive feature selections for a budget of 10 bits.] [sent-344, score-0.447]

83 Correlations with memorability, computed on the entire dataset (no held-out data), are listed after each feature (arrows indicate the direction of correlation). [sent-345, score-0.991]

84 For example, on the far right we have highly memorable “pictures of people in an enclosed space” and on the far left we have forgettable “peaceful, open, unfamiliar spaces, devoid of people.” [sent-361, score-0.408]

85 Figure 6: Hierarchical clustering of images in ‘memorability space’, achieved via a regression tree [2], along with example images from each cluster. [sent-379, score-0.244]

86 The memorability of each cluster is given at the leaf nodes, and is also depicted as the shade of the cluster image borders (darker borders correspond to lower memorability than brighter borders). [sent-380, score-0.943]

87 The only previous work predicting memorability is our recent paper [13]. [sent-382, score-0.789]

88 In that paper, we made predictions on the basis of a suite of global image features – pixel histograms, GIST, SIFT, HOG, SSIM [13]. [sent-383, score-0.226]

89 Here we attempt to do better by using our selected features as an abstraction layer between raw images and memorability. [sent-386, score-0.257]

90 We trained a suite of SVRs to predict annotations from images, and another SVR to predict memorability from these predicted annotations. [sent-387, score-0.944] [Table 5: Performance (rank correlation) of automatic memorability prediction.]

91 For the annotation types, we used the feature types selected by our methods. [sent-389, score-0.84]

92 We split the test set in half and predicted annotations for one half by training on the other. [sent-393, score-0.205] [Table 5 rows recovered: Direct 0.468, Indirect 0.479.]

93 We then trained a final SVR to predict memorability on the test set in three ways: 1) using only image features (Direct), 2) using only predicted annotations (Indirect), and 3) using both (Direct + Indirect) (Table 5). [sent-396, score-1.141]
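
A minimal sketch of the Indirect route (a simplification: here the stage-1 SVRs are trained on the same images used for stage 2, whereas the protocol above trains them on a held-out half):

import numpy as np
from sklearn.svm import SVR

def indirect_predict(X_tr, A_tr, y_tr, X_te):
    # X_*: image features; A_tr: human attribute annotations
    # (images x attributes); y_tr: memorability scores
    attr_models = [SVR(kernel="rbf").fit(X_tr, A_tr[:, j])
                   for j in range(A_tr.shape[1])]        # stage 1
    A_hat_tr = np.column_stack([m.predict(X_tr) for m in attr_models])
    A_hat_te = np.column_stack([m.predict(X_te) for m in attr_models])
    final = SVR(kernel="rbf").fit(A_hat_tr, y_tr)         # stage 2
    return final.predict(A_hat_te)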

94 We augmented the object and scene annotations of the dataset of Isola et al. [sent-402, score-0.286]

95 [13] with attribute annotations describing the spatial layout, content, and aesthetic properties of the images. [sent-403, score-0.335]

96 We employed a greedy feature selection scheme to obtain compact lists of features that are both highly informative about memorability and highly predictive of it. [sent-404, score-1.077]

97 We found that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful settings are not. [sent-405, score-0.579]

98 Through this work, we have begun to uncover some of the core features that contribute to image memorability. [sent-407, score-0.226]

99 We hope that by parsing memorability into a concise and understandable set of attributes, we have provided a description that will interface well with other domains of knowledge and may provide fodder for future theories and applications of memorability. [sent-409, score-0.757]

100 High level describable attributes for predicting aesthetics and interestingness. [sent-474, score-0.28]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('memorability', 0.757), ('memorable', 0.27), ('attributes', 0.207), ('aesthetic', 0.135), ('peaceful', 0.135), ('annotations', 0.129), ('images', 0.122), ('image', 0.12), ('budget', 0.11), ('subjects', 0.109), ('features', 0.106), ('scene', 0.088), ('people', 0.077), ('attribute', 0.071), ('fi', 0.069), ('predictive', 0.066), ('photo', 0.065), ('enclosed', 0.061), ('mutual', 0.061), ('feature', 0.054), ('isola', 0.054), ('bits', 0.054), ('sky', 0.052), ('hair', 0.048), ('person', 0.047), ('selection', 0.047), ('greedy', 0.047), ('listed', 0.046), ('object', 0.045), ('greedily', 0.042), ('aesthetics', 0.041), ('konkle', 0.041), ('pleasant', 0.041), ('svrs', 0.041), ('unusualness', 0.041), ('visual', 0.04), ('selections', 0.039), ('photographs', 0.039), ('oliva', 0.038), ('indirect', 0.038), ('memory', 0.038), ('half', 0.038), ('leskovec', 0.037), ('engaging', 0.036), ('dhar', 0.036), ('svr', 0.036), ('brady', 0.036), ('correlation', 0.036), ('visible', 0.035), ('interpretable', 0.033), ('funny', 0.033), ('borders', 0.033), ('torralba', 0.032), ('predicting', 0.032), ('allotted', 0.031), ('vision', 0.03), ('alvarez', 0.029), ('selected', 0.029), ('predict', 0.029), ('krause', 0.028), ('intrinsic', 0.028), ('understanding', 0.027), ('advertisers', 0.027), ('aesthetically', 0.027), ('beauty', 0.027), ('ig', 0.027), ('inexplicable', 0.027), ('jewelry', 0.027), ('leyvand', 0.027), ('memorabilities', 0.027), ('photographers', 0.027), ('professional', 0.027), ('teenager', 0.027), ('unpleasant', 0.027), ('vistas', 0.027), ('rank', 0.027), ('characteristics', 0.026), ('scenes', 0.026), ('xiao', 0.026), ('categories', 0.026), ('objects', 0.025), ('unusual', 0.025), ('negatively', 0.025), ('list', 0.025), ('psychology', 0.024), ('others', 0.024), ('face', 0.024), ('budgets', 0.024), ('senior', 0.024), ('artists', 0.024), ('dataset', 0.024), ('subjective', 0.024), ('submodular', 0.024), ('hierarchy', 0.023), ('splits', 0.023), ('think', 0.022), ('spearman', 0.022), ('strange', 0.022), ('contact', 0.022), ('everyday', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 293 nips-2011-Understanding the Intrinsic Memorability of Images

Author: Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Abstract: Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember. While it may seem like image memorability is purely subjective, recent work shows that it is not an inexplicable phenomenon: variation in memorability of images is consistent across subjects, suggesting that some images are intrinsically more memorable than others, independent of a subjects’ contexts and biases. In this paper, we used the publicly available memorability dataset of Isola et al. [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. Contrary to popular belief, unusual or aesthetically pleasing scenes do not tend to be highly memorable. This work represents one of the first attempts at understanding intrinsic image memorability, and opens a new domain of investigation at the interface between human cognition and computer vision. 1

2 0.13987486 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

Author: Vicente Ordonez, Girish Kulkarni, Tamara L. Berg

Abstract: We develop and demonstrate automatic image description methods using a large captioned photo collection. One contribution is our technique for the automatic collection of this new dataset – performing a huge number of Flickr queries and then filtering the noisy results down to 1 million images with associated visually relevant captions. Such a collection allows us to approach the extremely challenging problem of description generation using relatively simple non-parametric methods and produces surprisingly effective results. We also develop methods incorporating many state of the art, but fairly noisy, estimates of image content to produce even more pleasing results. Finally we introduce a new objective performance measure for image captioning. 1

3 0.10170352 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen

Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1

4 0.091107346 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition

Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon

Abstract: We introduce PiCoDes: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. PiCoDes of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.

5 0.08389166 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.1 1

6 0.07742548 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

7 0.07457757 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

8 0.073537461 154 nips-2011-Learning person-object interactions for action recognition in still images

9 0.070574626 35 nips-2011-An ideal observer model for identifying the reference frame of objects

10 0.069133021 261 nips-2011-Sparse Filtering

11 0.06748613 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

12 0.064188763 303 nips-2011-Video Annotation and Tracking with Active Learning

13 0.062645614 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

14 0.059776273 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

15 0.059679382 244 nips-2011-Selecting Receptive Fields in Deep Networks

16 0.058573078 127 nips-2011-Image Parsing with Stochastic Scene Grammar

17 0.057187729 165 nips-2011-Matrix Completion for Multi-label Image Classification

18 0.053350121 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

19 0.053151671 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

20 0.051164288 168 nips-2011-Maximum Margin Multi-Instance Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.142), (1, 0.098), (2, -0.068), (3, 0.124), (4, 0.068), (5, 0.025), (6, 0.01), (7, 0.046), (8, -0.0), (9, 0.046), (10, 0.016), (11, -0.024), (12, 0.008), (13, 0.022), (14, 0.038), (15, -0.014), (16, -0.013), (17, 0.067), (18, 0.007), (19, -0.0), (20, -0.044), (21, -0.0), (22, -0.018), (23, 0.027), (24, -0.004), (25, -0.009), (26, 0.054), (27, -0.042), (28, 0.044), (29, 0.09), (30, -0.014), (31, 0.007), (32, -0.012), (33, 0.001), (34, 0.065), (35, -0.003), (36, 0.027), (37, -0.021), (38, -0.058), (39, 0.005), (40, 0.049), (41, 0.08), (42, -0.065), (43, 0.025), (44, -0.033), (45, 0.045), (46, -0.052), (47, -0.034), (48, 0.015), (49, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9251796 293 nips-2011-Understanding the Intrinsic Memorability of Images

Author: Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Abstract: Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember. While it may seem like image memorability is purely subjective, recent work shows that it is not an inexplicable phenomenon: variation in memorability of images is consistent across subjects, suggesting that some images are intrinsically more memorable than others, independent of a subjects’ contexts and biases. In this paper, we used the publicly available memorability dataset of Isola et al. [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. Contrary to popular belief, unusual or aesthetically pleasing scenes do not tend to be highly memorable. This work represents one of the first attempts at understanding intrinsic image memorability, and opens a new domain of investigation at the interface between human cognition and computer vision. 1

2 0.74672139 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

Author: Vicente Ordonez, Girish Kulkarni, Tamara L. Berg

Abstract: We develop and demonstrate automatic image description methods using a large captioned photo collection. One contribution is our technique for the automatic collection of this new dataset – performing a huge number of Flickr queries and then filtering the noisy results down to 1 million images with associated visually relevant captions. Such a collection allows us to approach the extremely challenging problem of description generation using relatively simple non-parametric methods and produces surprisingly effective results. We also develop methods incorporating many state of the art, but fairly noisy, estimates of image content to produce even more pleasing results. Finally we introduce a new objective performance measure for image captioning. 1

3 0.72109896 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen

Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1

4 0.68087125 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation

Author: Fahad S. Khan, Joost Weijer, Andrew D. Bagdanov, Maria Vanrell

Abstract: We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use Information theoretic vocabulary compression to find discriminative combinations of cues and the resulting vocabulary of portmanteau1 words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-theart results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation. 1

5 0.66870463 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.1 1

6 0.66396588 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

7 0.65122563 154 nips-2011-Learning person-object interactions for action recognition in still images

8 0.64598328 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

9 0.64590609 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

10 0.64167935 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

11 0.62680978 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition

12 0.58862847 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

13 0.58671308 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection

14 0.55975342 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

15 0.55372488 35 nips-2011-An ideal observer model for identifying the reference frame of objects

16 0.53975058 127 nips-2011-Image Parsing with Stochastic Scene Grammar

17 0.51862711 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

18 0.50748211 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability

19 0.49962431 130 nips-2011-Inductive reasoning about chimeric creatures

20 0.49790034 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.017), (4, 0.037), (20, 0.04), (26, 0.012), (31, 0.057), (33, 0.038), (43, 0.036), (45, 0.508), (57, 0.037), (65, 0.013), (74, 0.031), (83, 0.025), (84, 0.013), (99, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99399692 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data

Author: Youwei Zhang, Laurent E. Ghaoui

Abstract: Sparse PCA provides a linear combination of small number of features that maximizes variance across data. Although Sparse PCA has apparent advantages compared to PCA, such as better interpretability, it is generally thought to be computationally much more expensive. In this paper, we demonstrate the surprising fact that sparse PCA can be easier than PCA in practice, and that it can be reliably applied to very large data sets. This comes from a rigorous feature elimination pre-processing result, coupled with the favorable fact that features in real-life data typically have exponentially decreasing variances, which allows for many features to be eliminated. We introduce a fast block coordinate ascent algorithm with much better computational complexity than the existing first-order ones. We provide experimental results obtained on text corpora involving millions of documents and hundreds of thousands of features. These results illustrate how Sparse PCA can help organize a large corpus of text data in a user-interpretable way, providing an attractive alternative approach to topic models. 1

2 0.99203575 63 nips-2011-Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

Author: Mark Schmidt, Nicolas L. Roux, Francis R. Bach

Abstract: We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. Using these rates, we perform as well as or better than a carefully chosen fixed error level on a set of structured sparsity problems. 1

same-paper 3 0.99176723 293 nips-2011-Understanding the Intrinsic Memorability of Images

Author: Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva

Abstract: Artists, advertisers, and photographers are routinely presented with the task of creating an image that a viewer will remember. While it may seem like image memorability is purely subjective, recent work shows that it is not an inexplicable phenomenon: variation in memorability of images is consistent across subjects, suggesting that some images are intrinsically more memorable than others, independent of a subjects’ contexts and biases. In this paper, we used the publicly available memorability dataset of Isola et al. [13], and augmented the object and scene annotations with interpretable spatial, content, and aesthetic image properties. We used a feature-selection scheme with desirable explaining-away properties to determine a compact set of attributes that characterizes the memorability of any individual image. We find that images of enclosed spaces containing people with visible faces are memorable, while images of vistas and peaceful scenes are not. Contrary to popular belief, unusual or aesthetically pleasing scenes do not tend to be highly memorable. This work represents one of the first attempts at understanding intrinsic image memorability, and opens a new domain of investigation at the interface between human cognition and computer vision. 1

4 0.99022615 28 nips-2011-Agnostic Selective Classification

Author: Yair Wiener, Ran El-Yaniv

Abstract: For a learning problem whose associated excess loss class is (β, B)-Bernstein, we show that it is theoretically possible to track the same classification performance of the best (unknown) hypothesis in our class, provided that we are free to abstain from prediction in some region of our choice. The (probabilistic) volume of this rejected region of the domain is shown to be diminishing at rate O(Bθ(√(1/m))^β), where θ is Hanneke’s disagreement coefficient. The strategy achieving this performance has computational barriers because it requires empirical error minimization in an agnostic setting. Nevertheless, we heuristically approximate this strategy and develop a novel selective classification algorithm using constrained SVMs. We show empirically that the resulting algorithm consistently outperforms the traditional rejection mechanism based on distance from decision boundary. 1

5 0.98947102 272 nips-2011-Stochastic convex optimization with bandit feedback

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T. 1

6 0.98804468 143 nips-2011-Learning Anchor Planes for Classification

7 0.9850204 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison

8 0.953444 190 nips-2011-Nonlinear Inverse Reinforcement Learning with Gaussian Processes

9 0.92994112 19 nips-2011-Active Classification based on Value of Classifier

10 0.92026693 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition

11 0.91604424 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

12 0.90892875 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers

13 0.90010512 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing

14 0.89977938 220 nips-2011-Prediction strategies without loss

15 0.89826608 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

16 0.89733648 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time

17 0.89229965 78 nips-2011-Efficient Methods for Overlapping Group Lasso

18 0.89168388 169 nips-2011-Maximum Margin Multi-Label Structured Prediction

19 0.88782597 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

20 0.88406432 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation