cvpr cvpr2013 cvpr2013-200 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take advantage of having massive well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. [sent-2, score-0.815]
2 Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. [sent-3, score-0.38]
3 Here, we take advantage of having massive well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. [sent-4, score-0.814]
4 Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. [sent-5, score-0.23]
5 In this paper, we propose a scheme to build a path from words to visual concepts; using this scheme, effective mid-level representations are automatically exploited from a large number of web images. [sent-17, score-0.354]
6 A term, “classeme”, was introduced in [29] which also explores Internet images using word-based queries; however, only one classeme is learned for each category and the objective of the classeme work is to learn image-level representations. [sent-23, score-0.336]
7 Instead, our goal here is to learn a dictionary of mid-level visual concepts for the purpose of performing general image understanding, which goes beyond the scope of classeme [29], as it is computationally prohibitive for [29] to train on a large scale. [sent-24, score-0.873]
8 In [37], saliency detection is utilized to create bags of image patches, but only one object is assumed in each image for the task of object discovery. [sent-28, score-0.258]
9 Automatic Visual Concept Learning: Starting from a pool of words, we crawl a large number of Internet images using the literal words as queries; patches are then sampled and visual concepts are learned in a weakly supervised manner. [sent-33, score-1.241]
10 Following our path of harvesting visual concepts from words, many algorithms can be used to learn the visual concepts. [sent-36, score-0.913]
11 In this paper, we adopt a simple scheme, using the max-margin formulation for multiple instance learning in [1] to automatically find positive midlevel patches; we then create visual concepts by performing K-means on the positive patches. [sent-37, score-1.023]
12 The visual concepts learned in this way are the mid-level representations of enormous Internet images with decent diversity, and can be used to encode novel images and to categorize novel categories. [sent-38, score-0.906]
13 The flow chart of our scheme of creating visual concepts from words. [sent-48, score-0.815]
14 For example, for the word “table”, besides the images of dining tables, images of spreadsheets appear as well. [sent-59, score-0.29]
15 The diversity in these crawled images makes it inappropriate to train only a single classifier on the images, forcing us to investigate the multiple cluster property. [sent-61, score-0.226]
16 Saliency Guided Bag Construction: The problem of visual concept learning is at first an unsupervised one, because we did not manually label or annotate the crawled images. [sent-65, score-0.338]
17 However, if we view the query words as the labels for the images, the problem can be formulated in a weakly supervised setting, making our problem more focused and easier to tackle. [sent-66, score-0.221]
18 Firstly, we convert each image to a bag of image patches with size greater than or equal to 64 × 64 that are more likely to carry semantic meanings. [sent-67, score-0.261]
19 Instead of having randomly or densely sampled patches as in [28], we adopt a saliency detection technique to reduce the search space. [sent-68, score-0.379]
20 Fig. 3 shows sample saliency detection results (the top 5 saliency windows for each image) by [9], a window-based saliency detection method. [sent-71, score-0.576]
21 In Fig. 3, we observe that objects such as airplanes, birds, caterpillars, crosses, dogs, and horses are covered by the top 5 saliency windows. [sent-73, score-0.328]
22 In our experiment, the top 50 salient windows are used as the instances of a positive bag directly. [sent-76, score-0.34]
23 Although the saliency assumption is reasonable, not all category images satisfy this assumption. [sent-81, score-0.197]
24 For example, for the images of “beach”, the salient windows only cover patterns such as birds, trees, and clouds (see the salient windows of the “beach” image in Fig. 3). [sent-82, score-0.42]
25 Although these covered patterns are also related to “beach”, they cannot capture the scene as a whole because an image of “beach” is a mixture of visual concepts including sea, sky, and sand. [sent-84, score-0.831]
26 To avoid missing non-salient regions for a word, besides using the salient windows, we also randomly sample some image patches from non-salient regions. [sent-85, score-0.263]
27 As non-salient regions are often relatively uniform with little variation in appearance, a smaller number of patches are sampled from these regions. [sent-86, score-0.215]
28 After the patches are sampled, an overlap check is performed between image patches of similar scale. [sent-87, score-0.344]
29 If two patches are of similar scale and have high overlap, one of them is removed. [sent-88, score-0.222]
30 Each bag constructed in this way thus consists of patches from both salient and non-salient regions. [sent-89, score-0.352]
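To make the bag construction concrete, a minimal Python sketch is given below. It is not the authors' code: the saliency detector of [9] and the non-salient sampler are represented by the hypothetical placeholders detect_salient_windows and sample_nonsalient_patches, and the exact non-salient sample count and overlap/scale thresholds are assumptions.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def build_bag(image, detect_salient_windows, sample_nonsalient_patches,
              n_salient=50, n_nonsalient=10, min_size=64, iou_thresh=0.5):
    """Construct one positive bag for a crawled image: the top salient windows
    plus a few randomly sampled non-salient patches, with near-duplicate
    patches of similar scale removed by an overlap check."""
    boxes = list(detect_salient_windows(image, top_k=n_salient))      # salient instances
    boxes += list(sample_nonsalient_patches(image, n=n_nonsalient))   # non-salient instances
    # keep only patches of size >= min_size x min_size
    boxes = [b for b in boxes
             if b[2] - b[0] >= min_size and b[3] - b[1] >= min_size]
    kept = []
    for b in boxes:
        # compare only against already-kept patches of similar scale
        similar = [k for k in kept if 0.5 < (b[2] - b[0]) / float(k[2] - k[0]) < 2.0]
        if all(overlap_ratio(b, k) < iou_thresh for k in similar):
            kept.append(b)
    return kept  # the instances (patch boxes) of this bag
```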
31 A portion of the patches may be unrelated to the word of interest, e.g. [sent-90, score-0.396]
32 , the patches corresponding to the sea in the image of “horse” in Fig. 3. [sent-92, score-0.202]
33 Such patches are uncommon for the word “horse”, and will be naturally filtered under the multiple instance learning framework. [sent-94, score-0.501]
34 Top five salient windows for images from 12 words. [sent-97, score-0.208]
35 Except for the words “sky”, “beach”, and “yard”, the patterns of interest can be covered by a few top salient windows. [sent-98, score-0.223]
36 Our Formulation: To learn visual concepts from the bags constructed above, there are two basic requirements: 1) the irrelevant image patches should be filtered out, and 2) the multiple cluster property of these visual patches should be investigated. [sent-102, score-1.33]
37 In multiple instance learning, the labeling information is significantly weakened, as labels are assigned only to the bags while the instance-level labels remain latent. [sent-106, score-0.23]
38 Visual Concept Learning via miSVM: Using miSVM and assigning the literal words as the labels for the Internet images, visual concept learning for each word can be converted from an unsupervised learning problem into a weakly supervised learning problem. [sent-112, score-0.884]
39 For a word 푘, its bag 퐵푖 is assigned a label 푌푖 = 1. [sent-113, score-0.345]
40 For negative bags, we create a large negative bag 퐵− using a large number of instances (patches) from words other than the word of interest. [sent-115, score-0.478]
41 The purpose of creating the large negative bag is to model the visual world, making the visual concepts learned for a word discriminative enough against the other words. [sent-117, score-1.267]
42 The positive patches related to the word are also automatically found by miSVM. [sent-120, score-0.464]
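A rough sketch of how miSVM [1] could be run per word is shown below, assuming patch descriptors have already been extracted. It follows the standard miSVM recipe (initialize instance labels from the bag labels, train a linear SVM, relabel instances inside the positive bags, repeat) rather than the authors' exact implementation, and uses scikit-learn's LinearSVC as a stand-in max-margin solver.

```python
import numpy as np
from sklearn.svm import LinearSVC

def misvm_for_word(pos_bags, neg_instances, n_iter=10):
    """pos_bags: list of (n_i, d) arrays, one bag per crawled image of the word.
    neg_instances: (n_neg, d) array of patches taken from the other words.
    Returns the single-concept classifier f_k and the patches labeled positive."""
    X_pos = np.vstack(pos_bags)
    X = np.vstack([X_pos, neg_instances])
    y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(neg_instances))])
    clf = LinearSVC(C=1.0)
    for _ in range(n_iter):
        clf.fit(X, y)
        # re-estimate the latent instance labels inside the positive bags
        y_new, start = y.copy(), 0
        for bag in pos_bags:
            scores = clf.decision_function(bag)
            labels = np.where(scores > 0, 1.0, -1.0)
            if labels.max() < 0:                 # every positive bag must keep
                labels[np.argmax(scores)] = 1.0  # at least one positive instance
            y_new[start:start + len(bag)] = labels
            start += len(bag)
        if np.array_equal(y_new, y):             # instance labels stable: stop
            break
        y = y_new
    positive_patches = X_pos[y[:len(X_pos)] > 0]
    return clf, positive_patches
```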
43 Given a patch, the linear SVM 푓푘 can output a confidence value indicating the relevance of the patch to the word of interest. [sent-121, score-0.317]
44 Therefore, the linear SVM 푓푘 itself can be treated as a visual concept that models the patches of a word as a whole. [sent-122, score-0.609]
45 Due to the embedded multi-cluster nature of diversity in the image concepts, a single classifier is insufficient to capture the diverse visual representations of a word concept. [sent-124, score-0.471]
46 Thus, we apply another step in our algorithm: the positive instances (patches) automatically identified by the single-concept classifier are clustered to form some codes 퐶푘 = {퐶푘1, 퐶푘2, . . . }. [sent-125, score-0.266]
47 We call them multi-cluster visual concepts; different from the single-concept classifier, each multi-cluster visual concept corresponds to a compact image concept. [sent-131, score-0.213]
48 Therefore, for each word 푘, we learn two types of visual concepts, the single-concept classifier and the multicluster visual concepts 퐶푘. [sent-132, score-1.247]
49 , 푓푀}, and a set of multi-cluster visual concepts 퐶 = {퐶푘, 1 ≤ 푘 ≤ 푀}. [sent-136, score-0.749]
50 The single-concept classifiers and the multi-cluster visual concepts can be applied to novel images as the descriptors for categorization. [sent-137, score-0.662]
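The clustering step described above can be sketched in a few lines. This is an assumed reading, not the authors' code: the cluster count of 20 matches the setting reported later in the experiments, while the descriptor format and the K-means settings are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_multicluster_concepts(positive_patches, n_clusters=20):
    """Cluster the miSVM-positive patch descriptors of one word; each of the
    n_clusters centers acts as one multi-cluster visual concept C_k^j."""
    km = KMeans(n_clusters=n_clusters, n_init=5, random_state=0)
    km.fit(np.asarray(positive_patches))
    return km.cluster_centers_          # shape (n_clusters, d)

# Repeating this for every word k gives the set C = {C_k, 1 <= k <= M},
# alongside the single-concept classifiers F = {f_1, ..., f_M}.
```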
51 In Fig. 4, we illustrate the outputs of the single-concept classifiers on the images, as well as the assignments of patches to the multi-cluster visual concepts. [sent-139, score-0.388]
52 For clarity, for each word we cluster six multi-cluster visual concepts from the positive patches and assign them different colors randomly. [sent-140, score-1.234]
53 Illustration of the single-concept classifiers and the multi-cluster visual concepts for 6 words. [sent-142, score-0.845]
54 We then assign the patches labeled as positive to the six multi-cluster visual concepts in a nearest-neighbor manner and display the colors of the assigned visual concepts in the centers of the patches. [sent-147, score-1.738]
55 Though learned in a weakly supervised manner, the single-concept classifiers can predict rather well. [sent-150, score-0.277]
56 For the word “building”, the walls of the left two images are different from the walls of the right three images. [sent-153, score-0.363]
57 On the contrary, the multi-cluster visual concepts can capture such differences. [sent-154, score-0.749]
58 The walls in the left two images of the word “building” have the same pattern, and they are assigned to the same multi-cluster visual concept that has relatively sparse and rectangular windows (indicated in green). [sent-155, score-0.639]
59 The walls on the right three images have a different pattern and they are assigned to another visual concept that has square and denser windows (indicated in magenta). [sent-156, score-0.415]
60 For the word “balcony”, the columns are assigned to a multi-cluster visual concept indicated in yellow. [sent-157, score-0.469]
61 This illustrates that the single-concept classifiers and the multicluster visual concepts correspond to different aspects of images and complement each other. [sent-159, score-1.002]
62 Application for Image Classification: As our visual concept representation has two components, the single-concept classifiers 퐹 = {푓1, . . . [sent-162, score-0.309]
63 , 푓푀} and the multi-cluster visual concepts 퐶 = {퐶푘, 1 ≤ 푘 ≤ 푀}, we apply the two components separately on novel images. [sent-165, score-0.629]
64 The single-concept classifier 푓푘 is applied to the densely sampled patches from the grids, and the responses of the classifiers are pooled in a max-pooling manner. [sent-167, score-0.341]
65 Since our method works on the patch level and the visual concepts are learned with image patches of different scales, two or three scales are enough for testing images. [sent-169, score-1.06]
66 In this way, for each novel image, we obtain a feature vector of dimension 푀 × 푛 × 21, where 푛 is the number of multi-cluster visual concepts for each word. [sent-172, score-0.629]
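One plausible reading of this encoding step is sketched below: patches are densely sampled at two or three scales, each single-concept classifier's responses are max-pooled over a three-level spatial pyramid (1 + 4 + 16 = 21 cells, which would explain the factor 21), and positive patches additionally vote for their nearest multi-cluster concept. The helper extract_patch_descriptors and the exact pyramid layout are assumptions, not details taken from the text.

```python
import numpy as np

def encode_image(image, classifiers, concepts, extract_patch_descriptors):
    """classifiers: list of M single-concept SVMs f_k.
    concepts: list of M (n, d) arrays of K-means centers (multi-cluster concepts).
    extract_patch_descriptors: hypothetical helper returning (P, d) descriptors
    and (P, 2) patch centers normalized to [0, 1]."""
    descs, locs = extract_patch_descriptors(image)
    M, n = len(classifiers), concepts[0].shape[0]
    f_single = np.full((M, 21), -np.inf)     # max-pooled classifier responses
    f_multi = np.zeros((M, n, 21))           # multi-cluster concept histograms
    for d, (x, y) in zip(descs, locs):
        # the pyramid cells (1x1, 2x2, 4x4 -> 21 in total) containing this patch
        cells, offset = [], 0
        for rows, cols in [(1, 1), (2, 2), (4, 4)]:
            cells.append(offset + min(int(y * rows), rows - 1) * cols
                         + min(int(x * cols), cols - 1))
            offset += rows * cols
        for k, clf in enumerate(classifiers):
            score = clf.decision_function(d[None, :])[0]
            for c in cells:
                f_single[k, c] = max(f_single[k, c], score)
            if score > 0:                    # positive patch: vote for its nearest
                j = np.argmin(np.linalg.norm(concepts[k] - d, axis=1))
                for c in cells:              # multi-cluster concept of word k
                    f_multi[k, j, c] += 1
    f_single[~np.isfinite(f_single)] = 0.0   # cells that received no patch
    return f_single.ravel(), f_multi.ravel() # dimensions M*21 and M*n*21
```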
67 One is to train a linear classifier model for each visual concept, and apply the classifiers to the novel images. [sent-174, score-0.246]
68 In this paper, we simply use the basic scheme to illustrate the effectiveness of the visual concepts we learned. [sent-175, score-0.785]
69 Finally, the features corresponding to the single-concept classifiers and the multi-cluster visual concepts are combined as in multiple kernel learning [2, 13]. [sent-176, score-0.916]
70 The kernels 퐾퐹 for the single-concept classifiers and 퐾퐶 for the multicluster visual concepts are computed respectively and combined linearly: 퐾 = 푤퐾퐹 + (1 − 푤)퐾퐶. [sent-177, score-0.969]
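The linear kernel combination can be written down directly; the sketch below assumes linear base kernels and a precomputed-kernel SVM, neither of which is specified in the text beyond the formula 퐾 = 푤퐾퐹 + (1 − 푤)퐾퐶.

```python
import numpy as np
from sklearn.svm import SVC

def combined_kernel(XF_a, XC_a, XF_b, XC_b, w=0.5):
    """K = w * K_F + (1 - w) * K_C, using linear base kernels over the
    single-concept features XF and the multi-cluster features XC.
    (In practice each Gram matrix is often normalized before mixing.)"""
    K_F = XF_a @ XF_b.T
    K_C = XC_a @ XC_b.T
    return w * K_F + (1 - w) * K_C

# usage sketch with a precomputed-kernel SVM
# (XF_train, XC_train, XF_test, XC_test, y_train are assumed to exist):
# K_train = combined_kernel(XF_train, XC_train, XF_train, XC_train, w=0.5)
# K_test  = combined_kernel(XF_test,  XC_test,  XF_train, XC_train, w=0.5)
# clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
# predictions = clf.predict(K_test)
```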
71 Experiments and Results: On the PASCAL VOC 2007 [6], Scene-15 [14], MIT indoor scene [27], UIUC-Sport [16] and INRIA horse [10] image sets, we evaluate the visual concepts learned from the Internet images. [sent-181, score-1.008]
72 On these image sets, the visual concepts achieve the state-of-the-art performances, demonstrating their good cross-dataset generalization capability. [sent-182, score-0.749]
73 To create the visual concepts, 20 clusters are found using K-means on the patches labeled as positive by miSVM; thus, 716 × 20 = 14320 multi-cluster visual concepts are created for the 716 words. [sent-188, score-1.154]
74 The first codebook is created by quantizing the densely sampled multi-scale image patches from images of all the words. [sent-190, score-0.35]
75 As the two codebooks are created without using the saliency assumption and the multiple instance learning framework, they serve as two good baselines. [sent-193, score-0.316]
76 When applying the visual concepts to the dataset, image patches of three scales, 64 × 64, 128 × 128 and 192 × 192, are used. [sent-199, score-0.955]
77 Firstly, we compare the visual concepts with the two baselines KMS-ALL and KMS-SUB. [sent-201, score-0.749]
78 The multi-cluster visual concepts outperform both KMS-ALL and KMS-SUB, indicating that the multi-cluster visual concepts learned are more effective. [sent-202, score-1.677]
79 By combining the single-concept classifiers and the multi-cluster visual concepts, the mAP is 57. [sent-204, score-0.216]
80 We also compare our visual concepts with the improved Fisher-kernel (FK), locality-constrained linear coding (LLC) [32], and vector quantization (VQ). [sent-206, score-0.779]
81 LLC [32] projects the patch descriptors to the local linear subspaces spanned by some visual words closest to the patch descriptors, and the feature vector is obtained by max-pooling the reconstruction weights. [sent-209, score-0.315]
82 From Table 1, we can observe that even though we do not use images from PASCAL VOC 2007 in the learning stage, the result of our visual concepts approach is comparable to that of the state-of-the-art methods. [sent-213, score-0.82]
83 We investigate the complementariness of our visual concepts with the advanced Fisher-kernel model learned from the images of the PASCAL VOC 2007 image set. [sent-214, score-0.887]
84 The kernel matrices of the visual concepts and the improved Fisher-Kernels are combined linearly. [sent-215, score-0.812]
85 This illustrates that our visual concepts do add extra information useful to the models learned from specific data sets. [sent-222, score-0.804]
86 Multiple clustered instance learning (MCIL) [34] investigates the multiple cluster property at the instance level in the MIL-Boost framework. [sent-235, score-0.267]
87 We applied MCIL to learn a mixture of 20 cluster classifiers for each word, and used the outputs of the cluster classifiers as the features to encode the novel images. [sent-236, score-0.298]
88 The reason is that, in MCIL, as the number of weak classifiers increases, the number of positive instances decreases dramatically and the cluster classifiers in MCIL learn little knowledge about the image set because of the lack of positive instances. [sent-238, score-0.357]
89 Also, there is no competition between the cluster classifiers in MCIL, so the multiple cluster property of the image data is not fully investigated. [sent-239, score-0.235]
90 Scene Classification: We evaluate the visual concepts in the task of scene classification on three scene image sets, Scene-15 [14], MIT indoor scene [27], and UIUC-Sport event [17]. [sent-240, score-0.949]
91 On the scene image sets, image patches of two scales, 64 × 64 and 128 × 128, are used. [sent-245, score-0.251]
92 Our visual concepts approach outperforms KMS-ALL and KMS-SUB significantly. [sent-248, score-0.629]
93 Even though our visual concepts are learned in a weakly supervised manner, the visual concepts still outperform the detection models of object bank. [sent-250, score-1.679]
94 The main reason for the superiority of our visual concepts is that, while object bank tries to capture an object using a single detection model, our method can capture the multiple cluster property with 14,200 visual concepts and can model the diversity of the Internet images. [sent-251, score-1.654]
95 On all the three scene image sets, our visual concepts perform comparably to VQ though we do not use the images from those image sets. [sent-291, score-0.827]
96 For VQ, as the number of codes increases, the performance saturates: we have tested VQ with 24,000 codes on the MIT indoor scene image set, and the accuracy is 47. [sent-294, score-0.282]
97 INRIA Horse Image Set: The INRIA horse dataset contains 170 horse images and 170 background images taken from the Internet. [sent-297, score-0.254]
98 On this image set, the accuracy of our visual concepts is 92. [sent-299, score-0.749]
99 Conclusion: In this paper, we have introduced a scheme to automatically exploit mid-level representations, called visual concepts, from large-scale Internet images retrieved using word-based queries. [sent-304, score-0.262]
100 From more than a quarter of a million images, over 14,000 visual concepts are automatically learned. [sent-305, score-0.814]
wordName wordTfidf (topN-words)
[('concepts', 0.629), ('word', 0.224), ('mcil', 0.174), ('patches', 0.172), ('saliency', 0.164), ('vq', 0.159), ('internet', 0.153), ('misvm', 0.149), ('classeme', 0.124), ('multicluster', 0.124), ('visual', 0.12), ('beach', 0.105), ('classifiers', 0.096), ('words', 0.095), ('horse', 0.094), ('concept', 0.093), ('salient', 0.091), ('vc', 0.089), ('bag', 0.089), ('codes', 0.086), ('windows', 0.084), ('literal', 0.074), ('llc', 0.068), ('instance', 0.067), ('weakly', 0.066), ('crawl', 0.066), ('indoor', 0.065), ('bags', 0.064), ('fk', 0.062), ('diversity', 0.061), ('supervised', 0.06), ('balcony', 0.058), ('pages', 0.056), ('codebook', 0.055), ('learned', 0.055), ('walls', 0.053), ('cluster', 0.053), ('toolbox', 0.051), ('patch', 0.05), ('autoencoder', 0.05), ('caterpillar', 0.05), ('complementariness', 0.05), ('ferry', 0.05), ('mvc', 0.05), ('sccs', 0.05), ('yard', 0.05), ('crawled', 0.049), ('imagenet', 0.049), ('created', 0.047), ('mit', 0.046), ('scene', 0.045), ('harvesting', 0.044), ('kwitt', 0.044), ('relevance', 0.043), ('sampled', 0.043), ('clustered', 0.042), ('bank', 0.042), ('voc', 0.041), ('pascal', 0.041), ('retrieved', 0.041), ('instances', 0.04), ('smo', 0.038), ('niu', 0.038), ('unsupervised', 0.038), ('learning', 0.038), ('patterns', 0.037), ('tiger', 0.037), ('positive', 0.036), ('scheme', 0.036), ('representations', 0.036), ('lanckriet', 0.035), ('midlevel', 0.035), ('bird', 0.035), ('wordnet', 0.034), ('cancer', 0.034), ('scales', 0.034), ('pyramid', 0.033), ('kernel', 0.033), ('competition', 0.033), ('quarter', 0.033), ('pandey', 0.033), ('images', 0.033), ('automatically', 0.032), ('pietik', 0.032), ('assigned', 0.032), ('queries', 0.031), ('birds', 0.031), ('fisher', 0.03), ('discriminant', 0.03), ('bing', 0.03), ('improved', 0.03), ('inria', 0.03), ('create', 0.03), ('classifier', 0.03), ('sea', 0.03), ('quattoni', 0.03), ('chart', 0.03), ('validating', 0.03), ('engines', 0.03), ('svm', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take advantage of having massive well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
2 0.20234877 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
Author: Parthipan Siva, Chris Russell, Tao Xiang, Lourdes Agapito
Abstract: We propose a principled probabilistic formulation of object saliency as a sampling problem. This novel formulation allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object. We then sample the object saliency map to propose object locations. We show that using only a single object location proposal per image, we are able to correctly select an object in over 42% of the images in the PASCAL VOC 2007 dataset, substantially outperforming existing approaches. Furthermore, we show that our object proposal can be used as a simple unsupervised approach to the weakly supervised annotation problem. Our simple unsupervised approach to annotating objects of interest in images achieves a higher annotation accuracy than most weakly supervised approaches.
3 0.1787582 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs well when compared against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
4 0.17166823 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.
5 0.17155512 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
Author: Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
Abstract: We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.
6 0.17021012 202 cvpr-2013-Hierarchical Saliency Detection
9 0.13765793 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
10 0.13478146 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
11 0.13003126 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
12 0.13001974 325 cvpr-2013-Part Discovery from Partial Correspondence
13 0.12084485 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
14 0.12044435 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
15 0.11698215 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
16 0.1147171 133 cvpr-2013-Discriminative Segment Annotation in Weakly Labeled Video
18 0.11362593 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
19 0.11258411 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
20 0.10559818 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
topicId topicWeight
[(0, 0.219), (1, -0.169), (2, 0.138), (3, 0.111), (4, -0.004), (5, 0.036), (6, -0.074), (7, 0.004), (8, -0.028), (9, -0.001), (10, -0.061), (11, -0.008), (12, 0.047), (13, -0.012), (14, 0.024), (15, -0.044), (16, 0.016), (17, 0.016), (18, 0.053), (19, -0.064), (20, 0.098), (21, -0.02), (22, 0.091), (23, -0.073), (24, -0.047), (25, 0.071), (26, -0.015), (27, 0.056), (28, -0.014), (29, -0.061), (30, -0.048), (31, -0.011), (32, -0.024), (33, 0.083), (34, 0.072), (35, 0.048), (36, -0.046), (37, 0.042), (38, -0.05), (39, -0.092), (40, 0.014), (41, -0.063), (42, -0.041), (43, -0.032), (44, -0.012), (45, 0.09), (46, 0.003), (47, -0.056), (48, 0.038), (49, -0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.9542802 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take advantage of having massive well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
2 0.73585808 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
Author: Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
Abstract: The Inverse Document Frequency (IDF) is prevalently utilized in the Bag-of-Words based image search. The basic idea is to assign less weight to terms with high frequency, and vice versa. However, the estimation of visual word frequency is coarse and heuristic. Therefore, the effectiveness of the conventional IDF routine is marginal, and far from optimal. To tackle this problem, this paper introduces a novel IDF expression by the use of Lp-norm pooling technique. Carefully designed, the proposed IDF takes into account the term frequency, document frequency, the complexity of images, as well as the codebook information. Optimizing the IDF function towards optimal balancing between TF and pIDF weights yields the so-called Lp-norm IDF (pIDF). We show that the conventional IDF is a special case of our generalized version, and two novel IDFs, i.e. the average IDF and the max IDF, can also be derived from our formula. Further, by accounting for the term frequency in each image, the proposed Lp-norm IDF helps to alleviate the visual word burstiness phenomenon. Our method is evaluated through extensive experiments on three benchmark datasets (Oxford 5K, Paris 6K and Flickr 1M). We report a performance improvement of as large as 27.1% over the baseline approach. Moreover, since the Lp-norm IDF is computed offline, no extra computation or memory cost is introduced to the system at all.
3 0.70323157 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
Author: Lei Wang, Jianjia Zhang, Luping Zhou, Wanqing Li
Abstract: Distributional word clustering merges the words having similar probability distributions to attain reliable parameter estimation, compact classification models and even better classification performance. Agglomerative Information Bottleneck (AIB) is one of the typical word clustering algorithms and has been applied to both traditional text classification and recent image recognition. Although enjoying theoretical elegance, AIB has one main issue on its computational efficiency, especially when clustering a large number of words. Different from existing solutions to this issue, we analyze the characteristics of its objective function the loss of mutual information, and show that by merely using the ratio of word-class joint probabilities of each word, good candidate word pairs for merging can be easily identified. Based on this finding, we propose a fast approximate AIB algorithm and show that it can significantly improve the computational efficiency of AIB while well maintaining or even slightly increasing its classification performance. Experimental study on both text and image classification benchmark data sets shows that our algorithm can achieve more than 100 times speedup on large real data sets over the state-of-the-art method.
4 0.68074411 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
Author: Parthipan Siva, Chris Russell, Tao Xiang, Lourdes Agapito
Abstract: We propose a principled probabilistic formulation of object saliency as a sampling problem. This novel formulation allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object. We then sample the object saliency map to propose object locations. We show that using only a single object location proposal per image, we are able to correctly select an object in over 42% of the images in the PASCAL VOC 2007 dataset, substantially outperforming existing approaches. Furthermore, we show that our object proposal can be used as a simple unsupervised approach to the weakly supervised annotation problem. Our simple unsupervised approach to annotating objects of interest in images achieves a higher annotation accuracy than most weakly supervised approaches.
5 0.67719436 183 cvpr-2013-GRASP Recurring Patterns from a Single View
Author: Jingchen Liu, Yanxi Liu
Abstract: We propose a novel unsupervised method for discovering recurring patterns from a single view. A key contribution of our approach is the formulation and validation of a joint assignment optimization problem where multiple visual words and object instances of a potential recurring pattern are considered simultaneously. The optimization is achieved by a greedy randomized adaptive search procedure (GRASP) with moves specifically designed for fast convergence. We have quantified systematically the performance of our approach under stressed conditions of the input (missing features, geometric distortions). We demonstrate that our proposed algorithm outperforms state of the art methods for recurring pattern discovery on a diverse set of 400+ real world and synthesized test images.
6 0.67275614 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
7 0.66051209 434 cvpr-2013-Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior
10 0.62773049 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
11 0.61713827 464 cvpr-2013-What Makes a Patch Distinct?
12 0.61007452 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
13 0.60982198 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
14 0.60294551 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
15 0.60285521 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
16 0.59510386 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
17 0.587767 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images
18 0.58433366 202 cvpr-2013-Hierarchical Saliency Detection
19 0.58370841 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
topicId topicWeight
[(10, 0.125), (16, 0.022), (26, 0.039), (33, 0.291), (67, 0.115), (69, 0.052), (76, 0.012), (77, 0.011), (80, 0.013), (81, 0.174), (87, 0.053)]
simIndex simValue paperId paperTitle
1 0.91872215 429 cvpr-2013-The Generalized Laplacian Distance and Its Applications for Visual Matching
Author: Elhanan Elboer, Michael Werman, Yacov Hel-Or
Abstract: The graph Laplacian operator, which originated in spectral graph theory, is commonly used for learning applications such as spectral clustering and embedding. In this paper we explore the Laplacian distance, a distance function related to the graph Laplacian, and use it for visual search. We show that previous techniques such as Matching by Tone Mapping (MTM) are particular cases of the Laplacian distance. Generalizing the Laplacian distance results in distance measures which are tolerant to various visual distortions. A novel algorithm based on linear decomposition makes it possible to compute these generalized distances efficiently. The proposed approach is demonstrated for tone mapping invariant, outlier robust and multimodal template matching.
same-paper 2 0.89673269 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take advantage of having massive well-organized Google and Bing image data; visual concepts (around 14,000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
3 0.89471418 351 cvpr-2013-Recovering Line-Networks in Images by Junction-Point Processes
Author: Dengfeng Chai, Wolfgang Förstner, Florent Lafarge
Abstract: The automatic extraction of line-networks from images is a well-known computer vision issue. Appearance and shape considerations have been deeply explored in the literature to improve accuracy in presence of occlusions, shadows, and a wide variety of irrelevant objects. However most existing works have ignored the structural aspect of the problem. We present an original method which provides structurally-coherent solutions. Contrary to the pixelbased and object-based methods, our result is a graph in which each node represents either a connection or an ending in the line-network. Based on stochastic geometry, we develop a new family of point processes consisting in sampling junction-points in the input image by using a Monte Carlo mechanism. The quality of a configuration is measured by a probability density which takes into account both image consistency and shape priors. Our experiments on a variety of problems illustrate the potential of our approach in terms of accuracy, flexibility and efficiency.
4 0.88472664 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a smallsized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
5 0.88467431 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods[24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplarbased face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other facerelated tasks, such as attribute recognition, as well as general object detection.
6 0.88421643 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
7 0.88324475 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
8 0.88222206 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.88160741 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
10 0.87881106 414 cvpr-2013-Structure Preserving Object Tracking
11 0.87869734 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.87843019 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
13 0.87820566 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
14 0.87813681 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
15 0.87798977 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
16 0.87774122 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
17 0.87727475 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
18 0.87686318 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
19 0.87671131 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
20 0.87612689 438 cvpr-2013-Towards Pose Robust Face Recognition