cvpr cvpr2013 cvpr2013-157 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. [sent-4, score-1.52]
2 Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. [sent-5, score-0.72]
3 Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. [sent-6, score-0.888]
4 The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. [sent-7, score-0.783]
5 To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. [sent-8, score-0.475]
6 In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. [sent-9, score-0.758]
7 Introduction Measuring representativeness is an important basis for solving heuristic problems such as “How to choose five words to describe one of your friends?” [sent-12, score-0.696]
8 In the early studies of representativeness heuristics, Kahneman and Tversky [1] described the representativeness heuristic, according to which the subjective probability of an item is determined by the degree to which it is similar in essential characteristics to its parent set. [sent-15, score-0.857]
9 Based on this expression, the representativeness of an item in a given data set could be quantitatively measured. ∗This work was performed at Microsoft Research Asia. {xjwang, leizhang}@microsoft [sent-16, score-0.759]
10 Figure 1: An illustration of the representativeness function for b) the Bayesian measure [6], c) naive prototypes, and d) our proposed model. [sent-18, score-0.734]
11 Taking the prototypes as a middle-level representation, our model characterizes visual representativeness based on not only the visible data(*) but also the potential negatives (o) inferred from ontological knowledge bases. [sent-19, score-1.136]
12 The paper focuses on three issues: 1) given a set of images as positives, how to automatically acquire semantically reliable negatives; 2) how to discover all possible prototypes without knowing their precise number; 3) how to measure representativeness based on the customized image knowledge. [sent-20, score-1.235]
13 [6] extended the Bayesian measure, which defines the representativeness of a sample with respect to a distribution [5], to a measure of the representativeness of an item with respect to a set. [sent-25, score-1.455]
14 With the development of modern computer vision and multimedia technologies, especially intelligent image browsing systems and search engines, researchers have also investigated computational definitions of representativeness to rank retrieved images under keyword or example queries. [sent-26, score-0.811]
15 In [9], a graph-based representation, namely ImageKB, was proposed to efficiently organize semantic categories, entities and billions of images. [sent-31, score-0.117]
16 Given a pair in ImageKB, the representativeness of an image is measured using semantic similarity, category relevance, and the representative confidence of its K nearest neighbors. [sent-32, score-0.201]
17 Generally, most of the related works are consistent with ([8]-[14]) or directly derived from ([6, 15]) the classic Bayesian definition of representativeness [5]: R(d, h) = log( P(h|d) / P(h) ) = log( P(d|h) / P(d) ). (1) [sent-38, score-0.721]
18 The representativeness R(d, h) is the ratio of the posterior to the prior probability, which characterizes the extent to which the observation of d increases the probability of h. [sent-44, score-0.725]
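As an illustration, the following minimal Python sketch evaluates this ratio over a discrete hypothesis space; the names hypotheses, prior, and likelihood are illustrative placeholders, not quantities defined in the paper.

    import math

    def bayesian_representativeness(d, hypotheses, prior, likelihood):
        # Classic definition: R(d, h) = log( p(h|d) / p(h) ) = log( p(d|h) / p(d) ).
        # prior[h] and likelihood(d, h) must be supplied by the caller.
        p_d = sum(likelihood(d, h) * prior[h] for h in hypotheses)  # evidence p(d)
        scores = {}
        for h in hypotheses:
            posterior = likelihood(d, h) * prior[h] / p_d
            scores[h] = math.log(posterior / prior[h])  # equivalently log( p(d|h) / p(d) )
        return scores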
19 As shown in Figure 1, such a statistical assumption leads to very coarse middle-level representations of the items and may fail to provide an accurate representativeness function, because the visual world is undoubtedly much more complex. [sent-51, score-0.753]
20 The formal definition of our visual representativeness is as follows. [sent-53, score-0.727]
21 Given x∗ as a test image, and Dr a reference data set containing multiple images with similar semantic concepts. [sent-54, score-0.141]
22 The representativeness of x∗ for Dr is defined as: R(x∗, Dr) = p(x∗|Dr) / p(x∗|Dn) = [ Σr p(x∗|r) p(r|Dr) ] / [ Σn p(x∗|n) p(n|Dn) ], (2) [sent-55, score-0.696]
23 where r denotes a prototype of Dr and n denotes a prototype of Dn. [sent-57, score-0.276]
24 This definition has two advantages: 1) instead of all possible alternatives, it focuses only on the reference data and its negatives; 2) probability inferences are conducted within a small yet compact prototype space. [sent-59, score-0.3]
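A minimal sketch of Eq. (2), with the prototype likelihood p(x|prototype) passed in as a function; all argument names are illustrative and not part of the paper.

    def representativeness(x, protos_r, w_r, protos_n, w_n, lik):
        # Eq. (2): ratio of prototype-mixture likelihoods for the positive
        # reference Dr and the negative reference Dn.
        # protos_r, w_r : prototypes of Dr and their weights p(r|Dr)
        # protos_n, w_n : prototypes of Dn and their weights p(n|Dn)
        # lik(x, p)     : conditional likelihood p(x|prototype), supplied by the caller
        p_x_given_dr = sum(w * lik(x, p) for p, w in zip(protos_r, w_r))
        p_x_given_dn = sum(w * lik(x, p) for p, w in zip(protos_n, w_n))
        return p_x_given_dr / p_x_given_dn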
25 In the implementation phase, we focus on two main issues: 1) how to find negative references with reasonable semantic meanings; 2) how to discover all possible prototypes without knowing their precise number. [sent-60, score-0.401]
26 For the first issue, we apply image ontology datasets, e.g. [sent-61, score-0.26]
27 ImageNet [16] and ImageKB [9], to get the required semantic structure and relevant image instances and then group them together to form a negative reference set. [sent-63, score-0.165]
28 For the other issue, we propose a prototype discovery algorithm which automatically determines the number of prototypes based on an efficient statistical unimodality test. [sent-64, score-0.616]
29 In summary, this paper makes the following contributions: • We propose a semantically embedded visual representativeness model. [sent-65, score-0.148]
30 As a hybrid model derived from prototype theory and the Bayesian measure of representativeness, our model has a solid foundation in cognitive research on representativeness and uses dynamic prototypes to obtain a flexible mid-level representation. [sent-66, score-1.326]
31 • Ontological knowledge bases are adopted to embed visual semantics in the proposed framework, which avoids modeling the complex visual world with fixed types of statistical distributions and a limited number of parameters. [sent-67, score-0.128]
32 Together with the prototype theory, semantic embedding ensures a more accurate and meaningful measure of representativeness for general visual concepts. [sent-68, score-1.12]
33 In Section 2, we provide some background information, including semantic ontology databases and the prototype theory. [sent-71, score-0.653]
34 Section 3 then delivers the details of our semantic embedded representativeness model. [sent-72, score-0.813]
35 We present the experimental results in Section 4 and give a discussion in Section 5 focusing on the relationship between the proposed representativeness and other visual properties such as saliency. [sent-73, score-0.727]
36 Related Works Semantic Knowledge Base Semantic knowledge bases provide the relationships between words (WordNet[17]) or meaningful semantic concepts (NeedleSeek [18]) which establish the foundation of automatic semantic analysis for natural language processing and information retrieval. [sent-76, score-0.359]
37 [16] proposed a large-scale hierarchical image database named ImageNet based on the structure of WordNet, which serves as an image ontology base containing 21,841 WordNet synsets and over 14 million highly selected images (2011 Fall release). [sent-78, score-0.295]
38 Compared to previous image datasets, ImageNet inherits the semantic hierarchy of WordNet and meanwhile provides high resolution images that are manually verified to contain the relevant concepts. [sent-79, score-0.117]
39 The ImageNet distance exploits semantic similarity measured through the ImageNet semantic hierarchy, which outperforms and goes beyond direct visual distances in traditional vision research. [sent-81, score-0.265]
40 The promising results [16, 19, 20] inspired us to embed semantic ontological knowledge into visual representativeness computation to make our model more reasonable and consistent with both semantic and appearance aspects of the visual world. [sent-82, score-1.148]
41 Prototype Theory In cognitive science, the prototype theory [21, 22] states that categories tend to be defined in terms of prototypes or prototypical instances that contain the attributes most representative of items inside and least representative of items outside a category. [sent-83, score-0.892]
42 Sufficiently specific categories can be defined as a single prototype represented by typical shapes and attributes [2, 19]. [sent-84, score-0.276]
43 As shown in Figure 1, using prototypes as an intermediate representation has two advantages: 1) it is consistent with the cognitive understanding of semantic categories, and 2) it leads to a continuous and well-bounded measure function. [sent-85, score-0.418]
44 Our model takes three kinds of input: 1) keywords that represent a concept, 2) keywords + the related images, and 3) unlabeled images. [sent-88, score-0.176]
45 Taking the keyword as a seed, we build a customized image ontology based on large-scale semantic ontology databases and image search engines. [sent-90, score-0.888]
46 The customized image ontology contains images for both the input concept as well as the confusable semantic neighbors (negative references). [sent-91, score-0.671]
47 Potential prototypes are mined by a dynamic prototype discovery algorithm which is designed for arbitrary data distributions (Figure 2). [sent-92, score-0.543]
48 Finally, we estimate the representativeness of related images by Eq. (2). [sent-95, score-0.696]
49 Embedding Ontological Visual Semantics We embed visual semantics in the proposed model by customizing a small image ontology from semantic knowledge bases such as WordNet [17] and NeedleSeek [18]. [sent-99, score-0.479]
50 The customized image ontology is built and organized like ImageNet containing both the semantic entities and relevant images. [sent-100, score-0.539]
51 Note that ImageNet covers over twenty thousand WordNet synsets and most of its attached images have been manually verified; we can therefore directly build our customized image ontology from ImageNet as long as the query entity is available. [sent-102, score-0.48]
52 Given an image set, the construction of the customized image ontology can be done by the following steps: Step 1: Locating the Semantic Concept. If the concept (represented as a keyword) of the image set is given, we directly go to Step 2. [sent-103, score-0.483]
53 If the concept is not specified, we obtain the keyword by simply applying the on-line annotation service from Google Image Search. [sent-104, score-0.147]
54 Step 2: Finding the Negative References Given the keyword, we search for the most related and confusable concepts on either WordNet or NeedleSeek. [sent-105, score-0.179]
55 Step 3: Building Customized Image Ontology For those concepts that are available on ImageNet, we directly attach the corresponding images as a part of our customized image ontology. [sent-107, score-0.243]
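The three steps can be sketched in Python as follows; related_concepts and fetch_images are stand-ins for the NeedleSeek/WordNet neighbor lookup and the ImageNet or web-crawl image source described above, not real APIs.

    def build_customized_ontology(pos_images, keyword, related_concepts, fetch_images, k=10):
        # Step 1: the concept is assumed given as `keyword` (otherwise it would be
        # obtained from an annotation service, as described in Step 1 above).
        # Step 2: query the most related / confusable concepts as negative references.
        negatives = related_concepts(keyword, k)
        # Step 3: attach images to every concept; the positive set keeps the input images.
        ontology = {keyword: list(pos_images)}
        for concept in negatives:
            ontology[concept] = fetch_images(concept)
        return ontology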
56 Dynamic Prototype Discovery In the above section, we have constructed a customized image ontology which consists of two groups of images: Dr (the original input image set) and Dn (the negative reference image set). [sent-111, score-0.422]
57 The goal of this section is to discover the prototypes from Dr and Dn as a middle-level representation for our representativeness model. [sent-112, score-0.933]
58 Algorithm 1 shows the dynamic prototype discovery algorithm in detail. [sent-117, score-0.3]
59 Algorithm 1: Dynamic prototype discovery. Input: dataset X = {xi}, i = 1..N; the initial number of prototypes k_init; a splitting number m; a statistical significance level α for the unimodality test; a threshold v_thd for splitting the candidate prototype. [sent-122, score-0.625]
60 Output: prototypes P = {pj}, j = 1..k, and the conditional probabilities p(pj|X). Initialize k ← k_init; run k-means on X to obtain cluster centers C = {cj}, j = 1..k; initialize the prototype set by P ← C; repeat: for j = 1, . . . [sent-123, score-0.276]
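A rough Python sketch of the splitting loop behind Algorithm 1; the unimodality test is passed in as a callable (unimodal), since the exact statistical test, significance level α, and splitting threshold v_thd are details of the paper not reproduced here.

    import numpy as np
    from sklearn.cluster import KMeans

    def discover_prototypes(X, k_init=2, m=2, min_size=5, unimodal=lambda pts: True):
        # Start from k-means, then keep splitting any cluster whose points fail
        # the unimodality test into m sub-clusters, until every cluster passes.
        labels = KMeans(n_clusters=k_init, n_init=10).fit_predict(X)
        clusters = [X[labels == j] for j in range(k_init)]
        changed = True
        while changed:
            changed = False
            refined = []
            for pts in clusters:
                if len(pts) > m * min_size and not unimodal(pts):
                    sub = KMeans(n_clusters=m, n_init=10).fit_predict(pts)
                    refined.extend(pts[sub == j] for j in range(m))
                    changed = True
                else:
                    refined.append(pts)
            clusters = refined
        prototypes = np.array([pts.mean(axis=0) for pts in clusters])
        weights = np.array([len(pts) for pts in clusters], dtype=float)
        return prototypes, weights / weights.sum()  # p(pj|X) as cluster proportions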
61 Top: image set of the Golden Gate Bridge; Bottom: the discovered prototypes with conditional probability. [sent-128, score-0.204]
62 Our algorithm is able to incrementally discover semantically meaningful prototypes without knowing the exact number of potential topics. [sent-129, score-0.284]
63 In this case, the prototypes summarize the image set by environmental conditions. [sent-130, score-0.204]
64 Computing Representativeness We obtain the customized image ontology in Section 3.1, [sent-133, score-0.422]
65 and the prototypes with conditional probabilities in Section 3.2. [sent-134, score-0.204]
66 For simplicity, the conditional probability p(x∗|r) is defined as the similarity between item x∗ and the underlying prototype r: p(x∗|r) = exp{−λ|x∗ − r|2}, (3) where λ is a scaling constant to keep p(x∗|r) in a reasonable interval. [sent-137, score-0.339]
67 Based on all pre-computed terms, the final representativeness score is computed according to Eq. (2). [sent-138, score-0.696]
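Plugging the kernel of Eq. (3) into the representativeness() sketch given earlier, the scoring step looks roughly like the snippet below; the prototype vectors, weights, and λ are toy illustrative values.

    import numpy as np

    def gaussian_lik(lam):
        # Eq. (3): p(x|r) = exp(-lam * |x - r|^2)
        return lambda x, r: float(np.exp(-lam * np.sum((np.asarray(x) - np.asarray(r)) ** 2)))

    protos_r, w_r = [np.array([1.0, 0.0]), np.array([0.8, 0.2])], [0.6, 0.4]  # p(r|Dr)
    protos_n, w_n = [np.array([0.0, 1.0])], [1.0]                             # p(n|Dn)
    score = representativeness(np.array([0.9, 0.1]), protos_r, w_r, protos_n, w_n, gaussian_lik(1.0))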
68 Note that the user-click data are acquired from a different image search engine, specifically Bing Image Search, in order to eliminate the potential ranking bias caused by the usage of query association data. [sent-145, score-0.125]
69 By taking user-clicks as votes for the representativeness of tested images, we define another evaluation metric based on the choices made by Web-users: ? [sent-147, score-0.696]
70 where i is the image index, Ri denotes the ranking position of image i, and UCi is the number of user clicks for test image i recorded in the query association log of Bing Search. [sent-151, score-0.17]
71 Similar to [6], we asked human subjects to label a representativeness score ranging from 1 to 10 (10 for the best) for each image used in the experiment. [sent-155, score-0.722]
72 Bayesian Model - The Bayesian Model [6] is a natural generalization of the cognitive theory of representativeness [5] and is implemented based on Bayesian Sets [11], a statistical technique initially proposed for measuring how well a new sample fits into a given set of data. [sent-163, score-0.848]
73 Given Ds = {x1, . . . , xN} ⊂ D, a data set representing a conceptual group, Bayesian Sets measures the representativeness of a given sample x∗ ∈ D\Ds by the following equation: Bscore(x∗, Ds) = p(x∗, Ds) / ( p(x∗) p(Ds) ). [sent-168, score-0.696]
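For binary (e.g. bag-of-words presence) features under a Beta-Bernoulli model, the Bayesian Sets score can be computed as in the sketch below; this is the generic Heller-Ghahramani instantiation and not necessarily the exact feature model used in the paper.

    import numpy as np

    def bayesian_sets_score(x_star, Ds, alpha=2.0, beta=2.0):
        # Bscore(x*, Ds) = p(x*, Ds) / (p(x*) p(Ds)) = p(x*|Ds) / p(x*),
        # evaluated per feature under independent Beta-Bernoulli priors.
        Ds = np.asarray(Ds, dtype=float)          # N x D binary matrix
        x_star = np.asarray(x_star, dtype=float)  # length-D binary vector
        n = Ds.shape[0]
        a_post = alpha + Ds.sum(axis=0)           # posterior Beta parameters given Ds
        b_post = beta + n - Ds.sum(axis=0)
        p1_post = a_post / (a_post + b_post)      # p(x_j = 1 | Ds)
        p1_prior = alpha / (alpha + beta)         # p(x_j = 1)
        return float(np.sum(x_star * np.log(p1_post / p1_prior)
                            + (1.0 - x_star) * np.log((1.0 - p1_post) / (1.0 - p1_prior))))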
74 Naive Prototype Model [2] - We implemented the naive prototype model of representativeness following the procedure of [6]. [sent-200, score-0.72]
75 Given a dataset D, we select its prototype sample by: ? [sent-201, score-0.276]
76 The representativeness score is then defined as the similarity between the input and the prototype: NPT(x∗, Ds) = exp{−λ|x∗ − xproto|2}, (10) where xproto, represented as a BoW vector, is the prototype of Ds, and λ is a scaling constant which does not affect the ranking results. [sent-206, score-1.023]
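A small sketch of this baseline; since the prototype-selection rule (the elided equation above) is not recoverable here, the medoid of Ds is used only as an assumed stand-in for xproto.

    import numpy as np

    def naive_prototype_score(x_star, Ds, lam=1.0):
        # NPT(x*, Ds) = exp(-lam * |x* - x_proto|^2), Eq. (10).
        Ds = np.asarray(Ds, dtype=float)
        sq_dists = np.sum((Ds[:, None, :] - Ds[None, :, :]) ** 2, axis=-1)
        x_proto = Ds[np.argmin(sq_dists.sum(axis=1))]  # medoid as prototype (assumption)
        return float(np.exp(-lam * np.sum((np.asarray(x_star) - x_proto) ** 2)))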
77 Ranking Images on ImageNet ImageNet [16] is a large-scale image ontology dataset which provides us the essential knowledge of the visual world including not only semantic hierarchies but also the relevant image instances. [sent-209, score-0.431]
78 Although all images in ImageNet are manually verified to contain the relevant concepts, their quality and representativeness are still left un-labeled. [sent-210, score-0.696]
79 Practically, given a keyword, we search for the k most related words using a publicly available semantic ontology named NeedleSeek [18]. [sent-227, score-0.404]
80 Then, we crawl images by querying Google with all the related keywords to build a customized image ontology. [sent-228, score-0.25]
81 Based on the auto-built image ontology, we set up the representativeness model following the procedure in Section 3. [sent-229, score-0.696]
82 Note that this procedure can be applied to refine the results of commercial image search engines since it is fully automatic, semantic-aware, and psychologically plausible. [sent-231, score-0.118]
83 We test the representativeness models with three concepts: Wolf(animal), Paris(city) and Rose (flower). [sent-232, score-0.696]
84 For each keyword, we crawled 200 images from Google Image Search3 to build the customized image ontology. [sent-234, score-0.162]
85 Table: Related keywords of the tested concepts (concept / related keywords at NeedleSeek [18]), listed for the three test concepts Wolf, Paris, and Rose. [sent-238, score-0.292]
86 Thus, our representativeness can be explained as “Likelihood + [sent-263, score-0.696]
87 Unlike the saliency method, our model locates those regions which contain both salient and discriminative contents such as the golden roof of the Chinese Palace Museum and the huge pillars of the German Berlin Museum. [sent-273, score-0.147]
88 Saliency”, which favors the items that are not only well fitted into the observed concept but also remarkably salient to other related, confusable concepts. [sent-274, score-0.191]
89 Figure 6 shows some comparisons between our representativeness model and AIM saliency (Attention by Information Maximization [31]) on natural images. [sent-275, score-0.769]
90 To show the real differences, we use the same features to compute the response map of our representativeness model. [sent-277, score-0.696]
91 The saliency model responds to generic structures such as building corners and human bodies, whereas our model favors representative components such as the golden roof and huge pillars, which are indeed the most recognizable elements of eastern and western architectural styles. [sent-287, score-0.158]
92 Evaluation Bias In the experiment, our second evaluation metric SW explicitly characterizes the representativeness of a given image by the number of user clicks the image has received. [sent-288, score-0.725]
93 The potential problem with this metric is that users might click an image according to their personal interests instead of its real semantic relevance. [sent-290, score-0.186]
94 However, such imperfection does not affect the validity of this evaluation metric for comparing the relative performance of different representativeness models. [sent-292, score-0.696]
95 Conclusion In this paper, we have introduced a novel computational model for visual representativeness based on ontological semantic embedding and dynamic prototype discovery. [sent-298, score-1.25]
96 The embedded image ontology provides additional image statistics helping the model to identify true outliers. [sent-300, score-0.26]
97 Meanwhile, the intermediate prototype representation enhances the cognitive plausibility of our model and ensures the accuracy and effectiveness of the probabilistic inference. [sent-301, score-0.373]
98 Experimental results demonstrate the superior performance of the proposed approach against the state-of-the-art representativeness models as well as commercial image search engines. [sent-302, score-0.758]
99 Testing a Bayesian measure of representativeness using a large image database. [sent-349, score-0.775]
100 SUN: A Bayesian framework for saliency using natural statistics. [sent-502, score-0.152]
wordName wordTfidf (topN-words)
[('representativeness', 0.696), ('prototype', 0.276), ('ontology', 0.26), ('prototypes', 0.204), ('imagenet', 0.164), ('customized', 0.162), ('needleseek', 0.141), ('semantic', 0.117), ('ontological', 0.106), ('cognitive', 0.097), ('keywords', 0.088), ('wordnet', 0.084), ('representative', 0.084), ('concepts', 0.081), ('bayesian', 0.079), ('saliency', 0.073), ('confusable', 0.071), ('unimodality', 0.071), ('log', 0.065), ('item', 0.063), ('subjective', 0.063), ('keyword', 0.062), ('concept', 0.061), ('google', 0.06), ('items', 0.059), ('abbott', 0.053), ('imagekb', 0.053), ('xproto', 0.053), ('ranking', 0.051), ('jj', 0.05), ('dr', 0.05), ('museum', 0.047), ('npt', 0.047), ('psychology', 0.047), ('sw', 0.043), ('discovery', 0.039), ('golden', 0.039), ('naive', 0.038), ('bm', 0.038), ('bscore', 0.035), ('conceptrelated', 0.035), ('harbin', 0.035), ('hongxun', 0.035), ('kahneman', 0.035), ('lihood', 0.035), ('ofrepresentativeness', 0.035), ('palace', 0.035), ('pillars', 0.035), ('synsets', 0.035), ('vthd', 0.035), ('xiaoshuai', 0.035), ('commercial', 0.035), ('psychological', 0.035), ('mining', 0.033), ('ds', 0.033), ('discover', 0.033), ('paris', 0.033), ('bing', 0.032), ('griffiths', 0.031), ('clicks', 0.031), ('heller', 0.031), ('dn', 0.031), ('visual', 0.031), ('bow', 0.03), ('theory', 0.029), ('ghahramani', 0.029), ('xij', 0.029), ('characterizes', 0.029), ('bruce', 0.027), ('search', 0.027), ('embed', 0.027), ('deng', 0.026), ('statistical', 0.026), ('lik', 0.026), ('skewed', 0.026), ('rational', 0.026), ('kennedy', 0.026), ('iconic', 0.026), ('asked', 0.026), ('practically', 0.025), ('classic', 0.025), ('potential', 0.024), ('reference', 0.024), ('click', 0.024), ('geographic', 0.024), ('service', 0.024), ('doersch', 0.024), ('negative', 0.024), ('dynamic', 0.024), ('knowledge', 0.023), ('knowing', 0.023), ('query', 0.023), ('negatives', 0.023), ('tenenbaum', 0.022), ('ss', 0.021), ('users', 0.021), ('bases', 0.021), ('bodies', 0.021), ('engine', 0.021), ('engines', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
2 0.25621802 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
3 0.11362593 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take the advantage of having massive wellorganized Google and Bing image data; visual concepts (around 14, 000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
4 0.090312533 34 cvpr-2013-Adaptive Active Learning for Image Classification
Author: Xin Li, Yuhong Guo
Abstract: Recently active learning has attracted a lot of attention in computer vision field, as it is time and cost consuming to prepare a good set of labeled images for vision data analysis. Most existing active learning approaches employed in computer vision adopt most uncertainty measures as instance selection criteria. Although most uncertainty query selection strategies are very effective in many circumstances, they fail to take information in the large amount of unlabeled instances into account and are prone to querying outliers. In this paper, we present a novel adaptive active learning approach that combines an information density measure and a most uncertainty measure together to select critical instances to label for image classifications. Our experiments on two essential tasks of computer vision, object recognition and scene recognition, demonstrate the efficacy of the proposed approach.
5 0.086656749 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well when against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult bench- mark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
6 0.084049381 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
7 0.082274795 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
8 0.079987593 202 cvpr-2013-Hierarchical Saliency Detection
9 0.072585016 325 cvpr-2013-Part Discovery from Partial Correspondence
10 0.071662918 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
11 0.071548603 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
12 0.071208738 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
13 0.07114362 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
14 0.069571979 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
15 0.067048505 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
16 0.066191003 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
17 0.064576611 28 cvpr-2013-A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
18 0.061759416 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
19 0.060827609 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
20 0.059106901 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
topicId topicWeight
[(0, 0.121), (1, -0.071), (2, 0.06), (3, 0.048), (4, 0.028), (5, 0.013), (6, -0.056), (7, -0.004), (8, -0.023), (9, 0.013), (10, -0.005), (11, 0.018), (12, -0.014), (13, 0.012), (14, -0.028), (15, -0.037), (16, 0.018), (17, 0.005), (18, 0.014), (19, -0.052), (20, 0.028), (21, -0.02), (22, 0.014), (23, 0.042), (24, -0.025), (25, -0.001), (26, 0.009), (27, 0.032), (28, 0.013), (29, -0.02), (30, -0.028), (31, -0.001), (32, -0.063), (33, -0.002), (34, -0.017), (35, -0.037), (36, -0.099), (37, -0.001), (38, -0.102), (39, -0.09), (40, -0.072), (41, -0.06), (42, 0.003), (43, 0.041), (44, 0.055), (45, 0.051), (46, -0.005), (47, -0.017), (48, 0.018), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.91118276 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
2 0.68998885 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
3 0.6028688 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
Author: Quannan Li, Jiajun Wu, Zhuowen Tu
Abstract: Obtaining effective mid-level representations has become an increasingly important task in computer vision. In this paper, we propose a fully automatic algorithm which harvests visual concepts from a large number of Internet images (more than a quarter of a million) using text-based queries. Existing approaches to visual concept learning from Internet images either rely on strong supervision with detailed manual annotations or learn image-level classifiers only. Here, we take the advantage of having massive wellorganized Google and Bing image data; visual concepts (around 14, 000) are automatically exploited from images using word-based queries. Using the learned visual concepts, we show state-of-the-art performances on a variety of benchmark datasets, which demonstrate the effectiveness of the learned mid-level representations: being able to generalize well to general natural images. Our method shows significant improvement over the competing systems in image classification, including those with strong supervision.
4 0.57947886 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
Author: C. Lawrence Zitnick, Devi Parikh
Abstract: Relating visual information to its linguistic semantic meaning remains an open and challenging area of research. The semantic meaning of images depends on the presence of objects, their attributes and their relations to other objects. But precisely characterizing this dependence requires extracting complex visual information from an image, which is in general a difficult and yet unsolved problem. In this paper, we propose studying semantic information in abstract images created from collections of clip art. Abstract images provide several advantages. They allow for the direct study of how to infer high-level semantic information, since they remove the reliance on noisy low-level object, attribute and relation detectors, or the tedious hand-labeling of images. Importantly, abstract images also allow the ability to generate sets of semantically similar scenes. Finding analogous sets of semantically similar real images would be nearly impossible. We create 1,002 sets of 10 semantically similar abstract scenes with corresponding written descriptions. We thoroughly analyze this dataset to discover semantically important features, the relations of words to visual features and methods for measuring semantic similarity.
5 0.57745039 34 cvpr-2013-Adaptive Active Learning for Image Classification
Author: Xin Li, Yuhong Guo
Abstract: Recently active learning has attracted a lot of attention in computer vision field, as it is time and cost consuming to prepare a good set of labeled images for vision data analysis. Most existing active learning approaches employed in computer vision adopt most uncertainty measures as instance selection criteria. Although most uncertainty query selection strategies are very effective in many circumstances, they fail to take information in the large amount of unlabeled instances into account and are prone to querying outliers. In this paper, we present a novel adaptive active learning approach that combines an information density measure and a most uncertainty measure together to select critical instances to label for image classifications. Our experiments on two essential tasks of computer vision, object recognition and scene recognition, demonstrate the efficacy of the proposed approach.
6 0.56916547 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
7 0.56394845 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
8 0.55671936 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
9 0.55276173 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
10 0.53831518 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
11 0.53107649 134 cvpr-2013-Discriminative Sub-categorization
12 0.52640426 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
13 0.52034479 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
14 0.5122233 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
15 0.50655323 406 cvpr-2013-Spatial Inference Machines
16 0.50474656 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
17 0.49884745 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
18 0.4971737 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
19 0.49657893 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images
20 0.49057719 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes
topicId topicWeight
[(10, 0.097), (14, 0.297), (16, 0.026), (26, 0.046), (33, 0.225), (67, 0.058), (69, 0.054), (80, 0.015), (87, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.75912863 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
2 0.73475391 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths
Author: Xiaogang Chen, Sing Bing Kang, Jie Yang, Jingyi Yu
Abstract: Patch-based methods such as Non-Local Means (NLM) and BM3D have become the de facto gold standard for image denoising. The core of these approaches is to use similar patches within the image as cues for denoising. The operation usually requires expensive pair-wise patch comparisons. In this paper, we present a novel fast patch-based denoising technique based on Patch Geodesic Paths (PatchGP). PatchGPs treat image patches as nodes and patch differences as edge weights for computing the shortest (geodesic) paths. The path lengths can then be used as weights of the smoothing/denoising kernel. We first show that, for natural images, PatchGPs can be effectively approximated by minimum hop paths (MHPs) that generally correspond to Euclidean line paths connecting two patch nodes. To construct the denoising kernel, we further discretize the MHP search directions and use only patches along the search directions. Along each MHP, we apply a weightpropagation scheme to robustly and efficiently compute the path distance. To handle noise at multiple scales, we conduct wavelet image decomposition and apply PatchGP scheme at each scale. Comprehensive experiments show that our approach achieves comparable quality as the state-of-the-art methods such as NLM and BM3D but is a few orders of magnitude faster.
3 0.70040542 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark con- trast to datasets usedfor existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
4 0.6992209 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional randomfield (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
5 0.66591012 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors ’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
6 0.66319561 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
7 0.6629678 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
8 0.66145313 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
9 0.66090631 325 cvpr-2013-Part Discovery from Partial Correspondence
10 0.66063839 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
11 0.66038144 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
12 0.66023499 414 cvpr-2013-Structure Preserving Object Tracking
13 0.65999669 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
14 0.6598013 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
15 0.65976715 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
16 0.65965241 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
17 0.65946507 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
18 0.65895087 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
19 0.65881956 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
20 0.65858388 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image