cvpr cvpr2013 cvpr2013-374 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Long Mai, Yuzhen Niu, Feng Liu
Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from each individual saliency map but also the interaction between neighboring pixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust to the selection of aggregated methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Saliency Aggregation: A Data-driven Approach Long Mai Yuzhen Niu Feng Liu Department of Computer Science, Portland State University Portland, OR, 97207 USA {mtlong, yuzhen, Abstract A variety of methods have been developed for visual saliency analysis. [sent-1, score-0.782]
2 This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. [sent-3, score-1.297]
3 Second, the performance of a saliency analysis method varies with individual images. [sent-6, score-0.929]
4 Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. [sent-7, score-1.495]
5 This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. [sent-8, score-0.382]
6 Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from each individual saliency map but also the interaction between neighboring pixels. [sent-9, score-2.094]
7 To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. [sent-10, score-0.974]
8 Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods. [sent-11, score-2.122]
9 Introduction Visual saliency measures low-level stimuli to the human vision system that grab a viewer’s attention in the early stage of visual processing [17]. [sent-13, score-0.798]
10 It has been used in a wide range of computer vision, multimedia, and graphics applications, such as automatic object detection [16], image retrieval [23], video summarization [25], adaptive image compression [7], and content-aware image/video resizing [32]. [sent-14, score-0.033]
11 There is a rich literature on image saliency analysis [1, 2, 4–6, 8–13, 15, 17–20, 22, 24–31, 33, 35–45]. [sent-15, score-0.798]
12 These methods design a variety of biologically plausible models or use [sent-16, score-0.015]
13 Individual saliency methods, such as GC [6], FT [2], and CA [12], often complement each other. [sent-20, score-0.827]
14 Saliency aggregation can effectively combine their results and perform better than each of them. [sent-21, score-0.397]
15 data-driven approaches to compute a saliency map from an image. [sent-22, score-0.81]
16 While these methods achieve good results statistically on public benchmarks, each of these methods has its own advantages and disadvantages. [sent-23, score-0.024]
17 More interestingly, different saliency methods can often complement each other. [sent-25, score-0.827]
18 Therefore, the aggregation of these saliency analysis results can likely outperform each individual one, as reported in a recent study [5]. [sent-26, score-1.28]
19 Their study shows that the combination of a few best-performed saliency analysis methods using pre-defined functions, such as averaging, can improve each individual one. [sent-27, score-0.916]
20 In this paper, we present a data-driven approach to saliency aggregation. [sent-28, score-0.782]
21 Our method combines saliency maps from various methods with divergent properties and large performance gaps. [sent-29, score-0.829]
22 To effectively combine these saliency maps, our approach uses machine learning methods to learn an aggregation model that appropriately determines the contribution of each individual method. [sent-30, score-1.349]
23 Specifically, we use a Conditional Random Field (CRF) framework [21] for saliency aggregation that not only models the contribution from each individual saliency map but also the interaction between neighboring pixels. [sent-31, score-2.142]
24 It has been observed that the performance of each individual method varies over images. [sent-32, score-0.131]
25 Therefore, saliency aggregation should be customized to each individual image. [sent-33, score-1.282]
26 To account for the dependence of aggregation on an individual image, our approach first selects from a training dataset a subset of images similar to the input image and trains the CRF aggregation model only using this subset instead of the whole training set. [sent-34, score-0.974]
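To make this image-dependent training step concrete, the sketch below shows one way the similar-image selection could work: pick the K training images closest to the input image under a global descriptor and train the aggregation model only on that subset. This is an illustrative assumption, not the paper's procedure; the descriptor (a coarse RGB color histogram), the Euclidean distance, the helper names color_histogram and select_similar_images, and the subset size k are all choices made here, since the excerpt does not specify which image-similarity feature is used.

```python
import numpy as np

def color_histogram(image, bins=8):
    # Coarse global RGB histogram of an (H, W, 3) uint8 image -> (bins**3,) vector.
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    hist = hist.ravel().astype(np.float64)
    return hist / (hist.sum() + 1e-12)

def select_similar_images(input_image, training_images, k=20):
    # Indices of the k training images closest to the input image in descriptor space.
    q = color_histogram(input_image)
    dists = [np.linalg.norm(q - color_histogram(t)) for t in training_images]
    return np.argsort(dists)[:k]
```

The returned indices would then select the training images whose pixels are used to fit the CRF aggregation model for the given input image.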
27 Compared to standard aggregation methods that use predefined combination functions and treat each individual method equally, our data-driven method has the following advantages. [sent-35, score-0.529]
28 First, our method considers the performance gaps among individual saliency analysis methods and better determines their contribution in aggregation. [sent-36, score-0.999]
29 Second, our method considers that the performance of each individual saliency analysis method varies over images and is able to customize an appropriate aggregation model to each input image. [sent-37, score-1.349]
30 As more and more saliency analysis methods have been developed recently, our research provides a way to best use the existing and forthcoming saliency methods and allows the possibility of pushing forward the state-of-the-art results in saliency analysis. [sent-38, score-2.378]
31 Saliency Aggregation Our method starts from running a set of m saliency analysis algorithms, {Mi | 1 ≤ i ≤ m}, on a given image I, [sent-40, score-0.798]
32 and produces m saliency maps, {Si | 1 ≤ i ≤ m}, [sent-41, score-0.782]
33 one for each algorithm. [sent-42, score-0.023]
34 Si(p) in a saliency map encodes the saliency value at pixel p. [sent-44, score-1.656]
35 The saliency value in each map is normalized to [0, 1]. [sent-45, score-0.824]
36 Our goal is to take these m saliency maps as input and produce a final saliency map S. [sent-46, score-1.631]
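As a small illustration of this setup, the sketch below stacks the m per-method maps into a single array and rescales each to [0, 1]. The min-max form of the normalization and the helper names are assumptions; the excerpt only states that each map is normalized to [0, 1].

```python
import numpy as np

def normalize_map(s):
    # Min-max normalize one saliency map to [0, 1] (constant maps become all zeros).
    s = s.astype(np.float64)
    lo, hi = s.min(), s.max()
    return np.zeros_like(s) if hi <= lo else (s - lo) / (hi - lo)

def stack_maps(maps):
    # Stack the m normalized saliency maps S1..Sm into an (m, H, W) array.
    return np.stack([normalize_map(s) for s in maps], axis=0)
```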
37 This section begins with the standard aggregation methods from previous work that use pre-defined combination functions and then elaborates our data-driven saliency aggregation approaches. [sent-47, score-1.592]
38 Standard Saliency Aggregation To serve as our baseline method, we first apply the combination strategies from [5] to saliency aggregation. [sent-50, score-0.8]
39 We then discuss their performance to motivate our data-driven aggregation approaches. [sent-51, score-0.396]
40 Given the m saliency maps {Si | 1 ≤ i ≤ m} computed from an image I, the aggregated saliency value S(p) at pixel p of I is modeled as the probability S(p) = P(yp = 1 | S1(p), S2(p), ..., Sm(p)) [sent-53, score-0.862]
41 = (1/Z) Σ_{i=1}^{m} ζ(Si(p)), (1) where Si(p) represents the saliency value of pixel p in the saliency map Si, yp is a binary random variable taking the value 1 if p is a salient pixel and 0 otherwise, and Z is a constant. [sent-56, score-2.018]
42 Following [5], we implemented three different options for the function ζ in Equation 1, including ζ1(x) = x, ζ2(x) = exp(x), and ζ3(x) = −log(1 − x). (2) [sent-57, score-0.014]
43 We used these standard aggregation methods to combine a range of saliency analysis methods and tested them on two public saliency benchmarks, FT [2] and SS [31]. [sent-58, score-2.031]
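The standard aggregation of Equation 1 with the three ζ options of Equation 2 can be sketched as follows. Rescaling the summed response by its range stands in for the constant 1/Z, and the exact form of ζ3 is reconstructed from garbled text, so both details are assumptions.

```python
import numpy as np

# Pre-defined combination functions of Equation 2 (the LOG form is a reconstruction).
ZETA = {
    "LIN": lambda x: x,
    "EXP": lambda x: np.exp(x),
    "LOG": lambda x: -np.log(1.0 - np.clip(x, 0.0, 1.0 - 1e-6)),
}

def standard_aggregation(maps, zeta="LIN"):
    # maps: (m, H, W) stack of normalized saliency maps; returns the aggregated map.
    combined = ZETA[zeta](np.asarray(maps, dtype=np.float64)).sum(axis=0)
    combined -= combined.min()                 # rescale to [0, 1]; plays the role of 1/Z
    return combined / (combined.max() + 1e-12)
```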
44 Figure 2 (a) shows that when these methods are used to aggregate three best-performed methods, they can produce encouraging results. [sent-59, score-0.036]
45 On the FT benchmark, the aggregation methods produce comparable results to the best individual method, and on the SS benchmark, they outperform each individual one. [sent-60, score-0.597]
46 When the individual methods have large performance gaps, these standard aggregation methods produce less successful results, as shown in Figure 2 (b). [sent-61, score-0.497]
47 The main reason is that they do not consider the performance difference among individual methods and treat them equally. [sent-62, score-0.115]
48 Therefore, the low-performance individual methods compromise the aggregation result. [sent-63, score-0.496]
49 This happens even when only the best-performed methods are aggregated. [sent-64, score-0.014]
50 Data-driven Saliency Aggregation We observe that while various saliency analysis methods often complement each other, there are performance gaps among them. [sent-67, score-0.912]
51 Moreover, the performance of each method varies over individual images. [sent-68, score-0.131]
52 Therefore, saliency aggregation should be individual method-aware and individual image-aware. [sent-69, score-1.364]
53 We design data-driven approaches to achieve such saliency aggregation. [sent-70, score-0.782]
54 1 Pixel-wise Aggregation Our first method associates each pixel p with a feature vector x(p) = (S1(p), S2(p), · · · , Sm(p)), where Si(p) is the saliency value at p in the saliency map Si. [sent-73, score-1.658]
55 We also assign a binary random variable yp, which indicates whether the pixel is salient or not. [sent-74, score-0.066]
56 We compute the final saliency value S(p) as the posterior probability P(yp = 1|x(p)). [sent-76, score-0.796]
57 Specifically, we model P(yp = 1|x(p)) using the logistic model P(yp = 1|x(p)) = σ(λ0 + Σ_{i=1}^{m} λi Si(p)), (3) [sent-77, score-0.02]
58 where λ = {λi | i = 0, 1, ..., m} is the set of m + 1 model parameters which weigh the contribution from each individual saliency map. [sent-87, score-0.062]
59 σ(·) denotes the sigmoid function σ(z) = 1/(1 + exp(−z)). (4) The parameter λ can be learned using a standard logistic regression technique on the training data. [sent-91, score-0.034]
60 LIN, EXP, and LOG refer to the three ζ functions in Equation 2 that are used in the standard combination function defined in Equation 1, respectively. [sent-94, score-0.032]
61 The learned model can then appropriately account for the performance gaps among individual saliency methods. [sent-95, score-0.983]
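A minimal sketch of the pixel-wise model of Equations 3 and 4 is given below, using scikit-learn's LogisticRegression as one possible "standard logistic regression technique". The way training labels are built from binary ground-truth masks, as well as the helper names, are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_features(maps):
    # (m, H, W) stack of saliency maps -> (H*W, m) matrix of per-pixel features x(p).
    return maps.reshape(maps.shape[0], -1).T

def train_pixelwise_model(train_map_stacks, train_masks):
    # Fit the model of Equations 3-4 on all pixels of the training images.
    # train_map_stacks: list of (m, H, W) arrays; train_masks: list of (H, W) binary masks.
    X = np.vstack([make_features(maps) for maps in train_map_stacks])
    y = np.concatenate([mask.reshape(-1).astype(int) for mask in train_masks])
    return LogisticRegression(max_iter=1000).fit(X, y)

def pixelwise_aggregation(model, maps):
    # Aggregated saliency S(p) = P(yp = 1 | x(p)) for every pixel of one image.
    h, w = maps.shape[1:]
    return model.predict_proba(make_features(maps))[:, 1].reshape(h, w)
```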
62 2 Aggregation using Conditional Random Field One potential problem with estimating the saliency value for each pixel individually is its ignorance of the interaction between neighboring pixels. [sent-98, score-0.919]
63 Our second method addresses this problem by modeling saliency estimation using binary Conditional Random Field (CRF) [21]. [sent-99, score-0.799]
64 We use CRF to capture the relation between neighboring pixels. [sent-100, score-0.048]
65 Their method estimates saliency map directly using image features. [sent-103, score-0.81]
66 In contrast, our method uses CRF to aggregate saliency analysis results from multiple methods. [sent-104, score-0.819]
67 Like the pixel-wise aggregation method, we associate each node with a saliency feature vector x(p) = (S1(p), S2(p), · · · , Sm(p)) and a binary random label yp, 1 for salient and 0 for non-salient. [sent-106, score-1.164]
68 The saliency label of each pixel depends not only on its feature vector, but also on the labels of neighboring pixels. [sent-107, score-0.886]
69 We use a grid-shaped CRF to model the relationship between the label and feature and the feature-dependent relationship between the labels of neighboring pixels. [sent-109, score-0.119]
70 We define the conditional distribution of labels Y = {yp | p ∈ I} on the features X = {xp | p ∈ I} as follows: P(Y|X; θ) = (1/Z) exp( Σ_{p∈I} fd(xp, yp) + Σ_{p∈I} Σ_{q∈Np} fs(xp, xq, yp, yq) ), [sent-110, score-0.097]
71 where p is a pixel in image I, Np is the set of its neighboring pixels, xp is its feature, and yp is its saliency label. [sent-115, score-1.241]
72 fd(xp, yp) is the feature function that defines the relationship between the feature and label. [sent-117, score-0.025]
73 fs (xp, xq, yp, yq) is another feature function that defines the feature-dependent relationship between the labels of neighboring pixels p and q. [sent-118, score-0.138]
74 We define the feature function fd(xp, yp) based only on the input saliency maps Si. [sent-122, score-0.806]
75 where {λi} is a subset of the CRF model parameters and Si(p) is the saliency value of the pixel p in the saliency map Si. [sent-127, score-1.696]
76 The feature function fs(xp, xq, yp, yq) has two components to model the data-dependent relationship between the labels of neighboring pixels. [sent-128, score-0.138]
77 Particularly, if a pixel takes a higher saliency value than its neighbor in an individual saliency map, it is also likely to take a more salient label after aggregation. [sent-130, score-1.744]
78 Its first component is defined as Σ_{i=1}^{m} αi (1(yp = 1, yq = 0) − 1(yp = 0, yq = 1)) (Si(p) − Si(q)), (8) where αi are CRF model parameters in this feature function. [sent-133, score-0.189]
79 fc(xp, xq, yp, yq) follows the idea from [24] to incorporate the observation that neighboring pixels with similar colors should have similar saliency labels. [sent-136, score-0.968]
80 where I(p) − I(q) is the color difference between pixels p and q in the RGB color space. [sent-141, score-0.055]
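Below is a hedged sketch of the grid CRF described in this section. Only the pairwise saliency-contrast term of Equation 8 is taken from the text above; the unary term fd, the color-consistency term fc, the parameter names lam, alpha and beta, the color-similarity scale, and the ICM inference loop are all illustrative assumptions, since the exact definitions, the learning procedure, and the inference method are not recoverable from this excerpt.

```python
import numpy as np

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connected grid neighborhood Np

def unary_scores(maps, lam):
    # Assumed unary term fd: per-pixel scores for yp = 0 and yp = 1 from the
    # lambda-weighted saliency maps (the paper's exact fd is not shown above).
    s1 = np.tensordot(lam, maps, axes=1)          # (H, W) evidence for 'salient'
    s0 = np.tensordot(lam, 1.0 - maps, axes=1)    # (H, W) evidence for 'non-salient'
    return np.stack([s0, s1], axis=0)             # (2, H, W)

def fs_term(maps, alpha, p, q, yp, yq):
    # Saliency-contrast pairwise term of Equation 8 for the pixel pair (p, q).
    sign = 1.0 if (yp, yq) == (1, 0) else (-1.0 if (yp, yq) == (0, 1) else 0.0)
    diff = maps[:, p[0], p[1]] - maps[:, q[0], q[1]]   # Si(p) - Si(q) for every map
    return sign * float(np.dot(alpha, diff))

def fc_term(image, beta, p, q, yp, yq):
    # Assumed color-consistency term: similar colors reward identical labels.
    d = np.linalg.norm(image[p[0], p[1]].astype(float) - image[q[0], q[1]].astype(float))
    return beta * np.exp(-d / 30.0) if yp == yq else 0.0

def icm_aggregation(maps, image, lam, alpha, beta, sweeps=3):
    # Greedy ICM inference over the grid CRF defined by the terms above.
    h, w = maps.shape[1:]
    unary = unary_scores(maps, lam)
    labels = np.argmax(unary, axis=0)             # initialize from the unary term alone
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                best_y, best_score = labels[i, j], -np.inf
                for y in (0, 1):
                    score = unary[y, i, j]
                    for di, dj in NEIGHBORS:
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            yq = labels[ni, nj]
                            score += fs_term(maps, alpha, (i, j), (ni, nj), y, yq)
                            score += fc_term(image, beta, (i, j), (ni, nj), y, yq)
                    if score > best_score:
                        best_y, best_score = y, score
                labels[i, j] = best_y
    return labels.astype(float)                   # hard 0/1 aggregated saliency map
```

ICM returns a hard 0/1 labeling; a soft aggregated map would instead require marginal inference over the CRF, which is outside this sketch.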
wordName wordTfidf (topN-words)
[('saliency', 0.782), ('aggregation', 0.382), ('yp', 0.297), ('yq', 0.189), ('xp', 0.127), ('xq', 0.122), ('crf', 0.114), ('individual', 0.1), ('gaps', 0.069), ('neighboring', 0.048), ('complement', 0.045), ('fs', 0.044), ('fd', 0.04), ('portland', 0.04), ('fc', 0.037), ('si', 0.037), ('pixel', 0.035), ('isi', 0.033), ('appropriately', 0.032), ('conditional', 0.032), ('varies', 0.031), ('salient', 0.031), ('benchmarks', 0.03), ('dependence', 0.03), ('map', 0.028), ('fe', 0.028), ('trains', 0.028), ('sm', 0.027), ('relationship', 0.025), ('public', 0.024), ('maps', 0.024), ('ilso', 0.023), ('customize', 0.023), ('caotnurdeisti', 0.023), ('emn', 0.023), ('tehnec', 0.023), ('ofmodel', 0.023), ('meanpts', 0.023), ('ior', 0.023), ('diverged', 0.023), ('aeandch', 0.023), ('wwhheicrhe', 0.023), ('ft', 0.023), ('aggregated', 0.022), ('labels', 0.021), ('efe', 0.021), ('tribution', 0.021), ('sali', 0.021), ('otdhee', 0.021), ('aggregate', 0.021), ('interaction', 0.02), ('logistic', 0.02), ('np', 0.02), ('bq', 0.02), ('ahnerde', 0.02), ('ignorance', 0.02), ('niu', 0.02), ('combination', 0.018), ('customized', 0.018), ('selects', 0.018), ('resizing', 0.018), ('subset', 0.017), ('addresses', 0.017), ('associates', 0.017), ('gc', 0.017), ('oxn', 0.017), ('determines', 0.017), ('viewer', 0.016), ('ors', 0.016), ('stimuli', 0.016), ('pushing', 0.016), ('weigh', 0.016), ('analysis', 0.016), ('exp', 0.016), ('tht', 0.016), ('ss', 0.016), ('produce', 0.015), ('biologically', 0.015), ('hpe', 0.015), ('considers', 0.015), ('treat', 0.015), ('combine', 0.015), ('cso', 0.015), ('imn', 0.015), ('encodes', 0.015), ('summarization', 0.015), ('value', 0.014), ('motivate', 0.014), ('mai', 0.014), ('options', 0.014), ('happens', 0.014), ('field', 0.014), ('ys', 0.014), ('ca', 0.014), ('oin', 0.014), ('sigmoid', 0.014), ('compromise', 0.014), ('begins', 0.014), ('functions', 0.014), ('mp', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
Author: Long Mai, Yuzhen Niu, Feng Liu
Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from individual saliency map but also the interaction between neighboringpixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.
Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin
Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA Pixelwise Image Saliency Aggregating complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post– relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA’s superior detection accuracy and competitive runtime speed over the state-of-the-arts approaches.
3 0.59191203 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well when against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult bench- mark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
4 0.58572358 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.
5 0.58416331 202 cvpr-2013-Hierarchical Saliency Detection
Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia
Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed. –
6 0.53020585 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
7 0.4366031 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
8 0.30179748 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images
9 0.27168509 418 cvpr-2013-Submodular Salient Region Detection
10 0.23424463 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
11 0.22068462 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
12 0.21598595 325 cvpr-2013-Part Discovery from Partial Correspondence
13 0.18002678 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
14 0.13765793 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
15 0.12506483 263 cvpr-2013-Learning the Change for Automatic Image Cropping
16 0.088323772 464 cvpr-2013-What Makes a Patch Distinct?
17 0.084689915 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
18 0.080468304 326 cvpr-2013-Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation
19 0.066191003 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
20 0.064335793 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
topicId topicWeight
[(0, 0.125), (1, -0.226), (2, 0.614), (3, 0.322), (4, -0.151), (5, -0.039), (6, -0.012), (7, -0.084), (8, 0.076), (9, 0.033), (10, -0.02), (11, 0.058), (12, -0.028), (13, 0.014), (14, -0.048), (15, 0.057), (16, -0.02), (17, -0.0), (18, -0.017), (19, -0.045), (20, -0.04), (21, -0.038), (22, 0.019), (23, 0.071), (24, -0.026), (25, 0.022), (26, -0.046), (27, -0.016), (28, 0.003), (29, 0.018), (30, -0.009), (31, 0.005), (32, -0.012), (33, -0.007), (34, -0.017), (35, -0.024), (36, -0.017), (37, -0.025), (38, 0.021), (39, -0.005), (40, 0.006), (41, -0.002), (42, -0.002), (43, -0.024), (44, 0.029), (45, -0.019), (46, 0.008), (47, 0.021), (48, -0.032), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99116164 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
Author: Long Mai, Yuzhen Niu, Feng Liu
Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from individual saliency map but also the interaction between neighboringpixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.
Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin
Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA Pixelwise Image Saliency Aggregating complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post– relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA’s superior detection accuracy and competitive runtime speed over the state-of-the-arts approaches.
3 0.90710455 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.
4 0.87693882 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well when against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult bench- mark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
5 0.85367996 202 cvpr-2013-Hierarchical Saliency Detection
Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia
Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed. –
6 0.84131789 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images
7 0.75547302 418 cvpr-2013-Submodular Salient Region Detection
8 0.74345845 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
9 0.74295789 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
10 0.49110892 263 cvpr-2013-Learning the Change for Automatic Image Cropping
11 0.4409219 464 cvpr-2013-What Makes a Patch Distinct?
12 0.41609341 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
13 0.3144339 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
14 0.30579889 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
15 0.29706302 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
16 0.27602804 325 cvpr-2013-Part Discovery from Partial Correspondence
17 0.2066936 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
18 0.16367993 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions
19 0.1593888 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
20 0.15634128 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
topicId topicWeight
[(1, 0.211), (10, 0.076), (16, 0.014), (26, 0.034), (33, 0.223), (67, 0.169), (69, 0.049), (87, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.81655616 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
Author: Long Mai, Yuzhen Niu, Feng Liu
Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from individual saliency map but also the interaction between neighboringpixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.
2 0.8095417 103 cvpr-2013-Decoding Children's Social Behavior
Author: James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye
Abstract: We introduce a new problem domain for activity recognition: the analysis of children ’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1–2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3–5 minute child-adult interaction. In each session, the adult examiner followed a semistructured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.
3 0.80644614 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels andETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
4 0.80601025 383 cvpr-2013-Seeking the Strongest Rigid Detector
Author: Rodrigo Benenson, Markus Mathias, Tinne Tuytelaars, Luc Van_Gool
Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called “components ”), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients+linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detections, using a single rigid component. We provide experiments for a large design space, that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETHand Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
5 0.80558527 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ?1minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ?1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our methodmatches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
6 0.80547011 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
7 0.80471992 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
8 0.80163032 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
9 0.79746753 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
10 0.79461515 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
11 0.79347664 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
12 0.79330677 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
13 0.78906482 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
14 0.78702062 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
15 0.78381336 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
16 0.7766695 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
17 0.77407438 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
18 0.76592797 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
19 0.76456434 438 cvpr-2013-Towards Pose Robust Face Recognition
20 0.76278245 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation