cvpr cvpr2013 cvpr2013-258 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor
Abstract: During recent years remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method using two gaze-tracked video datasets and show we outperform the state-of-the-art.
Reference: text
sentIndex sentText sentNum sentScore
1 Learning video saliency from human gaze using candidate selection. Dmitry Rudoy (Technion, Haifa, Israel), Dan B Goldman (Adobe Research, Seattle, WA). [sent-1, score-1.345]
2 Abstract. During recent years remarkable progress has been made in visual saliency modeling. [sent-4, score-0.543]
3 Therefore, video saliency estimation methods should differ substantially from image saliency methods. [sent-8, score-1.188]
4 In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. [sent-9, score-0.685]
5 We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. [sent-10, score-0.749]
6 Another application that might take advantage of human gaze prediction is video editing [1]: knowing where the viewer looks could help to create smoother shot transitions. [sent-17, score-0.683]
7 Moreover, we hypothesize that reliable gaze prediction may drive gaze-aware video compression or key-frame selection [15]. [sent-18, score-0.581]
8 Image saliency is well explored in the computer vision community. [sent-19, score-0.543]
9 The saliency maps overlaid on the images show that video saliency is tighter and more concentrated on a single object, while image saliency covers several interesting locations. [sent-30, score-1.76]
10 The difference between human fixations when viewing a static image versus a video frame is exemplified in Figure 1. [sent-33, score-0.578]
11 In this work we propose a method that predicts saliency by explicitly accounting for gaze transitions over time. [sent-35, score-1.101]
12 Rather than trying to model where people look in each frame independently, we predict the gaze location given the previous frame’s fixation map. [sent-36, score-0.834]
13 In this way, we handle inter-frame dynamics of the gaze transitions, along with within-frame salient locations. [sent-37, score-0.666]
14 To this end we learn a model that predicts a saliency map for a frame given the fixation map from a recent preceding moment and test it on a large set of realistic videos. [sent-38, score-0.97]
15 A key contribution of this work is the observation that saliency in video is typically very sparse and computing it at each and every pixel is redundant. [sent-39, score-0.645]
16 Instead, we select a set of candidate gaze locations, and compute saliency only at these locations. [sent-40, score-1.194]
17 We verify experimentally that our candidate-based approach outperforms the pixel-based approach, and is significantly better than an image-saliency-based approach. [sent-42, score-0.573]
18 Since a video is a stream of frames, the human gaze in each frame depends on the previous gaze locations. [sent-44, score-1.177]
19 Later, Koch and Ullman [20] proposed a feedforward model for the integration, along with the concept of a saliency map: a measure of the visual attraction of every point in the scene. [sent-64, score-0.582]
20 Since then much progress in image saliency has been made. [sent-67, score-0.543]
21 Seo and Milanfar [26] propose using self-resemblance in both static and space-time saliency detection. [sent-76, score-0.652]
22 [8] take a different approach: they concentrate on motion saliency only and detect it by using temporal spectral analysis. [sent-78, score-0.631]
23 Our work differs from previous video saliency methods by narrowing the focus to a small number of candidate gaze locations, and learning conditional gaze transitions over time. [sent-81, score-1.825]
24 Motivation and overview. Most previous saliency modeling methods calculate a saliency value for every pixel. [sent-83, score-1.133]
25 In our work we propose to calculate saliency at a small set of candidate locations, instead of at every pixel. [sent-84, score-0.787]
26 First, we observe that image saliency studies concentrate on a single image stimulus, without any prior. [sent-86, score-0.572]
27 This is usually achieved by “resetting” the participants’ gaze: presenting a black screen or a single target in the center. [sent-87, score-0.454]
28 Here, the gaze varies little between frames, and when it does change significantly it is highly constrained to local regions. [sent-89, score-0.454]
29 Our second observation is that when watching dynamic scenes people usually follow the action and the characters by shifting their gaze to a new interesting location in the scene. [sent-91, score-0.639]
30 Focusing on a sparse candidate set of salient locations allows us to model and learn these transitions explicitly with a relatively small computational effort. [sent-92, score-0.377]
31 To accommodate these observations our system consists of three phases: identifying candidate gaze locations at each frame (Section 4), extracting features for those locations (Section 5. [sent-93, score-0.859]
32 1) and learning or predicting gaze probabilities for each candidate (Section 5. [sent-94, score-0.677]
33 The static and semantic candidate locations are generated separately for every video frame. [sent-103, score-0.507]
34 The motion candidates are computed using optical flow between neighboring pairs of frames, and therefore implicitly account for the dynamics in the video. [sent-104, score-0.427]
35 Static candidates. Since a video is composed of individual frames, we start with candidates that attract people's attention due to static cues. [sent-108, score-0.848]
36 For a given frame of interest we calculate the graph-based visual saliency (GBVS), proposed by Harel et al. [sent-109, score-0.708]
37 We preferred GBVS over other image saliency methods for two main reasons: (i) it has been shown that GBVS accurately predicts human fixations in static images [3], and (ii) it is fast to calculate compared to more accurate methods [18]. [sent-111, score-0.944]
38 We hypothesize that other image saliency detection methods may be used instead. [sent-112, score-0.543]
39 Given the image saliency map we wish to find the most attractive candidate regions within it. [sent-113, score-0.833]
40 We treat the normalized saliency map as a distribution and use it to sample a large number of random points. [sent-114, score-0.582]
41 Finally, we estimate the covariance matrix of each candidate by fitting a Gaussian to the saliency map in the neighborhood of the candidate location. [sent-117, score-1.037]
42 Motion candidates. Modeling the saliency in independent frames is insufficient for videos since it ignores the dynamics. [sent-126, score-0.887]
43 To produce motion candidates we first calculate the optical flow between consecutive frames [22]. [sent-129, score-0.483]
44 The motion candidates are created from the DoG map in the same way as the static candidates are created from the image saliency map (i. [sent-132, score-1.323]
45 Semantic candidates. Finally, we wish to add semantic candidates to our set. [sent-140, score-0.588]
46 The original frame is shown in gray (for visualization). It is overlaid with: (a) the GBVS saliency map and (b) optical flow magnitude. [sent-143, score-0.768]
47 Modeling gaze dynamics. Having extracted a set of candidates, we next wish to select the most salient one. [sent-165, score-0.818]
48 We accomplish this by learning the transition probability: the probability of shifting from one gaze location in a source frame to a new one in a destination frame. [sent-166, score-1.106]
49 This transition is different from a saccade: we are dealing with a shift of the entire distribution, while a saccade is a rapid movement of a gaze point. [sent-167, score-0.632]
50 This allows us to model the gaze dynamics in the video and predict the saliency more accurately. [sent-169, score-1.133]
51 Features. To model changes in focus of attention we associate a feature vector with pairs of source and destination candidates in a given pair of frames. [sent-172, score-0.67]
52 The features can be categorized into two sets: destination frame features and inter-frame features. [sent-174, score-0.379]
53 We experimented with the use of source frame features as well, but found these features led to overfitting in the learning process, as they are only slightly different from the destination frame features. [sent-175, score-0.578]
54 It is important to note that all types of features are computed for all the destination candidates regardless of the type of the candidate. [sent-187, score-0.501]
55 Gaze transitions for training. We pose the learning problem as classification: whether a gaze transition occurs from a given source candidate to a given target candidate. [sent-190, score-0.882]
56 To train such a classifier based on the features described in the previous section we need (i) to choose relevant pairs of frames, and (ii) to label positive and negative gaze transitions between these frames. [sent-191, score-0.555]
57 Since it takes 5 to 10 frames for humans to fixate on a new object of interest we set the destination frame 15 frames after the cut [13]. [sent-194, score-0.578]
58 This ensures that we will not learn from incomplete or partial gaze transitions. [sent-195, score-0.454]
59 Next, we need to obtain examples of positive and negative gaze transitions. [sent-198, score-0.454]
60 We take all pairs of source locations and destination candidates for our training set. [sent-204, score-0.653]
61 Pairs with a destination candidate near a focus of the destination frame are labeled as positive. [sent-205, score-0.837]
62 At the inference stage the trained model classifies every transition between source and destination candidates and provides a confidence value. [sent-217, score-0.657]
63 We use the normalized confidence as the transition probability P(d|s_i): the probability of a transition from the source s_i to the current destination candidate d. [sent-218, score-0.769]
64 The transition pairs are overlaid on the source (top) and destination (bottom) frames, together with source (magenta) and destination (yellow) gaze maps. [sent-222, score-1.268]
65 ... candidate saliency and S is the set of all the sources. [sent-227, score-0.543]
66 Finally, we produce the saliency map in a similar fashion to how Gaussian mixture models are used to create a continuous distribution: we replace each candidate with a Gaussian of corresponding covariance and sum them up using the candidate saliency as weight. [sent-228, score-1.577]
67 Experimental validation. In this section we experimentally validate the proposed video saliency detection method. [sent-230, score-0.675]
68 The dataset is provided together with gaze tracks of about 50 participants per video. [sent-233, score-0.454]
69 (b), (c) Example frames, together with human fixation points (green) and our extracted candidates (yellow). [sent-238, score-0.458]
70 Verification of the candidates. First, we wish to demonstrate that human fixations can be modeled well by our limited candidate set. [sent-242, score-0.683]
71 To do so we count the number of candidate locations that are “close enough” to a fixation point. [sent-243, score-0.411]
72 This means that on most of the frames most of the fixations can be modeled well by our candidate set. [sent-249, score-0.433]
73 Since our method computes the probability to shift from a location in a source frame to a location in a destination frame, we calculate the video saliency in a sequential order. [sent-257, score-1.269]
74 For every following frame we compute the transition probability to its candidate set using the predicted saliency map from the previous frame as the source. [sent-259, score-1.116]
75 This method does not drift over time, since the transitions are largely independent of the source frame properties (recall that features of the source frame were excluded and the destination candidates are computed independently for each frame). [sent-260, score-0.974]
76 The first metric is the area-under-curve (AUC), which utilizes the receiver-operator curve to compute the similarity between human fixations and the predicted saliency map. [sent-264, score-0.759]
77 Since the AUC considers the saliency results only at the locations of the ground truth fixation points, it cannot distinguish well between a peaky saliency map and a smooth one. [sent-267, score-1.371]
78 The χ2 distance will prefer a peaky saliency map over a broad one, when comparing them to the tight distribution of the ground truth. [sent-270, score-0.614]
79 We convert the sparse ground truth fixation map, recorded by the gaze tracker, to a dense probability map by convolving it with a constant size Gaussian kernel. [sent-272, score-0.688]
80 We compare the proposed saliency prediction approach with five different methods. [sent-275, score-0.568]
81 The first, referred to as humans, serves as an upper bound for the saliency prediction and measures how much the fixation map explains itself. [sent-276, score-0.776]
82 We further compare our results to the image saliency approach of GBVS [12], and two video saliency methods PQFT [11] and the method of Hou and Zhang [14] (annotated in figures and tables as Hou for brevity). [sent-282, score-1.188]
83 Both methods are among the highest rated video saliency algorithms according to the recent benchmark of Borji et al. [sent-283, score-0.645]
84 Using χ2 further emphasizes the benefits of our approach: we produce a tight distribution that is more similar to the original gaze map. [sent-303, score-0.454]
85 We further visually compare our saliency maps to those of other methods. [sent-311, score-0.543]
86 As can be seen, the saliency maps produced [Table 1] [sent-313, score-0.543]
87 by the proposed method are more visually consistent with the shape, size, and location of the ground truth gaze map than the maps of the other methods. [sent-330, score-0.519]
88 Conclusions. In this paper we proposed a novel method for video saliency prediction. [sent-338, score-0.645]
89 The method is substantially different from existing methods and uses a sparse candidate set to model the saliency map. [sent-339, score-0.74]
90 It is shown experimentally that using candidates boosts the accuracy of the saliency prediction and speeds up the algorithm. [sent-340, score-0.838]
91 Furthermore, the proposed method accounts for the temporal dimension of the video by learning the probability to shift between saliency locations. [sent-341, score-0.71]
92 When determining the motion candidates we filter out all regions with optical flow magnitude lower than 2 pixels. [sent-346, score-0.391]
93 When calculating static and motion features in the neighborhood of a candidate we use three different neighborhoods, sized 5 × 5, 9 × 9 and 17 × 17 pixels. [sent-357, score-0.396]
94 We thank the DIEM database for making the gaze tracking results publicly available. [sent-363, score-0.454]
95 Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. [sent-378, score-0.543]
96 Examples of saliency detection results using different methods show that the saliency predicted by the proposed method better approximates the human gaze map. [sent-418, score-1.589]
97 Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. [sent-423, score-0.569]
98 Spatiotemporal saliency detection and its applications in static and dynamic scenes. [sent-465, score-0.693]
99 Clustering of gaze during dynamic scene viewing is predicted by motion. [sent-491, score-0.528]
100 Predicting human gaze using quaternion dct image signature saliency and face detection. [sent-496, score-1.096]
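The candidate-based pipeline described in sentences 40, 41 and 66 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sample_points`, `fit_covariance`, `render_map`), the square window of half-width `radius`, and the small ridge added to the covariance are all hypothetical choices, and the step that turns sampled points into candidate centers is not reproduced here because it does not appear in the extracted sentences.

```python
import numpy as np

def sample_points(saliency, n_points, rng):
    """Treat the normalized saliency map as a distribution and sample pixels."""
    p = saliency.ravel().astype(float)
    p = p / p.sum()
    idx = rng.choice(p.size, size=n_points, p=p)
    rows, cols = np.unravel_index(idx, saliency.shape)
    return np.stack([rows, cols], axis=1)

def fit_covariance(saliency, center, radius):
    """Saliency-weighted second moment about the candidate center.

    Assumes the window around `center` contains some positive saliency.
    """
    r0, c0 = center
    h, w = saliency.shape
    rows = np.arange(max(r0 - radius, 0), min(r0 + radius + 1, h))
    cols = np.arange(max(c0 - radius, 0), min(c0 + radius + 1, w))
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    weights = saliency[rr, cc].ravel().astype(float)
    weights = weights / weights.sum()
    offsets = np.stack([rr.ravel() - r0, cc.ravel() - c0], axis=1).astype(float)
    cov = (offsets * weights[:, None]).T @ offsets
    return cov + 1e-6 * np.eye(2)  # hypothetical ridge to keep cov invertible

def render_map(shape, centers, covs, weights):
    """Sum one Gaussian per candidate, weighted by its candidate saliency."""
    rr, cc = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    grid = np.stack([rr.ravel(), cc.ravel()], axis=1).astype(float)
    out = np.zeros(grid.shape[0])
    for center, cov, weight in zip(centers, covs, weights):
        diff = grid - np.asarray(center, dtype=float)
        inv = np.linalg.inv(cov)
        expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
        norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
        out += weight * np.exp(expo) / norm
    out = out.reshape(shape)
    return out / out.sum()  # normalize so the map is a probability distribution
```

In this sketch the `weights` passed to `render_map` stand in for the per-candidate saliency values; in the paper these come from the learned transition probabilities, which are not modeled here.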
wordName wordTfidf (topN-words)
[('saliency', 0.543), ('gaze', 0.454), ('destination', 0.261), ('candidates', 0.24), ('candidate', 0.197), ('fixation', 0.169), ('fixations', 0.167), ('gbvs', 0.137), ('frame', 0.118), ('diem', 0.118), ('static', 0.109), ('video', 0.102), ('source', 0.081), ('watching', 0.078), ('transitions', 0.075), ('transition', 0.075), ('frames', 0.069), ('attention', 0.062), ('auc', 0.061), ('salient', 0.06), ('motion', 0.059), ('adobe', 0.058), ('borji', 0.056), ('semantic', 0.054), ('hou', 0.054), ('human', 0.049), ('calculate', 0.047), ('locations', 0.045), ('koch', 0.043), ('dynamic', 0.041), ('cues', 0.041), ('people', 0.04), ('shift', 0.039), ('map', 0.039), ('advertising', 0.039), ('blindness', 0.039), ('detectionheight', 0.039), ('foci', 0.039), ('mital', 0.039), ('pqft', 0.039), ('rudoy', 0.039), ('eye', 0.036), ('videos', 0.035), ('flow', 0.035), ('israel', 0.035), ('cg', 0.035), ('observers', 0.035), ('center', 0.035), ('body', 0.035), ('dynamics', 0.034), ('median', 0.034), ('detections', 0.034), ('viewing', 0.033), ('preceding', 0.033), ('optical', 0.033), ('saccade', 0.032), ('brush', 0.032), ('peaky', 0.032), ('faces', 0.031), ('psychology', 0.031), ('humans', 0.031), ('dog', 0.031), ('neighborhood', 0.031), ('harel', 0.03), ('treisman', 0.03), ('fixate', 0.03), ('haifa', 0.03), ('seattle', 0.03), ('covariance', 0.03), ('experimentally', 0.03), ('wish', 0.03), ('predicts', 0.029), ('concentrate', 0.029), ('overlayed', 0.029), ('goldstein', 0.029), ('create', 0.028), ('si', 0.028), ('mahadevan', 0.028), ('viewers', 0.028), ('look', 0.027), ('cognitive', 0.027), ('created', 0.027), ('cui', 0.027), ('technion', 0.027), ('edit', 0.027), ('probability', 0.026), ('predicting', 0.026), ('pairs', 0.026), ('quaternion', 0.026), ('attract', 0.026), ('location', 0.026), ('viewer', 0.025), ('judd', 0.025), ('gaussian', 0.025), ('prediction', 0.025), ('cue', 0.024), ('regions', 0.024), ('face', 0.024), ('add', 0.024), ('others', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
2 0.45525673 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
3 0.45422661 202 cvpr-2013-Hierarchical Saliency Detection
Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia
Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.
4 0.44701529 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.
Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin
Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA (Pixelwise Image Saliency Aggregating), which aggregates complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post-relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA’s superior detection accuracy and competitive runtime speed over the state-of-the-art approaches.
6 0.4366031 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
7 0.43014577 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
8 0.41249815 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
9 0.22963226 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images
10 0.20988998 418 cvpr-2013-Submodular Salient Region Detection
11 0.19682156 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.1958546 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
13 0.18530862 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
14 0.11698215 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
15 0.11601181 263 cvpr-2013-Learning the Change for Automatic Image Cropping
16 0.11319776 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
17 0.10494483 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
18 0.10474902 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
19 0.10278564 406 cvpr-2013-Spatial Inference Machines
20 0.098880634 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes
topicId topicWeight
[(0, 0.227), (1, -0.194), (2, 0.486), (3, 0.175), (4, -0.152), (5, -0.035), (6, 0.016), (7, -0.082), (8, 0.102), (9, 0.061), (10, 0.002), (11, 0.031), (12, 0.014), (13, -0.028), (14, -0.014), (15, 0.045), (16, 0.045), (17, 0.022), (18, -0.085), (19, -0.055), (20, -0.086), (21, -0.048), (22, -0.004), (23, 0.015), (24, -0.026), (25, -0.023), (26, 0.026), (27, 0.003), (28, -0.001), (29, -0.029), (30, -0.033), (31, -0.012), (32, -0.02), (33, 0.019), (34, -0.026), (35, 0.032), (36, -0.019), (37, 0.092), (38, -0.078), (39, 0.033), (40, -0.081), (41, -0.006), (42, -0.062), (43, 0.048), (44, 0.019), (45, 0.004), (46, 0.025), (47, -0.061), (48, -0.008), (49, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.93252015 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
2 0.89083278 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
Author: Long Mai, Yuzhen Niu, Feng Liu
Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from individual saliency map but also the interaction between neighboring pixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.
Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin
Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA (Pixelwise Image Saliency Aggregating), which aggregates complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post-relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA’s superior detection accuracy and competitive runtime speed over the state-of-the-art approaches.
4 0.87839812 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.
5 0.84813607 202 cvpr-2013-Hierarchical Saliency Detection
Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia
Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.
6 0.83966738 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images
7 0.82920671 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
8 0.76596373 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
9 0.6916765 418 cvpr-2013-Submodular Salient Region Detection
10 0.62613678 263 cvpr-2013-Learning the Change for Automatic Image Cropping
11 0.51170611 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
12 0.48793852 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
13 0.47776204 464 cvpr-2013-What Makes a Patch Distinct?
14 0.44250175 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
15 0.42296988 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes
16 0.40945306 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
17 0.39039418 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes
18 0.36330083 118 cvpr-2013-Detecting Pulse from Head Motions in Video
19 0.36276552 325 cvpr-2013-Part Discovery from Partial Correspondence
20 0.35794142 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
topicId topicWeight
[(10, 0.107), (16, 0.018), (26, 0.036), (28, 0.02), (33, 0.274), (67, 0.137), (69, 0.033), (79, 0.193), (80, 0.01), (87, 0.082), (99, 0.014)]
simIndex simValue paperId paperTitle
Author: Margret Keuper, Thorsten Schmidt, Maja Temerinac-Ott, Jan Padeken, Patrick Heun, Olaf Ronneberger, Thomas Brox
Abstract: With volumetric data from widefield fluorescence microscopy, many emerging questions in biological and biomedical research are being investigated. Data can be recorded with high temporal resolution while the specimen is only exposed to a low amount of phototoxicity. These advantages come at the cost of strong recording blur caused by the infinitely extended point spread function (PSF). For widefield microscopy, its magnitude only decays with the square of the distance to the focal point and consists of an airy bessel pattern which is intricate to describe in the spatial domain. However, the Fourier transform of the incoherent PSF (denoted as Optical Transfer Function (OTF)) is well localized and smooth. In this paper, we present a blind deconvolution method that improves results of state-of-the-art deconvolution methods on widefield data by exploiting the properties of the widefield OTF.
2 0.87018806 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
Author: Ming Zeng, Jiaxiang Zheng, Xuan Cheng, Xinguo Liu
Abstract: This paper presents a method for modeling quasi-rigid objects from a sequence of depth scans captured at different time instances. As quasi-rigid objects, such as human bodies, usually have shape motions during the capture procedure, it is difficult to reconstruct their geometries. We represent the shape motion by a deformation graph, and propose a model-to-part method to gradually integrate sampled points of depth scans into the deformation graph. Under an as-rigid-as-possible assumption, the model-to-part method can adjust the deformation graph non-rigidly, so as to avoid error accumulation in alignment, which also implicitly achieves loop-closure. To handle the drift and topological error for the deformation graph, two algorithms are introduced. First, we use a two-stage registration to largely keep the rigid motion part. Second, in the step of graph integration, we topology-adaptively integrate new parts and dynamically control the regularization effect of the deformation graph. We demonstrate the effectiveness and robustness of our method by several depth sequences of quasi-rigid objects, and an application in human shape modeling.
same-paper 3 0.86550575 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
Author: Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor
Abstract: During recent years remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method using two gaze-tracked video datasets and show we outperform the state-of-the-art.
4 0.86101317 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
5 0.85963738 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in the computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
6 0.85552347 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
7 0.85476577 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
8 0.85412961 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
9 0.85366136 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
10 0.85150516 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
11 0.85093677 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
12 0.84906858 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
13 0.84884536 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
14 0.84793329 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
15 0.84584874 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
16 0.84578705 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
17 0.84306526 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
18 0.84282106 438 cvpr-2013-Towards Pose Robust Face Recognition
19 0.84242618 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
20 0.84194577 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections