cvpr cvpr2013 cvpr2013-202 knowledge-graph by maker-knowledge-mining

202 cvpr-2013-Hierarchical Saliency Detection


Source: pdf

Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia

Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling works by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 hk/leojia/projects/hsaliency/ Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. [sent-8, score-1.168]

2 We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. [sent-10, score-0.778]

3 The final saliency map is produced in a hierarchical model. [sent-11, score-0.869]

4 Different from varying patch sizes or downsizing images, our scale-based region handling is by finding saliency values optimally in a tree model. [sent-12, score-0.919]

5 Our approach improves saliency detection on many images that cannot be handled well traditionally. [sent-13, score-0.753]

6 Introduction. Saliency detection, which is closely related to selective processing in the human visual system [22], aims to locate important regions or objects in images. [sent-16, score-0.145]

7 Knowing where important regions are broadly benefits applications, including classification [24], retrieval [11] and object co-segmentation [3], for optimally allocating computation. [sent-18, score-0.149]

8 Stemming from psychological science [28, 22], the commonly adopted saliency definition is based on how pixels/regions stand out and depends on what kind of visual stimuli humans respond to most. [sent-19, score-0.822]

9 Local methods [13, 10, 1, 15] rely on pixel/region difference in the vicinity, while global methods [2, 4, 23, 30] rely mainly on color uniqueness in terms of global statistics. [sent-21, score-0.145]

10 For the first two examples, the boards containing characters are salient foreground objects. [sent-26, score-0.275]

11 But they are actually part of the background when viewing the picture as a whole, confusing saliency detection. [sent-32, score-0.795]

12 These examples are not special, and they exhibit one common problem: when objects contain salient small-scale patterns, saliency detection could generally be misled by their complexity. [sent-33, score-0.958]

13 It easily turns the extraction of salient objects into finding cluttered fragments of local details, complicating detection and making results unusable in, for example, object recognition [29], where connected regions of reasonable size are favored. [sent-35, score-0.314]

14 Aiming to solve this notorious and universal problem, we propose a hierarchical model to analyze saliency cues from multiple levels of structure and then integrate them to infer the final saliency map. [sent-36, score-1.603]

15 Our model finds its foundation in studies in psychology [20, 17], which show that the selection process in the human attention system operates on more than one level, and that the interaction between levels is more complex than a feed-forward scheme. [sent-37, score-0.173]

16 With our multi-level analysis and hierarchical inference, the model is able to deal with salient small-scale structure, so that salient objects are labeled more uniformly. [sent-38, score-0.484]

17 In addition, contributions in this paper also include 1) a new measure of region scales, which is compatible with human perception of object scales, and 2) the construction of a new scene dataset, which contains challenging natural images for saliency detection. [sent-39, score-0.919]

18 Related Work. Bottom-up saliency analysis generally follows location- and object-based attention formation [22]. [sent-42, score-0.802]

19 The former methods physically obtain human attention shifts continuously with eye tracking, while the latter set of approaches aims to find salient objects in images. [sent-46, score-0.308]

20 A survey of human attention and saliency detection is provided in [27]. [sent-48, score-0.859]

21 [10] proposed a method to nonlinearly combine local uniqueness maps from different feature channels to concentrate conspicuity. [sent-53, score-0.087]

22 Global methods have difficulty distinguishing among similar colors in both foreground and background. [sent-68, score-0.093]

23 Note that assuming the background is smooth could be invalid for many natural images, as explained in Section 1. [sent-70, score-0.107]

24 The concept of center bias, i.e., that the image center is more likely to contain salient objects than other regions, was employed in [18, 14, 25, 30]. [sent-73, score-0.286]

25 Prior work does not consider the situation that locally smooth regions could be inside a salient object while globally salient color, contrarily, could come from the background. [sent-75, score-0.55]

26 These difficulties boil down to the same type of problem and indicate that saliency is ambiguous in a single scale. [sent-76, score-0.725]

27 First, three image layers of different scales are extracted from the input. [sent-81, score-0.235]

28 2(c), are coarse representations of the input with different degrees of detail; saliency cues are then computed on each of these layers. [Figure: (a) input image, (b) final saliency map.] [sent-92, score-0.753]

29 The layer number is fixed to 3 in our experiments. [sent-97, score-0.188]

30 Specifically, we sort all regions in the initial map according to their scales in ascending order. [sent-105, score-0.176]

31 If a region's scale is below 3, we merge it into its nearest region, in terms of average CIELUV color distance, and update its scale. [sent-106, score-0.251]

32 We also update the color of the merged region to their average color. [sent-107, score-0.172]

33 After all regions are processed, we take the resulting region map as the bottom layer L1. [sent-108, score-0.427]

34 The middle and top layers L2 and L3 are generated similarly from L1 and L2 with larger scale thresholds. [sent-109, score-0.224]

35 In our experiment, we set the thresholds for the three layers as {3, 17, 33}. [sent-110, score-0.171]
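As a concrete sketch of this merging pass (illustrative data structures and names, not the authors' code; the size-weighted color average and the `rescale` callback for re-measuring the scale of Eq. (1) are our assumptions), one layer can be built as follows:

```python
import numpy as np

def merge_small_regions(colors, sizes, scales, adjacency, threshold, rescale):
    """One layer-extraction pass, as a sketch with hypothetical data structures.

    colors, sizes, scales: dicts mapping region id -> mean CIELUV color
        (length-3 np.ndarray), pixel count, and encompassment scale (Eq. (1)).
    adjacency: dict mapping region id -> set of adjacent region ids.
    rescale: callable(region id) -> freshly measured scale after a merge.
    """
    for r in sorted(scales, key=scales.get):          # ascending scale order
        if r not in scales or scales[r] >= threshold:
            continue                                  # already merged or large enough
        nbrs = adjacency[r] - {r}
        if not nbrs:
            continue
        # nearest neighbor in average CIELUV color distance
        t = min(nbrs, key=lambda n: np.linalg.norm(colors[r] - colors[n]))
        # update the merged region's color as the size-weighted average
        w_r, w_t = sizes[r], sizes[t]
        colors[t] = (w_r * colors[r] + w_t * colors[t]) / (w_r + w_t)
        sizes[t] = w_r + w_t
        # rewire adjacency from r to t, then drop r entirely
        for n in nbrs - {t}:
            adjacency[n].discard(r)
            adjacency[n].add(t)
            adjacency[t].add(n)
        adjacency[t].discard(r)
        for d in (colors, sizes, scales):
            d.pop(r, None)
        adjacency.pop(r, None)
        scales[t] = rescale(t)                        # re-measure with Eq. (1)
    return colors, sizes, scales, adjacency
```

Running this once per threshold in {3, 17, 33} over the previous layer's regions would produce the three layers L1, L2, and L3 described above.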

36 In the illustration of Figure 4, the scales of regions a and b are less than 5, and that of c is larger than 5. [sent-112, score-0.148]

37 Note that a region in the middle or top layer embraces the corresponding ones in the lower levels. [sent-117, score-0.315]

38 We use it for saliency inference described in Section 3. [sent-118, score-0.769]

39 Region Scale Definition. In methods of [5, 7] and many others, the region size is measured by the number of pixels. [sent-122, score-0.127]

40 As shown in (c), all colors in R1 are updated compared to the input, indicating a scale smaller than 3. [sent-128, score-0.101]

41 Extensive experiments suggest this measure could be wildly inappropriate for processing and understanding general natural images. [sent-129, score-0.125]

42 In fact, a large pixel number does not necessarily correspond to a large-scale region in human perception. [sent-130, score-0.156]

43 But it is not regarded as a large region in human perception due to its high inhomogeneity. [sent-134, score-0.156]

44 Given this fact, we define a new encompassment scale measure based on shape uniformity and use it to obtain region sizes in the merging process. [sent-136, score-0.281]

45 With this relation, we define the scale of region $R$ as $\mathrm{scale}(R) = \arg\max_t \{R_{t\times t} \mid R_{t\times t} \subseteq R\}$ (1), where $R_{t\times t}$ is a $t \times t$ square region. [sent-141, score-0.18]

46 As shown in Fig. 4, the scales of regions a and b are smaller than 5 while the scale of c is above it. [sent-143, score-0.201]

47 In fact, in the merging process at a level, we only need to know whether the scale of a region is below the given threshold t or not. [sent-148, score-0.218]
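Since only this threshold test is needed, it reduces to asking whether the region mask contains a t x t square, which is exactly a morphological erosion. A minimal sketch of that test (our illustration, not the paper's implementation):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def scale_at_least(mask, t):
    """Return True iff region `mask` (H x W bool) contains a t x t square,
    i.e. scale(R) >= t under Eq. (1). A minimum filter of window size t
    acts as erosion by a t x t structuring element: an output pixel is 1
    exactly when the t x t window anchored at it lies fully inside the
    region, so any surviving pixel certifies the scale."""
    if t <= 1:
        return bool(mask.any())
    eroded = minimum_filter(mask.astype(np.uint8), size=t,
                            mode='constant', cval=0)
    return bool(eroded.any())

# A 4 x 11 strip contains a 4 x 4 square but no 5 x 5 one:
m = np.zeros((20, 20), dtype=bool)
m[5:9, 5:16] = True
print(scale_at_least(m, 4), scale_at_least(m, 5))  # True False
```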

48 Single-Layer Saliency Cues. For each layer we extract, saliency cues are applied to find important pixels from the perspectives of color, position, and size. [sent-156, score-0.952]

49 Local contrast. Image regions contrasting with their surroundings are generally eye-catching [4]. [sent-158, score-0.112]

50 We define the local contrast saliency cue for $R_i$ in an image with a total of $n$ regions as a weighted sum of color differences from the other regions: $C_i = \sum_{j=1}^{n} w(R_j)\,\phi(i,j)\,\|c_i - c_j\|_2$ (2), [sent-159, score-0.893]

51 where $c_i$ and $c_j$ are the colors of regions $R_i$ and $R_j$ respectively. [sent-160, score-0.199]

52 $\phi(i,j)$ is set to $\exp\{-D(R_i,R_j)/\sigma^2\}$, controlling the spatial influence between two regions $i$ and $j$, where $D(R_i,R_j)$ is the squared Euclidean distance between the region centers of $R_i$ and $R_j$. [sent-163, score-0.211]
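A vectorized sketch of Eq. (2) over all region pairs; `sizes` stands in for w(R_j), which we assume counts the pixels of R_j since the extract does not define it:

```python
import numpy as np

def local_contrast(colors, centers, sizes, sigma):
    """Local-contrast cue C_i of Eq. (2), as an illustrative sketch.

    colors:  n x 3 array of mean region colors c_i (e.g. CIELUV).
    centers: n x 2 array of region centers, used in D(R_i, R_j).
    sizes:   length-n array of weights w(R_j); assumed to be pixel counts.
    sigma:   bandwidth of the spatial falloff phi(i, j).
    """
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    phi = np.exp(-d2 / sigma ** 2)            # exp{-D(R_i, R_j) / sigma^2}
    cdist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    return (sizes[None, :] * phi * cdist).sum(axis=1)   # one C_i per region
```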

53 Location heuristic. Psychophysical studies show that human attention favors central regions [26]. [sent-171, score-0.19]

54 So pixels close to a natural image center could be salient in many cases. [sent-172, score-0.271]

55 $H_i = \frac{1}{|R_i|}\sum_{j}\exp\{-\lambda\,\|x_j - x_c\|^2\}$ (3), where $\{x_0, x_1, \dots\}$ is the set of pixel coordinates in region $R_i$, and $x_c$ is the coordinate of the image center. [sent-176, score-0.161]
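A sketch of the location cue under the reconstruction of Eq. (3) above; only the exponent survives in the extract, so the averaging over region pixels and the normalization are our assumptions:

```python
import numpy as np

def location_heuristic(region_coords, image_shape, lam):
    """Location cue H_i of Eq. (3), sketched under the reconstruction above:
    a Gaussian falloff from the image center x_c, averaged over the pixel
    coordinates {x_0, x_1, ...} of each region. `lam` is the falloff rate;
    the exact normalization is an assumption."""
    xc = np.array(image_shape, dtype=float) / 2.0     # image center
    cues = []
    for X in region_coords:                           # X: m x 2 pixel coords
        cues.append(np.exp(-lam * ((X - xc) ** 2).sum(axis=-1)).mean())
    return np.array(cues)
```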

56 After computing si for all layers, we obtain initial saliency maps separately, as demonstrated in Fig. [sent-180, score-0.764]

57 We propose a hierarchical inference procedure to fuse them for multi-scale saliency detection. [sent-182, score-0.849]

58 Hierarchical Inference. Cue maps reveal saliency at different scales and could be quite different. [sent-185, score-0.859]

59 At the bottom level, small regions are produced while top layers contain large-scale structures. [sent-186, score-0.291]

60 Due to possible diversity, none of the single-layer maps is guaranteed to be perfect. [sent-187, score-0.188]

61 Also, it is hard to determine which layer is the best by heuristics. [sent-188, score-0.188]

62 Multi-layer fusion by naively averaging all maps is not a good choice, considering possibly complex backgrounds. [Figure 6: (a) input, (b) cue map at layer L1, (c) cue map at layer L2, (d) cue map at layer L3, (e) final saliency map.] [sent-189, score-0.149]

63 Saliency cue maps in three layers and our final saliency map. [sent-190, score-0.974]

64 On the other hand, in our region merging steps, a segment is guaranteed to be encompassed by the corresponding ones in upper levels. [sent-192, score-0.226]

65 We therefore resort to hierarchical inference based on a tree-structured graphical model. [sent-193, score-0.124]

66 For instance, the blue node j corresponds to the region marked in blue in (d). [sent-196, score-0.127]

67 For a node corresponding to region $i$ in layer $L_l$, we define a saliency variable $s_i^l$. [sent-199, score-1.04]

68 The data term $E_D(s_i^l)$ gathers separate saliency confidence and is hence defined, for every node, as $E_D(s_i^l) = \beta_l\,\|s_i^l - \bar{s}_i^l\|_2^2$ (6), where $\beta_l$ controls the layer confidence and $\bar{s}_i^l$ is the initial saliency value calculated in Eq. (4). [sent-206, score-1.892]

69 If $R_i^l$ and $R_j^{l+1}$ are corresponding regions in two layers, we must have $R_i^l \subseteq R_j^{l+1}$ based on our encompassment definition and the segment generation procedure. [sent-210, score-0.127]

70 The hierarchical term makes the saliency assignments for corresponding regions in different levels similar, which is beneficial for effectively correcting initial saliency errors. [sent-212, score-1.648]

71 The bottom-up step updates the variables $s_i^l$ in two neighboring layers by minimizing Eq. (5), [sent-217, score-0.394]

72 resulting in a new representation of the saliency $s_i^l$ with regard to the initial values $\bar{s}_i^l$ and those of the parent nodes $s_j^{l+1}$. [sent-218, score-0.948]

73 This step brings information up the tree model by progressively substituting high-level variables for low-level ones. [sent-219, score-0.223]

74 In each layer, since a minimum-energy representation has already been obtained in the previous step, we optimize it to get the new saliency values. [sent-221, score-0.725]

75 After all variables $s_j^l$ are updated in a top-down fashion, we obtain the final saliency map in $L_1$. [sent-222, score-0.892]
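Because both the data term of Eq. (6) and the hierarchical term are quadratic, the two passes admit closed forms. The sketch below performs exact inference on the tree, assuming a quadratic hierarchy penalty of the form λ‖s_i^l − s_j^{l+1}‖² with a single weight λ shared across edges (the paper's per-layer weights would replace the scalar), via one bottom-up elimination pass and one top-down back-substitution pass:

```python
import numpy as np

def tree_inference(sbar, beta, parent, lam):
    """Minimize sum_i beta_i (s_i - sbar_i)^2 + lam * sum_edges (s_child - s_parent)^2
    exactly on a tree. `parent[i]` is the parent index of node i (-1 for the
    root); children must have larger indices than their parents, so the
    reversed index order is a valid bottom-up schedule."""
    n = len(sbar)
    a = np.array(beta, dtype=float)                  # quadratic coefficient per node
    b = np.array(beta, dtype=float) * np.array(sbar, dtype=float)
    # bottom-up: eliminate each node, folding its potential into its parent
    for i in range(n - 1, -1, -1):
        p = parent[i]
        if p >= 0:
            a[p] += lam * a[i] / (a[i] + lam)
            b[p] += lam * b[i] / (a[i] + lam)
    # top-down: back-substitute the optimal saliency values
    s = np.zeros(n)
    for i in range(n):
        p = parent[i]
        s[i] = b[i] / a[i] if p < 0 else (b[i] + lam * s[p]) / (a[i] + lam)
    return s

# Tiny example: a root (top layer) with two child regions in the layer below.
print(tree_inference(sbar=[0.8, 0.2, 0.9], beta=[1.0, 1.0, 1.0],
                     parent=[-1, 0, 0], lam=2.0))
```

The closed-form folding step is exactly the weighted-average behavior noted below: each node's final value is a convex combination of its own cue value and its neighbors' values, with weights determined per region by β and λ.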

76 This is visible in Fig. 6, where separate layers in (b)-(d) miss out either large- or small-scale structures. [sent-224, score-0.171]

77 Our result in (e) contains information from all scales, making the saliency map better than any of the single-layer ones. [sent-225, score-0.753]

78 In fact, solving the three-layer hierarchical model via belief propagation is equivalent to applying a weighted average to all single-layer saliency cue maps. [sent-226, score-1.06]

79 Our method differs from naive multi-layer fusion by selecting weights optimally for each region in hierarchical inference instead of global weighting. [sent-227, score-0.316]

80 The computationally most expensive part is extraction of image layers with different scale parameters, which is also the core of our algorithm. [sent-233, score-0.224]

81 MSRA-1000 [2] and MSRA-5000 [18] Datasets. We first test our method on the saliency datasets MSRA-1000 [2] and MSRA-5000 [18], where MSRA-1000 is a subset of MSRA-5000 and contains 1000 natural images with their corresponding ground-truth masks. [sent-236, score-0.763]

82 For LC, MZ, SF and LR, we directly use author-provided saliency results. [sent-255, score-0.725]

83 Our experiment follows the setting in [2, 4], where saliency maps are binarized at each possible threshold within the range [0, 255]. [sent-262, score-0.764]
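A sketch of that evaluation protocol (the standard fixed-threshold precision-recall sweep, not code from any of the compared methods):

```python
import numpy as np

def pr_curve(sal, gt):
    """Binarize a saliency map at every threshold in [0, 255] and compare
    against the ground-truth mask, returning the precision-recall sweep."""
    sal = np.asarray(sal, dtype=float)
    sal = 255.0 * (sal - sal.min()) / max(sal.max() - sal.min(), 1e-12)
    gt = np.asarray(gt).astype(bool)
    precision, recall = [], []
    for t in range(256):
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()          # true positives
        precision.append(tp / max(pred.sum(), 1))
        recall.append(tp / max(gt.sum(), 1))
    return np.array(precision), np.array(recall)
```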

84 This is because combining saliency information from three scales makes the background generally have low saliency values. [sent-264, score-1.552]

85 Only sufficiently salient objects can be detected in this case. [sent-265, score-0.202]

86 On these difficult examples, our method can still produce reasonable saliency maps. [sent-283, score-0.725]

87 The difference between our method and others is clear, manifesting the importance of capturing hierarchical saliency in a computationally feasible framework. [sent-292, score-0.805]

88 We use the cue values of Eq. (4) in different layers, as well as their average, as the saliency values and evaluate how they work respectively when applied to our CSSD image data. [sent-298, score-0.725]

89 The result from layer L1 is the worst since it contains many small structures. [sent-301, score-0.188]

90 Results in the other two layers with larger-scale regions perform better, but still contain various problems related to scale determination. [sent-302, score-0.325]

91 The result by naively averaging the three single-layer maps is also worse than our final one produced by optimal inference. [sent-303, score-0.147]

92 Concluding Remarks. We have tackled a fundamental problem: that small-scale structures can adversely affect saliency detection. [sent-305, score-0.274]

93 In order to obtain a uniformly high-response saliency map, we propose a hierarchical framework that infers importance values from three image layers at different scales. [sent-307, score-0.251]

94 Our proposed method achieves high performance and broadens the feasibility of applying saliency detection to more applications. [Figure 11: precision-recall curves.] [sent-308, score-0.753]

95 Content-based image retrieval using color boosted salient points and shape features of an image. [sent-401, score-0.247]

96 Center-surround divergence of feature statistics for salient object detection. [sent-430, score-0.202]

97 Directing attention to locations and to sensory modalities: Multiple levels of selective processing revealed with PET. [sent-465, score-0.171]

98 Saliency filters: Contrast based filtering for salient region detection. [sent-483, score-0.329]

99 A unified approach to salient object detection via low rank matrix recovery. [sent-494, score-0.23]

100 Visual attention detection in video sequences using spatiotemporal cues. [sent-534, score-0.105]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('saliency', 0.725), ('sil', 0.223), ('salient', 0.202), ('layer', 0.188), ('layers', 0.171), ('sjl', 0.139), ('region', 0.127), ('rj', 0.105), ('regions', 0.084), ('hierarchical', 0.08), ('attention', 0.077), ('sli', 0.067), ('rc', 0.066), ('mz', 0.064), ('scales', 0.064), ('cssd', 0.063), ('encompassment', 0.063), ('rjl', 0.063), ('hc', 0.057), ('scale', 0.053), ('psychophysical', 0.052), ('contrarily', 0.052), ('ril', 0.052), ('achanta', 0.051), ('harel', 0.049), ('zhai', 0.049), ('colors', 0.048), ('ri', 0.048), ('uniqueness', 0.048), ('lr', 0.045), ('color', 0.045), ('foreground', 0.045), ('perazzi', 0.044), ('inference', 0.044), ('lc', 0.044), ('adversely', 0.043), ('naively', 0.042), ('gb', 0.041), ('ci', 0.041), ('pages', 0.041), ('estrada', 0.04), ('sf', 0.04), ('optimally', 0.039), ('cuhk', 0.039), ('cue', 0.039), ('cues', 0.039), ('maps', 0.039), ('natural', 0.038), ('background', 0.038), ('merging', 0.038), ('hi', 0.038), ('stand', 0.037), ('recall', 0.037), ('produced', 0.036), ('itti', 0.036), ('koch', 0.034), ('xc', 0.034), ('levels', 0.034), ('rt', 0.034), ('psychology', 0.033), ('segment', 0.033), ('viewing', 0.032), ('selective', 0.032), ('controls', 0.031), ('definition', 0.031), ('could', 0.031), ('es', 0.03), ('averaging', 0.03), ('hong', 0.029), ('kong', 0.029), ('human', 0.029), ('structures', 0.029), ('propagation', 0.028), ('map', 0.028), ('detection', 0.028), ('abbreviations', 0.028), ('directing', 0.028), ('raecllal', 0.028), ('boards', 0.028), ('downsizing', 0.028), ('confronts', 0.028), ('contrasting', 0.028), ('encompassed', 0.028), ('jianping', 0.028), ('mtax', 0.028), ('propriate', 0.028), ('sensory', 0.028), ('wildly', 0.028), ('wils', 0.028), ('ft', 0.028), ('cheng', 0.027), ('plotted', 0.027), ('global', 0.026), ('merge', 0.026), ('cj', 0.026), ('kschischang', 0.026), ('photons', 0.026), ('prentice', 0.026), ('riesenhuber', 0.026), ('allocating', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 202 cvpr-2013-Hierarchical Saliency Detection

Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia

Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling works by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.

2 0.65686935 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang

Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.

3 0.63533294 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach

Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li

Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-art methods.

4 0.62818891 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin

Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA, Pixelwise Image Saliency Aggregating complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in the image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post-relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA's superior detection accuracy and competitive runtime speed over the state-of-the-art approaches.

5 0.58416331 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach

Author: Long Mai, Yuzhen Niu, Feng Liu

Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from each individual saliency map but also the interaction between neighboring pixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.

6 0.56633204 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection

7 0.45422661 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection

8 0.34541777 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images

9 0.33659682 418 cvpr-2013-Submodular Salient Region Detection

10 0.29007846 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

11 0.23665833 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes

12 0.22714734 325 cvpr-2013-Part Discovery from Partial Correspondence

13 0.17021012 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

14 0.15771429 263 cvpr-2013-Learning the Change for Automatic Image Cropping

15 0.1539212 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

16 0.1462187 464 cvpr-2013-What Makes a Patch Distinct?

17 0.11081063 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

18 0.09964294 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

19 0.084157832 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition

20 0.083501957 357 cvpr-2013-Revisiting Depth Layers from Occlusions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.236), (1, -0.23), (2, 0.64), (3, 0.341), (4, -0.136), (5, -0.035), (6, 0.015), (7, -0.056), (8, 0.074), (9, 0.03), (10, -0.016), (11, 0.033), (12, -0.029), (13, -0.017), (14, 0.001), (15, 0.061), (16, -0.027), (17, 0.043), (18, 0.008), (19, -0.001), (20, -0.011), (21, -0.028), (22, -0.026), (23, -0.012), (24, -0.007), (25, 0.001), (26, 0.002), (27, -0.003), (28, 0.017), (29, 0.03), (30, 0.012), (31, 0.009), (32, 0.001), (33, 0.004), (34, -0.017), (35, -0.024), (36, 0.005), (37, 0.002), (38, 0.055), (39, -0.011), (40, -0.033), (41, 0.008), (42, 0.014), (43, 0.002), (44, -0.017), (45, 0.025), (46, 0.004), (47, 0.024), (48, 0.002), (49, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97232461 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach

Author: Long Mai, Yuzhen Niu, Feng Liu

Abstract: A variety of methods have been developed for visual saliency analysis. These methods often complement each other. This paper addresses the problem of aggregating various saliency analysis methods such that the aggregation result outperforms each individual one. We have two major observations. First, different methods perform differently in saliency analysis. Second, the performance of a saliency analysis method varies with individual images. Our idea is to use data-driven approaches to saliency aggregation that appropriately consider the performance gaps among individual methods and the performance dependence of each method on individual images. This paper discusses various data-driven approaches and finds that the image-dependent aggregation method works best. Specifically, our method uses a Conditional Random Field (CRF) framework for saliency aggregation that not only models the contribution from each individual saliency map but also the interaction between neighboring pixels. To account for the dependence of aggregation on an individual image, our approach selects a subset of images similar to the input image from a training data set and trains the CRF aggregation model only using this subset instead of the whole training set. Our experiments on public saliency benchmarks show that our aggregation method outperforms each individual saliency method and is robust with the selection of aggregated methods.

2 0.96458614 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

Author: Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin

Abstract: Driven by recent vision and graphics applications such as image segmentation and object recognition, assigning pixel-accurate saliency values to uniformly highlight foreground objects becomes increasingly critical. More often, such fine-grained saliency detection is also desired to have a fast runtime. Motivated by these, we propose a generic and fast computational framework called PISA, Pixelwise Image Saliency Aggregating complementary saliency cues based on color and structure contrasts with spatial priors holistically. Overcoming the limitations of previous methods often using homogeneous superpixel-based and color contrast-only treatment, our PISA approach directly performs saliency modeling for each individual pixel and makes use of densely overlapping, feature-adaptive observations for saliency measure computation. We further impose a spatial prior term on each of the two contrast measures, which constrains pixels rendered salient to be compact and also centered in the image domain. By fusing complementary contrast measures in such a pixelwise adaptive manner, the detection effectiveness is significantly boosted. Without requiring reliable region segmentation or post-relaxation, PISA exploits an efficient edge-aware image representation and filtering technique and produces spatially coherent yet detail-preserving saliency maps. Extensive experiments on three public datasets demonstrate PISA's superior detection accuracy and competitive runtime speed over the state-of-the-art approaches.

3 0.95782661 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach

Author: Huaizu Jiang, Jingdong Wang, Zejian Yuan, Yang Wu, Nanning Zheng, Shipeng Li

Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-art methods.

same-paper 4 0.94956005 202 cvpr-2013-Hierarchical Saliency Detection

Author: Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia

Abstract: When dealing with objects with complex structures, saliency detection confronts a critical problem, namely that detection accuracy could be adversely affected if salient foreground or background in an image contains small-scale high-contrast patterns. This issue is common in natural images and forms a fundamental challenge for prior methods. We tackle it from a scale point of view and propose a multi-layer approach to analyze saliency cues. The final saliency map is produced in a hierarchical model. Different from varying patch sizes or downsizing images, our scale-based region handling works by finding saliency values optimally in a tree model. Our approach improves saliency detection on many images that cannot be handled well traditionally. A new dataset is also constructed.

5 0.93022418 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang

Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.

6 0.90585971 411 cvpr-2013-Statistical Textural Distinctiveness for Salient Region Detection in Natural Images

7 0.83500093 418 cvpr-2013-Submodular Salient Region Detection

8 0.8145805 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection

9 0.79721898 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection

10 0.62800026 263 cvpr-2013-Learning the Change for Automatic Image Cropping

11 0.55270714 464 cvpr-2013-What Makes a Patch Distinct?

12 0.54522836 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

13 0.38790944 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images

14 0.38097978 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes

15 0.35848239 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling

16 0.33438712 325 cvpr-2013-Part Discovery from Partial Correspondence

17 0.29569465 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors

18 0.28885627 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions

19 0.28578091 264 cvpr-2013-Learning to Detect Partially Overlapping Instances

20 0.27990612 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.13), (16, 0.042), (26, 0.057), (33, 0.255), (67, 0.151), (69, 0.043), (80, 0.012), (87, 0.081), (89, 0.141)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90428799 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson

Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.

2 0.89746869 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a smallsized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.

3 0.89348519 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standardHOGfeatures, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

4 0.89344919 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

Author: Jianguo Li, Yimin Zhang

Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the wellknown Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the stateof-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.

5 0.89218366 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah

Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ?1minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ?1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our methodmatches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.

6 0.89087284 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

7 0.89038938 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration

8 0.89038044 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

same-paper 9 0.88905346 202 cvpr-2013-Hierarchical Saliency Detection

10 0.88864422 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

11 0.888246 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

12 0.88701802 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

13 0.88592201 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

14 0.88538307 235 cvpr-2013-Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines

15 0.88535804 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search

16 0.88087028 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

17 0.88078159 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

18 0.8801285 414 cvpr-2013-Structure Preserving Object Tracking

19 0.87994462 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

20 0.87953812 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation