cvpr cvpr2013 cvpr2013-148 knowledge-graph by maker-knowledge-mining

148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes


Source: pdf

Author: Xiaobo Ren, Tony X. Han, Zhihai He

Abstract: We consider video object cut as an ensemble of frame-level background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner. Our approach addresses the challenging issues of modeling of background with dynamic textures and segmentation of foreground objects from cluttered scenes. We construct patch-level bag-of-words background models to effectively capture the background motion and texture dynamics. We propose a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. We incorporate this similarity information into a graph-cut energy minimization framework for foreground object segmentation. The background-foreground classification results at neighboring frames are fused together to construct a foreground probability map to update the graph weights. The resulting object shapes at neighboring frames are also used as constraints to guide the energy minimization process during graph cut. Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the new Change Detection Challenge Dataset, demonstrate that the proposed ensemble video object cut method outperforms various state-of-the-art algorithms.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider video object cut as an ensemble of frame-level background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner. [sent-4, score-0.822]

2 Our approach addresses the challenging issues of modeling of background with dynamic textures and segmentation of foreground objects from cluttered scenes. [sent-5, score-0.821]

3 We construct patch-level bagof-words background models to effectively capture the background motion and texture dynamics. [sent-6, score-0.554]

4 We propose a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. [sent-7, score-1.632]

5 We incorporate this similarity information into a graph-cut energy minimization framework for foreground object segmentation. [sent-8, score-0.46]

6 The background-foreground classification results at neighboring frames are fused together to construct a foreground probability map to update the graph weights. [sent-9, score-0.578]

7 The resulting object shapes at neighboring frames are also used as constraints to guide the energy minimization process during graph cut. [sent-10, score-0.365]

8 Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the new Change Detection Challenge Dataset, demonstrate that the proposed ensemble video object cut method outperforms various state-of-the-art algorithms. [sent-11, score-0.673]

9 Introduction Detecting and segmenting moving objects from the background is the enabling step in intelligent video analysis [23, 11]. [sent-13, score-0.333]

10 A number of methods and algorithms have been developed for background subtraction and moving object detection [15, 23]. [sent-14, score-0.439]

11 However, accurate and reliable moving object detection from cluttered and highly dynamic backgrounds remains a challenging problem. [sent-15, score-0.547]

12 These types of scenes are usually cluttered and dynamic, with swaying trees, rippling water, moving shadows and sun spots, rain, etc. [sent-20, score-0.272]

13 The key challenge here is how to establish effective models to capture the complex background motion and texture dynamics. [sent-21, score-0.334]

14 In this work, we consider video object segmentation as an ensemble of frame-level background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner, as illustrated in Fig. [sent-22, score-0.745]

15 Our approach integrates patch-level local background modeling with bags of words, region-level foreground object segmentation with graph cuts, and temporal domain information fusion among foreground-background classifiers at neighboring frames. [sent-24, score-0.998]

16 Sections 2 and 5 present the bag-of-words background models, foreground salience map, and our graph-cut algorithm for foreground object segmentation. [sent-30, score-1.459]

17 Related Work There is a significant body of research conducted during the past two decades on background modeling and foreground object detection. [sent-34, score-0.591]

18 Early work on background subtraction operated on the assumption of stationary background [26]. [sent-36, score-0.518]

19 To handle motion in the background, methods with pixel-level motion matching and background model relaxation within the pixel neighborhood have been investigated. [sent-37, score-0.328]

20 For example, a non-parametric technique was proposed in [4] for estimating background probabilities using Kernel density functions. [sent-38, score-0.211]

21 The work in [20] explicitly addressed the issue of background subtraction in a non-stationary scene by introducing the concept of a spatial distribution of Gaussians (SDG). [sent-41, score-0.307]

22 Considering spatial context and neighborhood constraints, graph cut optimization has achieved fairly good performance in image segmentation [2]. [sent-46, score-0.394]

23 Iterated graph cut is used in [21] to search over a nonlocal parameter space. [sent-47, score-0.252]

24 Background cut is proposed in [24] which combines background subtraction and color/contrast based models. [sent-48, score-0.485]

25 In this work, we propose to establish a new framework which tightly integrates these three important components for accurate and robust video object cut in highly dynamic scenes. [sent-50, score-0.471]

26 Overview of the Proposed Approach The basic flow of the proposed ensemble video object cut method is shown in Fig. [sent-52, score-0.453]

27 We first scan the image sequence, perform initial background-foreground image patch classification, and construct bag-of-words (BoW) background models with Histogram of Oriented Gradients (HOG) features. [sent-54, score-0.412]

28 This BoW model is able to capture the background motion and texture dynamics at each image location. [sent-55, score-0.303]
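
A minimal Python sketch of how such a per-location bag-of-words background model could be organized (the patch size, the cap on the number of words per location, and the simplified orientation-histogram descriptor standing in for HOG are all assumptions, and the paper's initial background-foreground patch classification step is omitted):

    import numpy as np

    def orientation_histogram(patch, n_bins=9):
        # Simplified HOG-like descriptor: a gradient-orientation histogram of one patch.
        gy, gx = np.gradient(patch.astype(np.float64))
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientations in [0, pi)
        hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
        return hist / (hist.sum() + 1e-8)                  # L1-normalised histogram

    def build_bow_background_model(frames, patch_size=16, max_words=20):
        # For every patch location, collect a small set of descriptors ("words") seen
        # while scanning the sequence.  frames: list of 2-D grayscale arrays.
        h, w = frames[0].shape
        model = {}                                          # (row, col) -> list of word histograms
        for frame in frames:
            for y in range(0, h - patch_size + 1, patch_size):
                for x in range(0, w - patch_size + 1, patch_size):
                    words = model.setdefault((y, x), [])
                    if len(words) < max_words:
                        # Naive word collection; the paper's exact vocabulary
                        # construction is not reproduced here.
                        desc = orientation_histogram(frame[y:y + patch_size, x:x + patch_size])
                        words.append(desc)
        return model

In this organization, each grid location keeps its own small vocabulary of texture and motion "words", which is what the temporal salience below compares against.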

29 To segment the foreground object, for each image patch, we develop features to describe its texture and neighborhood image characteristics. [sent-56, score-0.397]

30 Based on the BoW background models, we analyze its temporal salience. [sent-57, score-0.314]

31 We also compare the image patch to its neighborhood patches to form the spatial salience measure. [sent-58, score-0.878]

32 Based on these spatiotemporal salience analysis results, we construct the foreground salience graph. [sent-59, score-1.281]

33 We then apply the graph-cut energy minimization method to obtain the foreground segmentation. [sent-60, score-0.417]

34 These background-foreground classification results of neighboring frames are fused together to further update the weights of the foreground salience graph. [sent-61, score-1.036]

35 Shape prior information is extracted from the detected foreground objects and used as constraints to guide the graph-cut energy minimization procedure. [sent-62, score-0.417]

36 This classification-fusion-refinement procedure is performed in an iterative manner to achieve the final video object segmentation results. [sent-63, score-0.257]

37 Most of the existing background models are constructed at the pixel level [4, 26, 10, 11]. [sent-64, score-0.211]

38 In this work, we propose to develop background models at the patch level using BoW features. [sent-67, score-0.372]

39 Foreground Salience Graph and Graph Cut In foreground object detection, we need to detect those image patches which are salient in comparison with background models on both appearance and texture dynamics. [sent-76, score-0.685]

40 In this work, we propose to construct a foreground salience graph (FSG) to characterize the salience of an image patch in the spatiotemporal domain. [sent-77, score-1.855]

41 We will then formulate the object segmentation as an energy minimization problem which can be solved using the graph cut method. [sent-78, score-0.511]

42 Foreground Salience Graph The FSG consists of two components: temporal salience and spatial salience. [sent-81, score-0.704]

43 The temporal salience measures the dissimilarity between the current image patch P(x,y) and the background model. [sent-84, score-1.076]

44 Let d(P(x,y), Pk(x,y)) be the feature distance between the current image patch and the co-located background image patch at frame k. [sent-85, score-0.533]

45 d(P(x,y), Pk(x,y)) = Σ_i (g(i) − h(i))² / (g(i) + h(i)), (1) where g and h are the BoW histogram features describing image patch P(x,y) and its co-located background image patch, respectively. [sent-87, score-0.533]

46 The temporal salience at location (x, y) is then defined as Dt(P(x,y)) = Dt(x, y) = min_k d(P(x,y), Pk(x,y)). (2) [sent-88, score-0.704]
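
A small sketch of Eqs. (1)-(2) under this reading: the chi-square-style form of Eq. (1) shown above is itself reconstructed from a garbled extraction, so treat that distance as an assumption; the temporal salience is the minimum distance to the background words stored at the patch location:

    import numpy as np

    def hist_distance(g, h, eps=1e-8):
        # Chi-square-style distance between two BoW histograms (assumed form of Eq. (1)).
        g = np.asarray(g, dtype=np.float64)
        h = np.asarray(h, dtype=np.float64)
        return np.sum((g - h) ** 2 / (g + h + eps))        # eps added only for numerical safety

    def temporal_salience(patch_hist, background_words):
        # Eq. (2): D_t = min over the stored background words P_k of d(P, P_k).
        return min(hist_distance(patch_hist, w) for w in background_words)

Combined with the bag-of-words sketch after sentence 28, temporal_salience(orientation_histogram(p), model[(y, x)]) would give Dt for a patch p at grid location (y, x).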

47 Fig. 2 shows temporal salience maps for four example frames from the Camera Trap dataset. [sent-90, score-0.769]

48 The camera-trap data contains very challenging videos with highly dynamic scenes: large tree waving motion, strong moving shadows, and sunlight spots. [sent-91, score-0.32]

49 Here, red and blue pixels represent image patches with large and small temporal salience values, respectively. [sent-93, score-0.779]

50 We can see that the bag-of-words background model and the temporal salience map are able to efficiently characterize the complex background motion. Spatial salience map. [sent-94, score-1.802]

51 As discussed in Section 3, to achieve consistently accurate and reliable foreground object segmentation, we need to consider the spatial context of the image patch and the image characteristics in its spatial neighborhood. [sent-95, score-0.506]

52 To form the spatial salience measure between two neighboring image patches, P(x,y) and Q(x′,y′), we analyze both color and texture information. [sent-97, score-0.723]

53 (fragment of the Figure 2 caption) … Camera Trap dataset; the second row: temporal salience maps with red and blue pixels representing large and small temporal salience values, respectively. [sent-101, score-1.408]

54 The spatial salience measure between patches P(x,y) and Q(x′,y′) is defined in Eq. (3). [sent-110, score-0.676]

55 To effectively differentiate the background and foreground textures, we propose to modulate this spatial salience with LBP texture weights. [sent-115, score-1.168]

56 This LBP texture weight aims to find a balance between effectively differentiating the foreground and background image textures and accommodating background motions. [sent-120, score-0.724]
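
Sentence 55 describes modulating the spatial salience with an LBP texture weight, and sentence 56 describes the balance it strikes. The sketch below is one plausible reading rather than the paper's exact definition (the Hamming-style comparison of LBP codes is an assumption): it computes 8-neighbour LBP codes for two patches and turns the mean per-pixel Hamming distance between them into a weight in [0, 1]:

    import numpy as np

    def lbp_codes(patch):
        # 8-neighbour local binary pattern code for each interior pixel of a 2-D patch.
        p = patch.astype(np.float64)
        c = p[1:-1, 1:-1]
        neighbours = [p[:-2, :-2], p[:-2, 1:-1], p[:-2, 2:], p[1:-1, 2:],
                      p[2:, 2:], p[2:, 1:-1], p[2:, :-2], p[1:-1, :-2]]
        code = np.zeros_like(c, dtype=np.uint8)
        for bit, n in enumerate(neighbours):
            code |= ((n >= c).astype(np.uint8) << bit)
        return code

    def lbp_texture_weight(patch_a, patch_b):
        # Assumed weight: mean per-pixel Hamming distance between LBP codes, scaled to [0, 1].
        xor = lbp_codes(patch_a) ^ lbp_codes(patch_b)
        differing_bits = np.unpackbits(xor[..., None], axis=-1).sum(axis=-1)
        return differing_bits.mean() / 8.0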

57 Based on the temporal and spatial salience measures, we can then construct the foreground salience graph. [sent-122, score-1.647]

58 In addition, for background-foreground segmentation purposes, we also introduce two terminal nodes: the foreground node t and the background node s. [sent-127, score-0.646]

59 Figure 3: (a) Given an intensity image, at a given location, we can find the optimum angle θ which maximizes the distance between the histograms of oriented gradients in these two half discs; (b) LBP texture weights; (c) foreground salience graph. [sent-130, score-0.989]
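
For panel (a) of Figure 3, a minimal sketch of the half-disc search is given below; the disc radius, the number of candidate angles, and the squared-difference histogram comparison are assumptions, and the point (cy, cx) is assumed to lie at least radius pixels inside the image:

    import numpy as np

    def half_disc_angle(gray, cy, cx, radius=8, n_bins=9, n_angles=12):
        # Return the angle theta in [0, pi) that maximises the difference between the
        # gradient-orientation histograms of the two half discs split by theta.
        gy, gx = np.gradient(gray.astype(np.float64))
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        inside = (ys ** 2 + xs ** 2) <= radius ** 2
        win = (slice(cy - radius, cy + radius + 1), slice(cx - radius, cx + radius + 1))
        mag = np.hypot(gx[win], gy[win])
        ang = np.mod(np.arctan2(gy[win], gx[win]), np.pi)

        def hog(mask):
            h, _ = np.histogram(ang[mask], bins=n_bins, range=(0.0, np.pi), weights=mag[mask])
            return h / (h.sum() + 1e-8)

        best_theta, best_dist = 0.0, -1.0
        for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
            side = (xs * np.sin(theta) - ys * np.cos(theta)) >= 0   # which half disc each pixel is in
            d = np.sum((hog(inside & side) - hog(inside & ~side)) ** 2)
            if d > best_dist:
                best_theta, best_dist = theta, d
        return best_theta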

60 We formulate the foreground object segmentation as a graph-cut energy minimization problem, which aims to minimize the following global energy function: E(X) = Σ_p Ep(xp) + Σ_{(p,q)∈N} ED(xp, xq). [sent-133, score-0.628]

61 The T-link energy provides an initial assessment if the image patch belongs to the background or foreground, which is defined based on the temporal salience measure: Ep(xp) =? [sent-143, score-1.143]

62 ED (xp, xq) captures the discontinuity between two neighboring patches in the temporal salience map, which is defined as ED(xp, xq) = γ · e−βD·|Dt(p)−Dt(q)|, (7) where βD is a normalization term βD = [? [sent-146, score-0.847]
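
A hedged sketch of the resulting minimization over the patch grid, using the third-party PyMaxflow package as the s/t min-cut solver (the paper does not name a solver in this summary). The N-link weights follow Eq. (7); the T-link costs and the beta_D normalisation below are placeholders, since the paper's T-link formula is truncated above:

    import numpy as np
    import maxflow   # PyMaxflow; an assumed choice of s/t cut solver

    def segment_from_salience(Dt, gamma=1.0, sigma=1.0):
        # Label each patch of a temporal-salience map Dt by an s/t min cut.
        Dt = np.asarray(Dt, dtype=np.float64)
        H, W = Dt.shape
        g = maxflow.Graph[float]()
        g.add_nodes(H * W)                      # node ids are 0 .. H*W-1, row-major

        # beta_D as an inverse mean absolute difference (an assumed normalisation).
        mean_diff = (np.abs(np.diff(Dt, axis=0)).mean() +
                     np.abs(np.diff(Dt, axis=1)).mean()) / 2.0
        beta_D = 1.0 / (mean_diff + 1e-8)

        for y in range(H):
            for x in range(W):
                n = y * W + x
                # T-links: placeholder label costs derived from the temporal salience.
                cost_fg = np.exp(-Dt[y, x] / sigma)   # cheap to label foreground when salience is high
                cost_bg = 1.0 - cost_fg               # cheap to label background when salience is low
                g.add_tedge(n, cost_fg, cost_bg)      # nodes ending on the sink side become foreground
                # N-links to right and bottom neighbours, Eq. (7), added once per pair.
                for dy, dx in ((0, 1), (1, 0)):
                    yy, xx = y + dy, x + dx
                    if yy < H and xx < W:
                        w = gamma * np.exp(-beta_D * abs(Dt[y, x] - Dt[yy, xx]))
                        g.add_edge(n, yy * W + xx, w, w)

        g.maxflow()
        seg = np.array([g.get_segment(y * W + x) for y in range(H) for x in range(W)])
        return seg.reshape(H, W).astype(bool)         # True = foreground label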

63 The output of this graph-cut minimization procedure will be the foreground object segmentation. [sent-151, score-0.393]

64 Iterative Ensemble Video Object Cut We recognize that, in cluttered scenes, the initial segmentation often yields incorrect segmentations and object contours. [sent-153, score-0.34]

65 For example, in the Camera-trap dataset, we find that some parts of the animal body are well segmented in some video frames but poorly segmented in other frames since the foreground object has moved to different background regions. [sent-154, score-0.761]

66 Motivated by this observation, we propose to consider the problem of video object cut as an ensemble of frame-level foreground-background classifiers, which share and fuse the classification information across frames, helping each other to refine the segmentation in an iterative manner. [sent-155, score-0.595]

67 More specifically, with the new background masks, we remove those false background image patches from the background model. [sent-160, score-0.739]

68 For each image patch P at location xP, we update its minimum distance Dt (P) to all background image patches in the model using (2). [sent-161, score-0.447]

69 With the new foreground masks, we can also construct bag-of-words models for the foreground objects. [sent-162, score-0.644]

70 Following the procedure in Section 2, we can then define the foreground temporal salience measure Dft (P) for a given image patch P in the current frame, which will measure the similarity between the current patch and detected foreground patches. [sent-163, score-1.63]

71 We define the foreground probability map as γ(P) = 1 − e−Dt(P)/Dft(P), (11) which measures the probability of P to be a foreground patch. [sent-164, score-0.662]
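
Eq. (11) itself is a one-line elementwise computation once the background and foreground minimum distances Dt and Dft are available:

    import numpy as np

    def foreground_probability(Dt, Dft, eps=1e-8):
        # Eq. (11): gamma(P) = 1 - exp(-Dt(P) / Dft(P)), elementwise over the patch grid.
        Dt = np.asarray(Dt, dtype=np.float64)
        Dft = np.asarray(Dft, dtype=np.float64)
        return 1.0 - np.exp(-Dt / (Dft + eps))   # eps added only to avoid division by zero

A patch that is far from every background word (large Dt) and close to a detected foreground word (small Dft) gets a probability near 1, which then feeds back into the graph weights in the next iteration.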

72 Foreground Shape Priors The graph cut (or s/t cut) problem can be solved by finding a maximum flow from the source s (background) to the destination t (foreground). [sent-168, score-0.321]

73 The set of saturation edges corresponds to the final graph cut result [6]. [sent-175, score-0.329]

74 In our case, the object contour obtained from graph cut segmentation will run across these saturation edges. [sent-176, score-0.473]

75 To address this issue, we propose to use the object segmentation results from other frames, extract shape prior information of the foreground object, and use this information to guide the graph cut algorithm. [sent-180, score-0.698]

76 The probability p(θ), defined in Eq. (14), attempts to predict the shape segment orientation based on the foreground object shapes segmented from other frames. [sent-185, score-0.374]

77 If the residual capacity of this edge in the residual graph is below a certain threshold, we can impose early termination of the augmenting process and set this edge as the saturation edge. [sent-187, score-0.273]

78 In our on-going work on automated large-scale wildlife monitoring, we have collected over 1 million camera-trap images of wildlife species. [sent-195, score-0.266]

79 These are very challenging videos with highly cluttered and dynamic wooded scenes. [sent-197, score-0.281]

80 The number of patches used for background modeling ranges from 2560 to 7680, depending on the video size. [sent-200, score-0.396]

81 We use a group of 10-15 video frames as a segmentation unit for ensemble video object cut. [sent-202, score-0.486]

82 Quantitative Evaluations In this section, we provide quantitative evaluations measured using the F-score: F = (2 × recall × precision) / (recall + precision) = 2TP / (2TP + FN + FP), where TP stands for true positive, FP stands for false positive, and FN stands for false negative [13]. [sent-205, score-0.227]
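
A small sketch of this metric for boolean ground-truth and predicted foreground masks:

    import numpy as np

    def f_score(pred, gt):
        # F = 2TP / (2TP + FN + FP) for boolean foreground masks.
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        return 2.0 * tp / (2.0 * tp + fn + fp) if tp else 0.0   # define F = 0 when TP = 0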

83 This is because our method is able to effectively capture and model the highly dynamic background motion and to accurately locate the object boundary by sharing and fusing foreground-background classification information between frames. [sent-220, score-0.435]

84 Qualitative Evaluations In this section, we provide qualitative evaluations of our ensemble video object cut method and performance comparisons with other state-of-the-art methods in the literature. [sent-223, score-0.507]

85 We can see that both the Bayesian modeling approach and our approach are able to accurately model the background water motion. [sent-226, score-0.283]

86 However, their method tends to under-segment the foreground person with the low-contrast waist and hair areas being classified as background. [sent-227, score-0.302]

87 We can see that the proposed method yields more accurate and robust foreground object detection and segmentation than these two methods. [sent-234, score-0.488]

88 Fig. 7 shows how our ensemble video cut method is able to refine the segmentation results in an iterative manner by sharing and fusing foreground-background classification information between frames. [sent-236, score-0.552]

89 Conclusion In this work, we have successfully developed a video object segmentation scheme for highly dynamic and cluttered scenes. [sent-238, score-0.423]

90 Our approach integrates patch-level local background modeling with bags of words, region-level foreground object segmentation with graph cuts, and temporal domain information fusion among foreground-background classifiers at neighboring frames. [sent-239, score-0.998]

91 We constructed patch-level bag-of-words background models to effectively capture the background motion and texture dynamics. [sent-240, score-0.514]

92 We have developed a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. [sent-241, score-1.632]

93 We incorporated this similarity information into a graph-cut energy minimization framework for foreground object segmentation. [sent-242, score-0.46]

94 Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the Change Detection Challenge Dataset, demonstrated that the proposed ensemble video object cut method outperforms various state-of-the-art algorithms. [sent-383, score-0.673]

95 Figure 7: Iterative ensemble video object cut; the first row: the original video frames; the second row: the segmentation results after the first iteration; the third row: the segmentation results after the second iteration; the fourth row: the final segmentation results. [sent-398, score-0.479]

96 A texture-based method for modeling the background and detecting moving objects. [sent-439, score-0.293]

97 Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes. [sent-479, score-0.307]

98 Evaluation report of integrated background modeling based on spatio-temporal features. [sent-507, score-0.246]

99 Improving foreground segmentations with probabilistic superpixel markov random fields. [sent-533, score-0.302]

100 Segmenting foreground objects from a dynamic textured background via a robust kalman filter. [sent-565, score-0.624]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('salience', 0.601), ('foreground', 0.302), ('background', 0.211), ('cut', 0.178), ('patch', 0.161), ('xp', 0.142), ('wildlife', 0.133), ('avg', 0.132), ('ensemble', 0.127), ('fountain', 0.118), ('lbp', 0.116), ('dynamic', 0.111), ('fsg', 0.106), ('temporal', 0.103), ('segmentation', 0.101), ('subtraction', 0.096), ('xq', 0.079), ('saturation', 0.077), ('videos', 0.077), ('video', 0.075), ('patches', 0.075), ('graph', 0.074), ('trap', 0.071), ('neighboring', 0.068), ('energy', 0.067), ('frames', 0.065), ('cluttered', 0.061), ('dt', 0.061), ('change', 0.058), ('bow', 0.057), ('fn', 0.056), ('feedback', 0.055), ('evaluations', 0.054), ('texture', 0.054), ('airporthall', 0.053), ('escalator', 0.053), ('fnr', 0.053), ('lbpk', 0.053), ('lobby', 0.053), ('ouri', 0.053), ('pietikinen', 0.053), ('pkde', 0.053), ('pkdewm', 0.053), ('pwc', 0.053), ('qch', 0.053), ('sobs', 0.053), ('watersurface', 0.053), ('scenes', 0.053), ('fuses', 0.053), ('minimization', 0.048), ('vibe', 0.047), ('curtain', 0.047), ('moving', 0.047), ('workshop', 0.045), ('fpr', 0.044), ('monitoring', 0.043), ('object', 0.043), ('augmenting', 0.042), ('detection', 0.042), ('neighborhood', 0.041), ('residual', 0.04), ('construct', 0.04), ('destination', 0.039), ('spatiotemporal', 0.039), ('motion', 0.038), ('iterative', 0.038), ('mahadevan', 0.038), ('heikkil', 0.038), ('water', 0.037), ('pk', 0.037), ('collaborative', 0.037), ('stands', 0.037), ('characterize', 0.037), ('kde', 0.036), ('modeling', 0.035), ('incorrect', 0.034), ('row', 0.034), ('specificity', 0.034), ('harwood', 0.034), ('liao', 0.034), ('wh', 0.034), ('qk', 0.033), ('refine', 0.033), ('ford', 0.032), ('terminal', 0.032), ('integrates', 0.032), ('highly', 0.032), ('angle', 0.032), ('diverse', 0.032), ('dh', 0.032), ('pn', 0.032), ('ep', 0.031), ('false', 0.031), ('challenge', 0.031), ('comparisons', 0.03), ('ren', 0.03), ('ds', 0.03), ('flow', 0.03), ('probability', 0.029), ('classifiers', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000012 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes

Author: Xiaobo Ren, Tony X. Han, Zhihai He

Abstract: We consider video object cut as an ensemble of framelevel background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner. Our approach addresses the challenging issues of modeling of background with dynamic textures and segmentation of foreground objects from cluttered scenes. We construct patch-level bagof-words background models to effectively capture the background motion and texture dynamics. We propose a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. We incorporate this similarity information into a graph-cut energy minimization framework for foreground object segmentation. The background-foreground classification results at neighboring frames are fused together to construct a foreground probability map to update the graph weights. The resulting object shapes at neighboring frames are also used as constraints to guide the energy minimization process during graph cut. Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the new Change Detection Challenge Dataset, demonstrate that the proposed ensemble video object cut method outperforms various state-ofthe-art algorithms.

2 0.4881548 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification

Author: Rui Zhao, Wanli Ouyang, Xiaogang Wang

Abstract: Human eyes can recognize person identities based on some small salient regions. However, such valuable salient information is often hidden when computing similarities of images with existing approaches. Moreover, many existing approaches learn discriminative features and handle drastic viewpoint change in a supervised way and require labeling new training data for a different pair of camera views. In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning. Distinctive features are extracted without requiring identity labels in the training procedure. First, we apply adjacency constrained patch matching to build dense correspondence between image pairs, which shows effectiveness in handling misalignment caused by large viewpoint and pose variations. Second, we learn human salience in an unsupervised manner. To improve the performance of person re-identification, human salience is incorporated in patch matching to find reliable and discriminative matched patches. The effectiveness of our approach is validated on the widely used VIPeR dataset and ETHZ dataset.

3 0.18255933 55 cvpr-2013-Background Modeling Based on Bidirectional Analysis

Author: Atsushi Shimada, Hajime Nagahara, Rin-ichiro Taniguchi

Abstract: Background modeling and subtraction is an essential task in video surveillance applications. Most traditional studies use information observed in past frames to create and update a background model. To adapt to background changes, the backgroundmodel has been enhancedby introducing various forms of information including spatial consistency and temporal tendency. In this paper, we propose a new framework that leverages information from a future period. Our proposed approach realizes a low-cost and highly accurate background model. The proposed framework is called bidirectional background modeling, and performs background subtraction based on bidirectional analysis; i.e., analysis from past to present and analysis from future to present. Although a result will be output with some delay because information is takenfrom a futureperiod, our proposed approach improves the accuracy by about 30% if only a 33-millisecond of delay is acceptable. Furthermore, the memory cost can be reduced by about 65% relative to typical background modeling.

4 0.16268341 216 cvpr-2013-Improving Image Matting Using Comprehensive Sampling Sets

Author: Ehsan Shahrian, Deepu Rajan, Brian Price, Scott Cohen

Abstract: In this paper, we present a new image matting algorithm that achieves state-of-the-art performance on a benchmark dataset of images. This is achieved by solving two major problems encountered by current sampling based algorithms. The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present. Here, we describe a method by which a more comprehensive and representative set of samples is collected so as not to miss out on the true samples. This is accomplished by expanding the sampling range for pixels farther from the foreground or background boundary and ensuring that samples from each color distribution are included. The second problem is the overlap in color distributions of foreground and background regions. This causes sampling based methods to fail to pick the correct samples for foreground and background. Our design of an objective function forces those foreground and background samples to be picked that are generated from well-separated distributions. Comparison on the dataset at and evaluation by www.alphamatting.com shows that the proposed method ranks first in terms of error measures used in the website.

5 0.13034314 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

Author: Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu

Abstract: We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on es- tablished co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

6 0.12428831 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos

7 0.11596138 187 cvpr-2013-Geometric Context from Videos

8 0.10900336 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos

9 0.10855699 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

10 0.10741442 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction

11 0.10302278 10 cvpr-2013-A Fully-Connected Layered Model of Foreground and Background Flow

12 0.10111924 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition

13 0.099172205 357 cvpr-2013-Revisiting Depth Layers from Occlusions

14 0.095737241 294 cvpr-2013-Multi-class Video Co-segmentation with a Generative Multi-video Model

15 0.092818335 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus

16 0.091549158 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

17 0.08878509 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

18 0.08841683 166 cvpr-2013-Fast Image Super-Resolution Based on In-Place Example Regression

19 0.087613292 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

20 0.08653564 455 cvpr-2013-Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.231), (1, -0.01), (2, 0.068), (3, -0.02), (4, -0.024), (5, -0.003), (6, 0.021), (7, -0.015), (8, -0.047), (9, 0.013), (10, 0.083), (11, -0.049), (12, 0.066), (13, 0.007), (14, 0.087), (15, -0.02), (16, -0.034), (17, -0.034), (18, 0.04), (19, -0.037), (20, -0.0), (21, 0.194), (22, -0.111), (23, -0.249), (24, -0.037), (25, -0.126), (26, 0.031), (27, 0.071), (28, -0.064), (29, -0.065), (30, -0.029), (31, -0.062), (32, 0.027), (33, -0.006), (34, 0.018), (35, 0.035), (36, 0.032), (37, -0.065), (38, -0.203), (39, -0.123), (40, 0.124), (41, 0.046), (42, 0.162), (43, 0.025), (44, 0.035), (45, -0.01), (46, -0.096), (47, -0.02), (48, 0.045), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92461765 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes

Author: Xiaobo Ren, Tony X. Han, Zhihai He

Abstract: We consider video object cut as an ensemble of framelevel background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner. Our approach addresses the challenging issues of modeling of background with dynamic textures and segmentation of foreground objects from cluttered scenes. We construct patch-level bagof-words background models to effectively capture the background motion and texture dynamics. We propose a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. We incorporate this similarity information into a graph-cut energy minimization framework for foreground object segmentation. The background-foreground classification results at neighboring frames are fused together to construct a foreground probability map to update the graph weights. The resulting object shapes at neighboring frames are also used as constraints to guide the energy minimization process during graph cut. Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the new Change Detection Challenge Dataset, demonstrate that the proposed ensemble video object cut method outperforms various state-ofthe-art algorithms.

2 0.79279113 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification

Author: Rui Zhao, Wanli Ouyang, Xiaogang Wang

Abstract: Human eyes can recognize person identities based on some small salient regions. However, such valuable salient information is often hidden when computing similarities of images with existing approaches. Moreover, many existing approaches learn discriminative features and handle drastic viewpoint change in a supervised way and require labeling new training data for a different pair of camera views. In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning. Distinctive features are extracted without requiring identity labels in the training procedure. First, we apply adjacency constrained patch matching to build dense correspondence between image pairs, which shows effectiveness in handling misalignment caused by large viewpoint and pose variations. Second, we learn human salience in an unsupervised manner. To improve the performance of person re-identification, human salience is incorporated in patch matching to find reliable and discriminative matched patches. The effectiveness of our approach is validated on the widely used VIPeR dataset and ETHZ dataset.

3 0.66312015 464 cvpr-2013-What Makes a Patch Distinct?

Author: Ran Margolin, Ayellet Tal, Lihi Zelnik-Manor

Abstract: What makes an object salient? Most previous work assert that distinctness is the dominating factor. The difference between the various algorithms is in the way they compute distinctness. Some focus on the patterns, others on the colors, and several add high-level cues and priors. We propose a simple, yet powerful, algorithm that integrates these three factors. Our key contribution is a novel and fast approach to compute pattern distinctness. We rely on the inner statistics of the patches in the image for identifying unique patterns. We provide an extensive evaluation and show that our approach outperforms all state-of-the-art methods on the five most commonly-used datasets.

4 0.64786947 55 cvpr-2013-Background Modeling Based on Bidirectional Analysis

Author: Atsushi Shimada, Hajime Nagahara, Rin-ichiro Taniguchi

Abstract: Background modeling and subtraction is an essential task in video surveillance applications. Most traditional studies use information observed in past frames to create and update a background model. To adapt to background changes, the backgroundmodel has been enhancedby introducing various forms of information including spatial consistency and temporal tendency. In this paper, we propose a new framework that leverages information from a future period. Our proposed approach realizes a low-cost and highly accurate background model. The proposed framework is called bidirectional background modeling, and performs background subtraction based on bidirectional analysis; i.e., analysis from past to present and analysis from future to present. Although a result will be output with some delay because information is takenfrom a futureperiod, our proposed approach improves the accuracy by about 30% if only a 33-millisecond of delay is acceptable. Furthermore, the memory cost can be reduced by about 65% relative to typical background modeling.

5 0.58902603 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos

Author: Mehrsan Javan Roshtkhari, Martin D. Levine

Abstract: We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.

6 0.58567542 22 cvpr-2013-A Non-parametric Framework for Document Bleed-through Removal

7 0.58174849 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths

8 0.57461685 166 cvpr-2013-Fast Image Super-Resolution Based on In-Place Example Regression

9 0.54525512 393 cvpr-2013-Separating Signal from Noise Using Patch Recurrence across Scales

10 0.53809792 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?

11 0.51108605 216 cvpr-2013-Improving Image Matting Using Comprehensive Sampling Sets

12 0.50890779 266 cvpr-2013-Learning without Human Scores for Blind Image Quality Assessment

13 0.49264592 270 cvpr-2013-Local Fisher Discriminant Analysis for Pedestrian Re-identification

14 0.49238756 455 cvpr-2013-Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

15 0.49181533 271 cvpr-2013-Locally Aligned Feature Transforms across Views

16 0.48926795 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

17 0.48285875 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos

18 0.47778335 413 cvpr-2013-Story-Driven Summarization for Egocentric Video

19 0.47621772 294 cvpr-2013-Multi-class Video Co-segmentation with a Generative Multi-video Model

20 0.47488716 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.095), (16, 0.027), (26, 0.038), (33, 0.349), (41, 0.203), (67, 0.079), (69, 0.045), (87, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92310947 220 cvpr-2013-In Defense of Sparsity Based Face Recognition

Author: Weihong Deng, Jiani Hu, Jun Guo

Abstract: The success of sparse representation based classification (SRC) has largely boosted the research of sparsity based face recognition in recent years. A prevailing view is that the sparsity based face recognition performs well only when the training images have been carefully controlled and the number of samples per class is sufficiently large. This paper challenges the prevailing view by proposing a “prototype plus variation ” representation model for sparsity based face recognition. Based on the new model, a Superposed SRC (SSRC), in which the dictionary is assembled by the class centroids and the sample-to-centroid differences, leads to a substantial improvement on SRC. The experiments results on AR, FERET and FRGC databases validate that, if the proposed prototype plus variation representation model is applied, sparse coding plays a crucial role in face recognition, and performs well even when the dictionary bases are collected under uncontrolled conditions and only a single sample per classes is available.

2 0.92059356 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers

Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof

Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.

3 0.91226381 81 cvpr-2013-City-Scale Change Detection in Cadastral 3D Models Using Images

Author: Aparna Taneja, Luca Ballan, Marc Pollefeys

Abstract: In this paper, we propose a method to detect changes in the geometry of a city using panoramic images captured by a car driving around the city. We designed our approach to account for all the challenges involved in a large scale application of change detection, such as, inaccuracies in the input geometry, errors in the geo-location data of the images, as well as, the limited amount of information due to sparse imagery. We evaluated our approach on an area of 6 square kilometers inside a city, using 3420 images downloaded from Google StreetView. These images besides being publicly available, are also a good example of panoramic images captured with a driving vehicle, and hence demonstrating all the possible challenges resulting from such an acquisition. We also quantitatively compared the performance of our approach with respect to a ground truth, as well as to prior work. This evaluation shows that our approach outperforms the current state of the art.

same-paper 4 0.9030441 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes

Author: Xiaobo Ren, Tony X. Han, Zhihai He

Abstract: We consider video object cut as an ensemble of framelevel background-foreground object classifiers which fuses information across frames and refine their segmentation results in a collaborative and iterative manner. Our approach addresses the challenging issues of modeling of background with dynamic textures and segmentation of foreground objects from cluttered scenes. We construct patch-level bagof-words background models to effectively capture the background motion and texture dynamics. We propose a foreground salience graph (FSG) to characterize the similarity of an image patch to the bag-of-words background models in the temporal domain and to neighboring image patches in the spatial domain. We incorporate this similarity information into a graph-cut energy minimization framework for foreground object segmentation. The background-foreground classification results at neighboring frames are fused together to construct a foreground probability map to update the graph weights. The resulting object shapes at neighboring frames are also used as constraints to guide the energy minimization process during graph cut. Our extensive experimental results and performance comparisons over a diverse set of challenging videos with dynamic scenes, including the new Change Detection Challenge Dataset, demonstrate that the proposed ensemble video object cut method outperforms various state-ofthe-art algorithms.

5 0.88048732 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

Author: Michele Fenzi, Laura Leal-Taixé, Bodo Rosenhahn, Jörn Ostermann

Abstract: In this paper, we propose a method for learning a class representation that can return a continuous value for the pose of an unknown class instance using only 2D data and weak 3D labelling information. Our method is based on generative feature models, i.e., regression functions learnt from local descriptors of the same patch collected under different viewpoints. The individual generative models are then clustered in order to create class generative models which form the class representation. At run-time, the pose of the query image is estimated in a maximum a posteriori fashion by combining the regression functions belonging to the matching clusters. We evaluate our approach on the EPFL car dataset [17] and the Pointing’04 face dataset [8]. Experimental results show that our method outperforms by 10% the state-of-the-art in the first dataset and by 9% in the second.

6 0.88024509 202 cvpr-2013-Hierarchical Saliency Detection

7 0.87925887 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

8 0.87879026 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

9 0.87864751 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

10 0.87863588 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People

11 0.87862915 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation

12 0.87832403 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

13 0.8782751 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns

14 0.87823415 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

15 0.87783998 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

16 0.87772626 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

17 0.87766832 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

18 0.87750518 204 cvpr-2013-Histograms of Sparse Codes for Object Detection

19 0.87748128 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

20 0.87745708 438 cvpr-2013-Towards Pose Robust Face Recognition