cvpr cvpr2013 cvpr2013-357 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues are superior to using either feature independently, and using a naïve combination of the features.
Reference: text
sentIndex sentText sentNum sentScore
1 Revisiting Depth Layers from Occlusions. Adarsh Kowdle (Cornell University, apk64@cornell.edu), Andrew Gallagher (Cornell University, acg226@cornell.edu), Tsuhan Chen (Cornell University, tsuhan@ece.cornell.edu). [sent-1, score-0.174]
2 Abstract In this work, we consider images of a scene with a moving object captured by a static camera. [sent-5, score-0.877]
3 As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. [sent-6, score-0.604]
4 The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. [sent-7, score-1.942]
5 We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. [sent-8, score-1.86]
6 We quantitatively show that depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues are superior to using either feature independently, and using a naïve combination of the features. [sent-9, score-1.675]
7 Introduction. We consider a time series of images of a scene with moving objects captured from a static camera, and our goal is to exploit occlusion cues revealed as the objects move through the scene to segment the scene into depth layers. [sent-11, score-2.159]
8 Recovering the depth layers of a scene from a 2D image sequence has a number of applications. [sent-12, score-0.535]
9 Video surveillance often has a fixed camera focused on a scene with one or more moving objects. [sent-13, score-0.727]
10 As objects move through the scene over time, we recover a layered representation of the scene. [sent-14, score-0.269]
11 This aids tasks such as object detection and recognition in the presence of occlusions, since one can reason about partial observations of an occluded object with a better 3D understanding of the scene [6, 15, 22]. [sent-15, score-0.414]
12 In addition, a layered representation of the scene is useful in video editing applications, such as composing novel objects into the scene with occlusion reasoning [30] and changing the depth of focus [24]. [sent-16, score-0.915]
13 An image sequence captured from a dynamic (moving) camera allows one to leverage powerful stereo matching cues to recover the depth and occlusion information of the scene. [sent-17, score-0.934]
14 However, these cues are absent in the case of a static camera. [sent-18, score-0.403]
15 For single images, monocular cues help reveal useful depth information [8, 10, 12, 13, 23, 28, 31, 32]. [sent-19, score-0.701]
16 In this work, we consider a set of images with moving objects captured from a static camera. [sent-20, score-0.671]
17 These pairwise cues are powerful, but sparse, which makes our goal of extracting dense pixel-level depth layers a hard problem. [sent-22, score-0.777]
18 In this work, we cast the problem of depth-layer segmentation as a discrete labeling problem on a spatio-temporal MRF over the video. [sent-23, score-0.15]
19 We accumulate the pairwise ordering cues revealed as the object moves through the scene and include monocular cues to propagate the sparse occlusion cues through the scene. [sent-24, score-2.081]
20 We over-segment the background scene (which has no moving objects) and construct a region-level MRF with edges between adjacent regions. [sent-25, score-0.975]
21 In each frame, we identify the pixels corresponding to the moving object and add a node corresponding to each moving object for every frame of the video. [sent-26, score-1.371]
22 We add temporal edges between the corresponding moving object nodes across frames, allowing us to encode a smooth motion prior for the moving object. [sent-27, score-1.359]
23 As the object moves about the scene, we detect motion occlusion events and add edges between the background scene node and the corresponding moving object node, including long range edges between two background scene nodes to encode the pairwise depth-ordering or occlusion cues. [sent-28, score-2.457]
24 An overview of our proposed formulation for a single moving object is shown in Figure 1, with the extension to handle multiple objects in Section 3. [sent-29, score-0.61]
25 Our paper, for the first time, proposes a framework for recovering depth layers in static camera scenes by combining depth-ordering cues from moving objects and cues from monocular occlusion reasoning. [sent-32, score-2.098]
26 Our approach works with any moving object (human or otherwise) and extends to multiple objects moving in the scene. [sent-33, score-1.112]
27 We show that this depth layer reasoning out-performs the current state-of-the-art in terms of depth-layer recovery. [sent-34, score-0.285]
28 Each colored node corresponds to the respective colored region in (b). [sent-37, score-0.192]
29 The red nodes correspond to the moving object with a node for every frame f in the input sequence ({1, 2, . [sent-38, score-0.884]
30 [Figure 1 caption fragment, partly unrecoverable after extraction] The edges between a moving object node and a background scene node encode the observed pairwise depth-ordering. [sent-43, score-0.266]
31 The red edges enforce a smooth motion model for the moving object; (d) Shows the inferred depth layers, white = near and black = far. [sent-47, score-0.991]
32 Related work. Research in cognitive science has shown that humans rely on occlusion cues to obtain object boundaries and depth discontinuities, even in the absence of strong image cues such as edges and lighting [17, 25]. [sent-49, score-1.439]
33 Recovering occlusion boundaries in a scene is a classic problem that has been a topic of wide interest. [sent-50, score-0.164]
34 We focus on prior work with the similar setting of static camera scenarios. [sent-51, score-0.195]
35 We broadly classify these works into learning-based approaches and approaches that purely rely on motion occlusion cues revealed by the moving object. [sent-52, score-1.296]
36 Prior work has explored approaches for estimating the depth of the scene [8, 10, 12, 14, 23, 28, 31, 32] and estimating depth ordering [13, 16] from a single image for 3D scene understanding. [sent-54, score-0.827]
37 Recent work has shown objects (clutter) in the scene to aid better depth estimation of the scene [9, 11] through affordances. [sent-55, score-0.605]
38 [5] showed that the pose of people interacting with a cluttered room can be used to obtain functional regions and recover a coarse 3D geometry of the room. [sent-57, score-0.144]
39 Our work is complementary to this work, and in particular is agnostic to priors about the type of moving object and the type of scene (indoor or outdoor). [sent-58, score-0.796]
40 In other words, we do not require a human as the moving object. [sent-59, score-0.502]
41 We relate back to prior research in cognitive science, which shows that the occlusion cues we observe are agnostic to any prior about the object. [sent-60, score-0.725]
42 We use these sparse, yet strong occlusion cues revealed by the moving object to aid the dense depth layer segmentation of the scene. [sent-61, score-1.658]
43 We work with a single static camera image sequence that precludes us from using algorithms for multiview occlusion reasoning using a moving object [7]. [sent-63, score-1.074]
44 We focus on segmenting a scene captured by a single static camera into depth layers using occlusion cues revealed by the moving objects. [sent-64, score-1.951]
45 [29] who use pairwise occlusion cues to “push” and “pop” the regions of the scene affected by the moving object to obtain depth layers at each frame. [sent-67, score-1.833]
46 A limitation of these works is that they reason only about the portion of the scene the object interacts with, leaving behind huge portions of the scene at an unknown depth. [sent-68, score-0.591]
47 In addition, since the interaction with each region is treated independently it leads to excessive fragmentation of the scene as we show in Section 4. [sent-69, score-0.288]
48 This fragmentation can be partially avoided [29] by making the (possibly over-restrictive) strong assumption that the moving object stays at a constant depth. [sent-70, score-0.671]
49 Our model includes a more reasonable model of object motion. [sent-71, score-0.075]
50 In summary, we revisit depth layers from occlusions and address limitations of prior work via a unified framework that leverages sparse depth-ordering cues revealed by the moving object and gracefully propagates them throughout the whole scene. [sent-72, score-1.483]
51 Algorithm We formulate the task of segmenting the scene into depth layers as a discrete labeling problem. [sent-74, score-0.604]
52 In this section, we first describe our formulation as applied to a scene with a single moving object and then extend the same framework to handle multiple moving objects in the scene. [sent-75, score-1.276]
53 We refer to the scene without any moving objects as the background scene. [sent-79, score-0.86]
54 We use a calibration stage to obtain a clean background image without any moving objects. [sent-80, score-0.663]
55 In the absence of the calibration stage we take advantage of the static camera scenario and obtain an estimate of the background image as the median image over the video. [sent-81, score-0.369]
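The median-image fallback described above can be sketched as follows. This is a minimal illustration, assuming frames are stored as nested lists of (r, g, b) tuples; the function name `median_background` and the data layout are hypothetical and not taken from the paper.

```python
from statistics import median


def median_background(frames):
    """Estimate a background image as the per-pixel, per-channel temporal median.

    frames: list of images; each image is a list of rows, and each row is a
    list of (r, g, b) tuples. All frames are assumed to share one size.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = []
    for y in range(height):
        row = []
        for x in range(width):
            # Median over time, taken independently for each colour channel.
            pixel = tuple(
                median(frame[y][x][c] for frame in frames) for c in range(3)
            )
            row.append(pixel)
        background.append(row)
    return background
```

The median is a natural choice here because a pixel that is covered by the moving object in only a minority of frames still takes its background colour.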
56 Given the background image we obtain an over-segmentation using mean shift segmentation [4] to give us about 300 superpixels. [sent-82, score-0.209]
57 We treat this segmentation as a stencil of background superpixels. Figure 2: Pairwise depth-ordering cues. (a) Object in front of background scene region; (b) Object behind background scene region. [sent-83, score-0.848]
58 Left image shows the background scene segmentation and the right image shows an intermediate frame segmentation with the moving object segment. [sent-84, score-1.077]
59 It also reveals new relationships via transitivity; the chair occludes the object and at the same instant the object occludes regions on the wall; therefore the chair occludes the regions on the wall. [sent-86, score-1.133]
60 Given the superpixel stencil for the background scene, we update this superpixel map for every frame by identifying the pixels corresponding to the moving object via background subtraction. [sent-89, score-1.525]
61 We model the appearance of the background using a per-pixel Gaussian distribution (Ap) centered at the mean color (RGB space) of the pixel across the whole video. [sent-90, score-0.161]
62 Given Ap, for every frame we estimate the likelihood for each pixel belonging to the background. [sent-91, score-0.11]
63 We label pixels with background likelihood above 90% as confident background pixels and below 10% likelihood as confident moving object pixels. [sent-92, score-1.017]
64 Using these as confident initial seeds, we learn an appearance model for the background (BG) and the moving object (FG). [sent-93, score-0.797]
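The seeding step can be illustrated with a short sketch. The text does not say how the likelihood is normalised or what variance is used, so this example assumes an unnormalised isotropic Gaussian score in (0, 1] with a hypothetical `sigma`; only the 90% and 10% thresholds come from the text.

```python
import math


def background_likelihood(pixel, mean, sigma=10.0):
    """Unnormalised Gaussian score in (0, 1]; equals 1.0 when pixel == mean.

    sigma is an assumed, hand-picked spread; the source does not give one.
    """
    sq_dist = sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return math.exp(-0.5 * sq_dist / sigma ** 2)


def seed_label(likelihood, hi=0.9, lo=0.1):
    """Return 'BG' or 'FG' for confident pixels, None for uncertain ones."""
    if likelihood > hi:
        return "BG"
    if likelihood < lo:
        return "FG"
    return None
```

Pixels that receive `None` are the ones left for the later graph-cut refinement to decide.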
65 The moving object segmentation is obtained using iterative graph-cuts [1, 2, 20] updating the BG/FG color models with each iteration similar to GrabCut [27]. [sent-94, score-0.654]
66 Figure 2 shows examples of the moving object segmentation overlaid on the background segmentation. [sent-95, score-0.786]
67 After this stage, we have the background scene superpixel map and the moving object segmentation for each frame. [sent-96, score-1.156]
68 A region-level MRF is constructed over the background scene superpixels where each superpixel is a node with an edge to adjacent superpixels. [sent-97, score-0.819]
69 We add a node corresponding to the moving object for every frame of the video and add temporal edges connecting the moving object nodes on adjacent frames. [sent-98, score-1.623]
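A rough sketch of the graph construction just described, assuming the background over-segmentation is available as a 2D map of integer superpixel ids and that there is a single moving object. The names `build_graph` and the `("obj", f)` node encoding are illustrative choices, not the authors' data structures.

```python
def build_graph(label_map, num_frames):
    """Build spatial edges between adjacent superpixels and temporal edges
    between the moving-object nodes of consecutive frames.

    label_map: 2D list of integer superpixel ids for the background scene.
    """
    height, width = len(label_map), len(label_map[0])
    spatial_edges = set()
    for y in range(height):
        for x in range(width):
            a = label_map[y][x]
            # Look at the pixel below and the pixel to the right (4-connectivity).
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width:
                    b = label_map[ny][nx]
                    if a != b:
                        spatial_edges.add((min(a, b), max(a, b)))
    # One node per frame for the moving object, chained through time.
    object_nodes = [("obj", f) for f in range(num_frames)]
    temporal_edges = list(zip(object_nodes, object_nodes[1:]))
    return spatial_edges, object_nodes, temporal_edges
```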
70 Pairwise depth-ordering cues The object moving through the scene is either occluded by or occludes portions of the scene. [sent-102, score-1.389]
71 In our superpixel representation of the scene, we accumulate the pairwise cues using a matrix we call the Occlusion Matrix (O), where O_{i,j} ∈ {−1, 0, +1} indicates the relationship between superpixel i and superpixel j. [sent-104, score-1.103]
72 The matrix is updated at every frame of the video using detected motion occlusion events, or using learnt monocular cues in the absence of occlusion cues. [sent-111, score-0.999]
73 Low-level cues revealed by the moving object in the scene serve as sparse, yet strong pairwise depth-ordering cues. [sent-113, score-1.363]
74 We work with the abstract superpixel representation of each frame and use cues similar to prior work [3] to obtain pairwise relationship between the moving object segment and the superpixel it interacts with. [sent-114, score-1.68]
75 The cues are intuitive, given a background region the moving object is interacting with, we use the moving object pixels and the boundary pixels of the background region to infer whether the object moved in-front-of this region or behind this region, respectively, as illustrated in Figure 2. [sent-115, score-2.139]
76 We update the corresponding entry of the occlusion matrix with O_{i,j} as +1 to indicate that superpixel i occludes superpixel j and set O_{j,i} to −1. [sent-116, score-0.742]
77 In addition to the pairwise depth-ordering cues between [sent-117, score-0.438]
78 the moving object and the superpixel it is interacting with, we also enforce transitivity while updating the matrix. [sent-118, score-1.036]
79 If the object is occluded by a region of the background scene and is simultaneously occluding several regions of the background scene, via transitivity it establishes a pairwise relationship between the occluding background region and each of the other background regions as shown in Figure 2(b). [sent-119, score-1.494]
80 More formally, if m refers to the moving object segment simultaneously involved in motion occlusion events with superpixels k and l then, Ok,m = +1 and Ol,m = −1, implies Ok,l = +1. [sent-120, score-1.136]
81 This provides a strong depth-ordering cue between k and l. [sent-121, score-0.031]
82 In addition, since k and l are not constrained to be adjacent superpixels, long-range edges between non-adjacent superpixels are also a result. [sent-122, score-0.29]
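The transitivity rule above can be sketched in a few lines. This assumes the occlusion matrix is a square list of lists indexed by node id, with the moving object given its own index; the helper names are hypothetical.

```python
def set_occludes(O, i, j):
    """Record that node i occludes node j (antisymmetric update)."""
    O[i][j] = 1
    O[j][i] = -1


def update_with_motion_events(O, obj, occluders, occluded):
    """Apply one frame's motion occlusion events.

    obj:       index of the moving-object node.
    occluders: background superpixels that occlude the object in this frame.
    occluded:  background superpixels that the object occludes in this frame.
    """
    for k in occluders:
        set_occludes(O, k, obj)
    for l in occluded:
        set_occludes(O, obj, l)
    # Transitivity: k occludes the object and the object occludes l,
    # therefore k occludes l, even if k and l are not adjacent.
    for k in occluders:
        for l in occluded:
            set_occludes(O, k, l)
```

Note that the transitive entries are exactly the long-range edges mentioned in the text, since nothing requires `k` and `l` to share a boundary.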
83 We use monocular cues to provide evidence about occlusions for the other regions of the scene. [sent-124, score-0.555]
84 Given the superpixel map for each frame, we use the work of Hoiem et al. [sent-125, score-0.206]
85 [13] that uses learnt priors to determine which of two adjacent superpixels occludes the other. [sent-126, score-0.47]
86 For each frame, we first update the occlusion matrix using the motion occlusion cues where available and update the matrix for all the other spatially adjacent superpixels using the monocular cues. [sent-127, score-1.349]
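The per-frame update order can be sketched as below. The monocular predictor is represented by a hypothetical callable `monocular_cue(i, j)` returning +1, −1 or 0; it stands in for the learnt classifier of [13] and is not its real interface. The text does not specify how conflicts across frames are resolved, so this sketch assumes monocular cues only fill entries that are still 0.

```python
def update_frame(O, motion_pairs, adjacent_pairs, monocular_cue):
    """Update the occlusion matrix for one frame.

    motion_pairs:   (i, j) pairs where a motion occlusion event shows that
                    i occludes j.
    adjacent_pairs: (i, j) pairs of spatially adjacent superpixels.
    monocular_cue:  hypothetical callable returning +1 if i occludes j,
                    -1 if j occludes i, and 0 if undecided.
    """
    # Motion occlusion cues are applied first because they are more reliable.
    for i, j in motion_pairs:
        O[i][j], O[j][i] = 1, -1
    # Monocular cues fill in the remaining adjacent pairs (assumption:
    # entries that already hold a cue are left untouched).
    for i, j in adjacent_pairs:
        if O[i][j] == 0:
            sign = monocular_cue(i, j)
            O[i][j], O[j][i] = sign, -sign
```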
87 We do not enforce transitivity here since the monocular cues are not as reliable as motion occlusion cues. [Figure caption fragment] The term will encourage that i takes a depth label closer (lower label) than j via a large penalty for the red terms and zero penalty for the blue terms. [sent-128, score-1.233]
88 The occlusion matrix serves as the observations for modulating the terms of the energy function described below. [sent-132, score-0.295]
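One way to picture how the occlusion matrix could modulate a pairwise term is the sketch below. It is not the paper's energy function, which is not reproduced in this excerpt; it only assumes that lower labels are closer to the camera, that an ordering violation costs a fixed large constant, and that equal labels also count as a violation. `LARGE`, `ordering_penalty` and `ordering_energy` are hypothetical names.

```python
LARGE = 1000.0  # assumed penalty magnitude, not a value from the paper


def ordering_penalty(label_i, label_j, o_ij):
    """Penalty for one pair of nodes, given the occlusion matrix entry o_ij."""
    if o_ij == 1 and label_i >= label_j:
        return LARGE  # i occludes j, so i should be strictly closer (lower label)
    if o_ij == -1 and label_j >= label_i:
        return LARGE  # j occludes i, so j should be strictly closer
    return 0.0


def ordering_energy(labels, O):
    """Sum of ordering penalties over all unordered pairs of nodes."""
    total = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            total += ordering_penalty(labels[i], labels[j], O[i][j])
    return total
```

A labeling that respects every observed ordering has zero ordering energy under this sketch, while each violated pair adds one `LARGE` term.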
89 Energy minimization problem The goal given the sparse pairwise depth-ordering constraints is to obtain dense depth-layers. [sent-135, score-0.17]
90 One approach is a greedy algorithm where the whole scene starts at layer-0 and with every pairwise depth-ordering constraint regions of the scene are “pushed” and “popped” [3] to obtain the final labeling. [sent-136, score-0.545]
91 [13] use a graph in which boundaries between superpixels are nodes connected to adjacent boundaries, to encourage continuity and closure. [sent-138, score-0.39]
92 [16] use image junctions as nodes to obtain a globally consistent depth ordering using a minimum spanning tree. [sent-140, score-0.354]
93 In this work, we use superpixels as nodes in the graph. [sent-141, score-0.206]
94 This allows us to directly obtain the depth-layer labeling, and also incorporate long range edges between nodes. [sent-142, score-0.069]
95 We formulate depth layer segmentation as a discrete labeling problem where every superpixel is assigned a depth label {1, 2, . [sent-143, score-0.813]
96 The labels are depth-ordered from closer to the camera moving away. [sent-150, score-0.596]
97 We formulate this multi-label [remainder of sentence lost in extraction] [sent-153, score-0.036]
wordName wordTfidf (topN-words)
[('moving', 0.502), ('cues', 0.302), ('occlusion', 0.265), ('occludes', 0.249), ('depth', 0.209), ('superpixel', 0.206), ('scene', 0.164), ('background', 0.161), ('monocular', 0.158), ('revealed', 0.153), ('superpixels', 0.142), ('pairwise', 0.136), ('layers', 0.13), ('transitivity', 0.126), ('interacts', 0.107), ('static', 0.101), ('ordering', 0.081), ('frame', 0.079), ('adjacent', 0.079), ('object', 0.075), ('motion', 0.074), ('cornell', 0.072), ('stencil', 0.072), ('cornel', 0.072), ('edges', 0.069), ('node', 0.067), ('reveals', 0.066), ('interacting', 0.064), ('nodes', 0.064), ('fragmentation', 0.063), ('moves', 0.062), ('camera', 0.061), ('region', 0.061), ('confident', 0.059), ('mrf', 0.057), ('agnostic', 0.055), ('occluded', 0.055), ('regions', 0.05), ('segmentation', 0.048), ('accumulate', 0.047), ('absence', 0.046), ('occlusions', 0.045), ('ellipse', 0.045), ('events', 0.044), ('layered', 0.042), ('portions', 0.042), ('labeling', 0.041), ('add', 0.04), ('orange', 0.04), ('behind', 0.039), ('white', 0.039), ('reasoning', 0.038), ('layer', 0.038), ('cognitive', 0.037), ('boundaries', 0.037), ('occluding', 0.036), ('trhmeu', 0.036), ('eev', 0.036), ('cllo', 0.036), ('popped', 0.036), ('recovering', 0.035), ('chair', 0.035), ('aid', 0.035), ('captured', 0.035), ('enforce', 0.034), ('sparse', 0.034), ('segment', 0.034), ('red', 0.034), ('prior', 0.033), ('bfo', 0.033), ('suhan', 0.033), ('tsuhan', 0.033), ('cludes', 0.033), ('aere', 0.033), ('cisc', 0.033), ('dob', 0.033), ('objects', 0.033), ('update', 0.032), ('colored', 0.032), ('reveal', 0.032), ('sequence', 0.032), ('discrete', 0.031), ('abe', 0.031), ('pop', 0.031), ('strong', 0.031), ('encourage', 0.031), ('every', 0.031), ('corne', 0.03), ('nen', 0.03), ('modulating', 0.03), ('fouhey', 0.03), ('eqn', 0.03), ('fno', 0.03), ('cast', 0.03), ('black', 0.03), ('recover', 0.03), ('updating', 0.029), ('segmenting', 0.029), ('clusion', 0.029), ('vide', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 357 cvpr-2013-Revisiting Depth Layers from Occlusions
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues are superior to using either feature independently, and using a naïve combination of the features.
2 0.23368397 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
Author: Ju Shen, Sen-Ching S. Cheung
Abstract: The recent popularity of structured-light depth sensors has enabled many new applications from gesture-based user interface to 3D reconstructions. The quality of the depth measurements of these systems, however, is far from perfect. Some depth values can have significant errors, while others can be missing altogether. The uncertainty in depth measurements among these sensors can significantly degrade the performance of any subsequent vision processing. In this paper, we propose a novel probabilistic model to capture various types of uncertainties in the depth measurement process among structured-light systems. The key to our model is the use of depth layers to account for the differences between foreground objects and background scene, the missing depth value phenomenon, and the correlation between color and depth channels. The depth layer labeling is solved as a maximum a-posteriori estimation problem, and a Markov Random Field attuned to the uncertainty in measurements is used to spatially smooth the labeling process. Using the depth-layer labels, we propose a depth correction and completion algorithm that outperforms other techniques in the literature.
3 0.22416177 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
4 0.18322419 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
Author: Guang Shu, Afshin Dehghan, Mubarak Shah
Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained object detectors is usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video-specific features like consistent color patterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using a Conditional Random Field (CRF) along with our superpixel-based BoW model, we develop an algorithm to segment the object from the background. Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and significantly improve on the state-of-the-art detector by 5-16% in average precision.
5 0.17537458 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from of a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
6 0.15942562 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
7 0.15867981 29 cvpr-2013-A Video Representation Using Temporal Superpixels
9 0.14980298 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
10 0.12937553 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
11 0.12678355 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
12 0.12546501 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera
13 0.12310389 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
14 0.1210695 108 cvpr-2013-Dense 3D Reconstruction from Severely Blurred Images Using a Single Moving Camera
15 0.11559132 455 cvpr-2013-Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
16 0.11163848 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences
17 0.11131267 10 cvpr-2013-A Fully-Connected Layered Model of Foreground and Background Flow
18 0.11087507 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
19 0.11035211 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
20 0.10818458 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery
topicId topicWeight
[(0, 0.236), (1, 0.141), (2, 0.094), (3, -0.03), (4, 0.018), (5, -0.018), (6, 0.073), (7, 0.145), (8, -0.035), (9, 0.074), (10, 0.102), (11, -0.073), (12, 0.066), (13, 0.161), (14, 0.084), (15, 0.025), (16, -0.09), (17, 0.05), (18, -0.165), (19, 0.053), (20, 0.057), (21, -0.026), (22, -0.067), (23, -0.058), (24, 0.006), (25, -0.05), (26, -0.01), (27, -0.02), (28, -0.009), (29, 0.067), (30, 0.046), (31, -0.02), (32, 0.076), (33, -0.043), (34, -0.016), (35, 0.003), (36, -0.012), (37, 0.009), (38, 0.163), (39, -0.051), (40, -0.07), (41, 0.0), (42, 0.008), (43, 0.002), (44, -0.14), (45, 0.053), (46, -0.132), (47, -0.067), (48, -0.016), (49, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.96554029 357 cvpr-2013-Revisiting Depth Layers from Occlusions
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues are superior to using either feature independently, and using a naïve combination of the features.
2 0.62956852 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
3 0.6128363 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
Author: Ju Shen, Sen-Ching S. Cheung
Abstract: The recent popularity of structured-light depth sensors has enabled many new applications from gesture-based user interface to 3D reconstructions. The quality of the depth measurements of these systems, however, is far from perfect. Some depth values can have significant errors, while others can be missing altogether. The uncertainty in depth measurements among these sensors can significantly degrade the performance of any subsequent vision processing. In this paper, we propose a novel probabilistic model to capture various types of uncertainties in the depth measurement process among structured-light systems. The key to our model is the use of depth layers to account for the differences between foreground objects and background scene, the missing depth value phenomenon, and the correlation between color and depth channels. The depth layer labeling is solved as a maximum a-posteriori estimation problem, and a Markov Random Field attuned to the uncertainty in measurements is used to spatially smooth the labeling process. Using the depth-layer labels, we propose a depth correction and completion algorithm that outperforms other techniques in the literature.
4 0.61183023 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
5 0.59668308 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
Author: Zhe Yang, Zhiwei Xiong, Yueyi Zhang, Jiao Wang, Feng Wu
Abstract: This paper proposes novel density modulated binary patterns for depth acquisition. Similar to Kinect, the illumination patterns do not need a projector for generation and can be emitted by infrared lasers and diffraction gratings. Our key idea is to use the density of light spots in the patterns to carry phase information. Two technical problems are addressed here. First, we propose an algorithm to design the patterns to carry more phase information without compromising the depth reconstruction from a single captured image as with Kinect. Second, since the carried phase is not strictly sinusoidal, the depth reconstructed from the phase contains a systematic error. We further propose a pixel-based phase matching algorithm to reduce the error. Experimental results show that the depth quality can be greatly improved using the phase carried by the density of light spots. Furthermore, our scheme can achieve 20 fps depth reconstruction with GPU assistance.
7 0.57921273 311 cvpr-2013-Occlusion Patterns for Object Class Detection
9 0.54467106 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
10 0.53655344 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
11 0.51964337 232 cvpr-2013-Joint Geodesic Upsampling of Depth Images
12 0.51817214 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
13 0.51815343 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
14 0.51669931 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
15 0.51384056 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
16 0.50916135 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera
17 0.50655526 10 cvpr-2013-A Fully-Connected Layered Model of Foreground and Background Flow
18 0.49892446 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
19 0.49568337 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
20 0.49290103 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
topicId topicWeight
[(10, 0.054), (26, 0.026), (33, 0.75), (67, 0.023), (69, 0.04), (87, 0.027)]
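The listing does not state how the simValue scores are derived from these topic weights. As a rough sketch only, assuming a cosine similarity over sparse (topicId, topicWeight) vectors (an assumption, not a documented method of this page), two papers could be compared as follows. The function name `cosine_similarity` and the second vector `other_paper` are hypothetical and invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {topicId: weight} dicts."""
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Topic weights listed above for this paper.
this_paper = dict([(10, 0.054), (26, 0.026), (33, 0.75), (67, 0.023), (69, 0.04), (87, 0.027)])

# Hypothetical second paper, made up for illustration only.
other_paper = {33: 0.6, 10: 0.1, 50: 0.2}

score = cosine_similarity(this_paper, other_paper)
```

Because topic 33 dominates the weights above, any paper that also loads heavily on topic 33 would score close to 1 under this assumed measure.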
simIndex simValue paperId paperTitle
same-paper 1 0.99959439 357 cvpr-2013-Revisiting Depth Layers from Occlusions
Author: Adarsh Kowdle, Andrew Gallagher, Tsuhan Chen
Abstract: In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues along with monocular depth occlusion cues to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatiotemporal Markov Random Field (MRF) that uses the motion occlusion cues along with monocular cues and a smooth motion prior for the moving object. We quantitatively show that the depth ordering produced by the proposed combination of the depth cues from object motion and monocular occlusion cues is superior to using either feature independently, and to using a naïve combination of the features.
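The abstract does not give the energy function itself. For orientation only, a discrete labeling problem on an MRF is commonly written in the generic pairwise form below; the symbols ($\mathbf{l}$ for the labeling, $\phi_i$ for unary terms, $\psi_{ij}$ for pairwise terms, $\mathcal{V}$ and $\mathcal{E}$ for nodes and edges) are assumptions chosen for illustration, not notation taken from the paper.

```latex
E(\mathbf{l}) \;=\; \sum_{i \in \mathcal{V}} \phi_i(l_i)
\;+\; \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(l_i, l_j),
\qquad
\mathbf{l}^{*} \;=\; \arg\min_{\mathbf{l}} E(\mathbf{l})
```

Under this generic reading, the unary terms would carry per-node evidence for a depth layer and the pairwise terms would encode relations between neighboring nodes in space and time. How the motion occlusion cues, monocular cues, and smooth motion prior are actually assigned to terms is not specified in the abstract.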
2 0.99932528 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
Author: Amirreza Shaban, Hamid R. Rabiee, Mehrdad Farajtabar, Marjan Ghazvininejad
Abstract: Bag of words models for feature extraction have demonstrated top-notch performance in image classification. These representations are usually accompanied by a coding method. Recently, methods that code a descriptor giving regard to its nearby bases have proved efficacious. These methods take into account the nonlinear structure of descriptors, since local similarities are a good approximation of global similarities. However, they confine their usage of the global similarities to nearby bases. In this paper, we propose a coding scheme that brings into focus the manifold structure of descriptors, and devise a method to compute the global similarities of descriptors to the bases. Given a local similarity measure between bases, a global measure is computed. Exploiting the local similarity of a descriptor and its nearby bases, a global measure of association of a descriptor to all the bases is computed. Unlike the locality-based and sparse coding methods, the proposed coding varies smoothly with respect to the underlying manifold. Experiments on benchmark image classification datasets substantiate the superiority of the proposed method over its locality and sparsity based rivals.
3 0.99878603 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
Author: Neill D.F. Campbell, Kartic Subr, Jan Kautz
Abstract: Conditional Random Fields (CRFs) are used for diverse tasks, ranging from image denoising to object recognition. For images, they are commonly defined as a graph with nodes corresponding to individual pixels and pairwise links that connect nodes to their immediate neighbors. Recent work has shown that fully-connected CRFs, where each node is connected to every other node, can be solved efficiently under the restriction that the pairwise term is a Gaussian kernel over a Euclidean feature space. In this paper, we generalize the pairwise terms to a non-linear dissimilarity measure that is not required to be a distance metric. To this end, we propose a density estimation technique to derive conditional pairwise potentials in a nonparametric manner. We then use an efficient embedding technique to estimate an approximate Euclidean feature space for these potentials, in which the pairwise term can still be expressed as a Gaussian kernel. We demonstrate that the use of non-parametric models for the pairwise interactions, conditioned on the input data, greatly increases expressive power whilst maintaining efficient inference.
4 0.99871409 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work. 2 Center for Machine Perception, Faculty of Electrical Engineering. 3 WILLOW project, Laboratoire d’Informatique de l’École Normale Supérieure, ENS/INRIA/CNRS UMR 8548. 4 Université Paris-Est, LIGM (UMR CNRS 8049), Center for Visual Computing, Ecole des Ponts - ParisTech, 77455 Marne-la-Vallée, France
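The abstract says only that the calibration uses negative examples and a significance measure essentially equivalent to a p-value. A minimal sketch of that idea, assuming an empirical p-value (the fraction of negative scores at least as large as the query score), is shown below. This is one plausible reading rather than the authors' implementation; the function name `empirical_p_value` and the sample scores are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(score, negative_scores):
    """Fraction of negative-example scores that are >= the given classifier score.

    A small value means the score is unusually high relative to the negatives,
    which makes scores from different classifiers comparable on one scale.
    """
    sorted_neg = sorted(negative_scores)
    n = len(sorted_neg)
    if n == 0:
        raise ValueError("need at least one negative score")
    # bisect_left returns how many negatives are strictly below `score`.
    at_least = n - bisect_left(sorted_neg, score)
    return at_least / n


# Hypothetical raw SVM scores of one per-location classifier on negative images.
negatives = [0.1, 0.2, 0.3, 0.4]
p = empirical_p_value(0.35, negatives)  # only 0.4 is >= 0.35
```

Ranking candidate locations by ascending p-value rather than by raw score is the kind of comparison such a calibration enables, since raw SVM outputs from independently trained classifiers are not directly comparable.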
5 0.99857259 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
Author: Christian Thériault, Nicolas Thome, Matthieu Cord
Abstract: In this paper, we address the challenging problem of categorizing video sequences composed of dynamic natural scenes. Contrary to previous methods that rely on handcrafted descriptors, we propose here to represent videos using unsupervised learning of motion features. Our method encompasses three main contributions: 1) Based on the Slow Feature Analysis principle, we introduce a learned local motion descriptor which represents the principal and more stable motion components of training videos. 2) We integrate our local motion feature into a global coding/pooling architecture in order to provide an effective signature for each video sequence. 3) We report state-of-the-art classification performance on two challenging natural scenes data sets. In particular, an outstanding improvement of 11% in classification score is reached on a data set introduced in 2012.
6 0.99857241 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification
7 0.99852854 55 cvpr-2013-Background Modeling Based on Bidirectional Analysis
8 0.99850804 93 cvpr-2013-Constraints as Features
9 0.99817038 346 cvpr-2013-Real-Time No-Reference Image Quality Assessment Based on Filter Learning
10 0.99668539 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
11 0.99660563 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
12 0.99634123 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
13 0.99419159 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
14 0.99384916 301 cvpr-2013-Multi-target Tracking by Rank-1 Tensor Approximation
15 0.98831928 379 cvpr-2013-Scalable Sparse Subspace Clustering
16 0.98812801 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
17 0.98779285 266 cvpr-2013-Learning without Human Scores for Blind Image Quality Assessment
18 0.98483992 306 cvpr-2013-Non-rigid Structure from Motion with Diffusion Maps Prior
19 0.98449767 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval