cvpr cvpr2013 cvpr2013-384 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, Xiaopeng Zhang
Abstract: This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, ”Segment-Tree ”, is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear to the number of the image pixels. Compared to MST where the graph connectivity is determined with local edge weights, our method introduces some ’non-local’ decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be first enforced in the tree construction process. The matching costs are then aggregated over the tree within two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a novel tree-based cost aggregation method for dense stereo matching. [sent-4, score-0.672]
2 Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, ”Segment-Tree ”, is proposed for non-local matching cost aggregation. [sent-5, score-0.494]
3 In practice, this tree can be efficiently built in time nearly linear to the number of the image pixels. [sent-7, score-0.221]
4 The matching costs are then aggregated over the tree within two passes. [sent-9, score-0.383]
5 Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. [sent-10, score-0.838]
6 Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results. [sent-11, score-0.71]
7 Introduction Dense two-frame stereo matching is one of the most extensively studied areas in computer vision. [sent-13, score-0.186]
8 A stereo algorithm usually takes four steps [15]: matching cost computation, cost aggregation, disparity computation and disparity refinement. [sent-14, score-1.044]
9 We focus on cost aggregation (step 2), which has a great impact on the speed and accuracy of a stereo system. [sent-19, score-0.672]
10 Most aggregation methods work by defining a local support window for each pixel and averaging the costs over the region; such methods are therefore closely related to image filtering techniques. [sent-21, score-0.610]
11 Since then, various edge-aware filtering techniques have been explored for cost aggregation, such as geodesic weight [7] and segment support [20]. [sent-24, score-0.221]
12 This filter shows leading speed and accuracy performance in two recent stereo methods [3, 14]. [sent-27, score-0.188]
13 Recently, Yang first proposed a non-local cost aggregation method [23]. [sent-28, score-0.52]
14 Different from previous methods that rely on pixelwise support regions, Yang’s method performs nonlocal cost aggregation over the image with a tree structure. [sent-29, score-0.727]
15 The reference image is treated as a 4-connected, undirected planar graph: the nodes are image pixels, and the edges are all the edges between neighboring pixels. [sent-30, score-0.244]
16 The tree is then constructed as a minimum spanning tree (MST) over this graph. [sent-31, score-0.413]
17 By traversing the tree in two sequential passes (one from leaf to root, another one from root to leaf), each pixel receives proper contributions from all the other pixels in the image. [sent-32, score-0.367]
18 Evaluation on the standard Middlebury benchmark [16] shows that this non-local method outperforms the guided image filter in aggregation accuracy. [sent-33, score-0.571]
19 Besides aggregation, MST and its variants have also been used as graphical models for many global stereo methods such as tree-based dynamic programming [10, 21] and graph cut on sparse graph [18]. [sent-35, score-0.218]
20 Edges with equal weights also lead to non-unique tree structures [21]. [sent-38, score-0.233]
21 In this paper, we propose a novel tree structure, SegmentTree (ST), for non-local cost aggregation. [sent-39, score-0.252]
22 Although MST has shown good performance in cost aggregation and other stereo work [21, 23], we believe ST is competitive for two reasons. [sent-41, score-0.672]
23 ST instead selects edges with both local edge weights and ’non-local’ segment properties, which yields a more robust tree structure. [sent-43, score-0.442]
24 Second and more importantly, ST incorporates the segmentation information into the cost aggregation step in a ’soft’ way. [sent-44, score-0.601]
25 By enforcing tight connections for the pixels inside each segment, improved aggregation results and better depth boundaries can be expected. [sent-46, score-0.537]
26 On the other hand, the aggregation weight between any two pixels in the same segment is determined by their geodesic distance over a sub-tree structure, which still allows large disparity variation inside the segment without a hard constraint. [sent-47, score-1.034]
27 We quantitatively evaluate the aggregation accuracy with ST, MST and guided image filter on 19 Middlebury data sets, including the standard benchmark. [sent-50, score-0.546]
28 Even better results can be achieved if the tree structure is further updated with a color-depth joint segmentation. [sent-52, score-0.224]
29 For the Middlebury data sets, the ST-based aggregation method is about 11× faster than the guided image filter [14]. [sent-55, score-0.546]
30 In summary, the contributions of this paper are: ∙ A novel tree structure for matching cost aggregation. [sent-56, score-0.307]
31 ∙ An efficient graph-based tree construction algorithm. [sent-57, score-0.25]
32 ∙ An effective tree structure refinement method. [sent-58, score-0.201]
33 ∙ Quantitative evaluation of several recent aggregation methods on a number of stereo data sets. [sent-59, score-0.448]
34 Non-Local Cost Aggregation In this section, we briefly review the non-local cost aggregation method. [sent-61, score-0.52]
35 Our algorithm follows the same work flow, except that we employ a different tree structure. [sent-63, score-0.18]
36 In this problem, the reference color/intensity image 퐼 is represented as a connected, undirected graph 퐺 = (푉, 퐸), where each node in 푉 corresponds to a pixel in 퐼, and each edge in 퐸 connects a pair of neighboring pixels. [sent-64, score-0.293]
37 For an edge 푒 connecting pixels 푠 and 푟, its weight is defined as follows: 푤푒 = 푤(푠, 푟) = ∣퐼(푠) − 퐼(푟)∣ (1) A tree 푇 can then be constructed by selecting a subset of edges from 퐸. [sent-65, score-0.421]
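The 4-connected image graph and the intensity-difference edge weights of Equation (1) can be sketched as follows; the function name `grid_edges` and the flattened pixel indexing are illustrative choices, not part of the paper:

```python
import numpy as np

def grid_edges(image):
    """Build the 4-connected grid graph of a grayscale image.

    Returns a list of (weight, node_s, node_r) tuples, where nodes are
    flattened pixel indices and the weight follows Equation (1):
    w(s, r) = |I(s) - I(r)|.
    """
    h, w = image.shape
    idx = np.arange(h * w).reshape(h, w)
    edges = []
    # Horizontal neighbors (cast to int first to avoid uint8 wrap-around).
    for y in range(h):
        for x in range(w - 1):
            weight = abs(int(image[y, x]) - int(image[y, x + 1]))
            edges.append((weight, int(idx[y, x]), int(idx[y, x + 1])))
    # Vertical neighbors.
    for y in range(h - 1):
        for x in range(w):
            weight = abs(int(image[y, x]) - int(image[y + 1, x]))
            edges.append((weight, int(idx[y, x]), int(idx[y + 1, x])))
    return edges

# Tiny 2x2 example: 4 grid edges in total.
img = np.array([[10, 12], [10, 50]], dtype=np.uint8)
print(sorted(grid_edges(img)))
```

Sorting the returned edge list by weight is exactly the initialization step later used by the tree construction algorithm.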
38 Yang proposed to construct 푇 as a minimum spanning tree (MST). [sent-66, score-0.208]
39 For accurate aggregation, such edges should be selected during the tree construction process, which leads to an MST with the minimum sum of the edge weights. [sent-68, score-0.386]
40 For any two pixels 푝 and 푞, there is only one path connecting them in 푇, and their distance 퐷(푝, 푞) is determined by the sum of the edge weights along the path. [sent-69, score-0.2]
41 Let 퐶푑(푝) denote the matching cost for pixel 푝 at disparity level 푑; the non-local aggregated cost 퐶푑퐴(푝) is computed as a weighted sum of 퐶푑: 퐶푑퐴(푝) = ∑푞∈퐼 푆(푝,푞)퐶푑(푞) (2) where 푞 covers every pixel in image 퐼. [sent-70, score-0.74]
42 This is different from traditional aggregation methods, where 푞 is limited in a local region around 푝. [sent-71, score-0.448]
43 With the tree structure 푇, 푆(푝, 푞) is defined as follows: 푆(푝,푞) = exp(−퐷(푝,푞)/휎) (3) where 휎 is a user-specified parameter for distance adjustment. [sent-73, score-0.201]
44 Aggregation needs to be performed for all the pixels at all disparity levels. [sent-77, score-0.391]
45 Yang showed that the aggregated costs for all the pixels can be efficiently computed by traversing the tree structure 푇 in two sequential passes. [sent-79, score-0.466]
46 In the first pass (Figure 1 (a)), the tree is traced from the leaf nodes to the root node. [sent-81, score-0.286]
47 After the first pass, the root node (node 푉 7 in Figure 1(a)) receives the weighted costs from all the other nodes, while the rest receive the costs from their subtrees. [sent-85, score-0.254]
48 Then in the second pass (Figure 1(b)), the tree is traversed from top to bottom. [sent-86, score-0.208]
49 Starting from the root node, the aggregated costs are passed to the subtrees. [sent-87, score-0.2]
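The two sequential passes can be sketched for a single disparity slice. The tree representation (parent pointers, children lists, a top-down node order) and every identifier below are illustrative assumptions; the second-pass update follows the standard non-local recursion, where the contribution arriving through the parent is combined with the node's own subtree sum:

```python
import math

def aggregate_on_tree(cost, parent, children, order, weight_to_parent, sigma=0.1):
    """Two-pass non-local aggregation of one cost slice over a tree.

    `order` is a top-down traversal (root first); `weight_to_parent[v]`
    is the edge weight between node v and parent[v].  Per-edge similarity
    is exp(-w / sigma), so multiplying similarities along a path
    reproduces S(p, q) = exp(-D(p, q) / sigma) from Equation (3).
    """
    n = len(cost)
    sim = [math.exp(-weight_to_parent[v] / sigma) if parent[v] is not None else 1.0
           for v in range(n)]
    up = list(cost)
    # Pass 1: leaf to root -- each node gathers the costs of its subtree.
    for v in reversed(order):
        for c in children[v]:
            up[v] += sim[c] * up[c]
    agg = list(up)
    # Pass 2: root to leaf -- the rest of the tree contributes via the parent;
    # the (1 - sim^2) term avoids double-counting the node's own subtree.
    for v in order:
        p = parent[v]
        if p is not None:
            agg[v] = sim[v] * agg[p] + (1.0 - sim[v] ** 2) * up[v]
    return agg

# Chain of two pixels with edge weight 0.1: each pixel ends up with its own
# cost plus exp(-0.1/sigma) times the other pixel's cost.
agg = aggregate_on_tree([1.0, 1.0], parent=[None, 0], children=[[1], []],
                        order=[0, 1], weight_to_parent=[0.0, 0.1], sigma=0.1)
print(agg)
```

Each pass touches every node once, so one slice costs 푂(푛), matching the linear complexity claimed for the aggregation step.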
50 Disparity results can then be computed with the aggregated cost volume 퐶퐴 and a simple Winner-Take-All (WTA) strategy. [sent-89, score-0.191]
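The WTA step itself is a per-pixel arg-min over the aggregated cost volume; a minimal sketch, assuming a (disparities, height, width) layout:

```python
import numpy as np

def winner_take_all(cost_volume):
    """Select, per pixel, the disparity with the minimum aggregated cost.

    `cost_volume` has shape (num_disparities, height, width).
    """
    return np.argmin(cost_volume, axis=0)

# 2 disparity levels over a 1x2 image: the left pixel's cost is lower at
# level 1, the right pixel's at level 0.
costs = np.array([[[5.0, 1.0]],
                  [[2.0, 3.0]]])
print(winner_take_all(costs))
```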
51 A post-processing technique based on the same tree structure was also proposed in the original work [23]. [sent-90, score-0.201]
52 Segment-Tree Construction In this section we first propose a graph-based ST construction algorithm, then we present a simple but effective method to further enhance the tree structure, and lastly we discuss the computational complexity of the algorithm. [sent-92, score-0.25]
53 Step 1 is a typical image segmentation problem, while steps 2 and 3 enforce the connectivity inside and around each segment, respectively. [sent-101, score-0.178]
54 16: if 푇푝 ≠ 푇푞 then 17: Merge 푇푝, 푇푞 into a new tree 푇푝,푞 = (푉푝,푞, 퐸푝,푞): 푉푝,푞 ← 푉푝 ∪ 푉푞, 퐸푝,푞 ← 퐸푝 ∪ 퐸푞 ∪ {푒푗} 18: Update 퐸′: 퐸′ ← 퐸′ ∪ {푒푗} 19: end if 20: Break the for loop if ∣퐸′∣ = ∣푉∣ − 1. [sent-114, score-0.18]
55 Our tree construction algorithm is listed in Algorithm 1. [sent-117, score-0.25]
56 The algorithm proceeds in three stages: ∙ Initialization (Lines 1-5): The edges in 퐸 are sorted in a non-decreasing order according to the weights defined in Equation (1), and a subtree is created for each node in 푉. [sent-119, score-0.242]
57 ∙ Grouping (Lines 6-13): The subtrees are merged into bigger groups with a full scan of the edge set 퐸. [sent-121, score-0.295]
58 If 푣푝 and 푣푞 belong to different subtrees, and the edge weight 푤푒푗 satisfies a criterion proposed in [4], the subtrees 푇푝, 푇푞 are merged into a new subtree 푇푝,푞. [sent-123, score-0.376]
59 The edges of these subtrees (already collected in 퐸′) are then removed from 퐸. [sent-128, score-0.205]
60 If an edge connects two different subtrees, we merge the subtrees and include the edge in 퐸′. [sent-131, score-0.361]
61 The Grouping stage covers steps 1 and 2, and generates a set of segments and their subtrees simultaneously. [sent-134, score-0.247]
62 The Linking stage covers step 3, and selects edges with small weights to connect neighboring subtrees. [sent-136, score-0.204]
63 In fact, if each segment is treated as a basic graph node, the Linking stage builds an MST for this segment graph [2]. [sent-137, score-0.263]
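The Grouping and Linking stages above can be sketched with a union-find structure over the sorted edge list. The merging test here follows the segmentation criterion of Felzenszwalb and Huttenlocher cited as [4] (merge when the edge weight is below Int(T) + k/|T| for both components); all identifiers are illustrative, not the authors' implementation:

```python
class DisjointSet:
    """Union-find tracking, per component, its size and largest internal edge weight."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.max_edge = [0.0] * n  # Int(T): largest weight inside the component.

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        ra, rb = self.find(a), self.find(b)
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.max_edge[ra] = max(self.max_edge[ra], self.max_edge[rb], w)

def build_segment_tree(num_nodes, edges, k):
    """Two-stage segment-tree construction over edges sorted by weight."""
    ds = DisjointSet(num_nodes)
    edges = sorted(edges)
    tree_edges = []
    # Grouping: merge only perceptually consistent subtrees (criterion of [4]).
    for w, s, r in edges:
        rs, rr = ds.find(s), ds.find(r)
        if rs != rr:
            thresh = min(ds.max_edge[rs] + k / ds.size[rs],
                         ds.max_edge[rr] + k / ds.size[rr])
            if w <= thresh:
                ds.union(s, r, w)
                tree_edges.append((w, s, r))
    # Linking: greedily connect the remaining segments with the lightest edges,
    # yielding a spanning tree with |V| - 1 edges.
    for w, s, r in edges:
        if ds.find(s) != ds.find(r):
            ds.union(s, r, w)
            tree_edges.append((w, s, r))
        if len(tree_edges) == num_nodes - 1:
            break
    return tree_edges
```

With `k = 10`, a chain of four nodes whose middle edge has weight 100 is grouped into two segments; the heavy edge is only admitted afterwards, in the Linking stage.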
64 (a) Reference image and two selected pixels 푝1 and 푝2 (b) Close-up of 푝1 ’s and 푝2 ’s neighborhood (c) Support weights computed with ST (d) Support weights computed with MST. [sent-144, score-0.205]
65 We select two pixels 푝1 , 푝2 from the reference image (marked as red dots), and calculate the support weights of the neighboring pixels with ST and MST respectively. [sent-149, score-0.244]
66 Enhancement with Color-Depth Segmentation We further propose to enhance the tree structure with a second segmentation process, which employs both color and the estimated depth information. [sent-155, score-0.306]
67 Our observation is that neighboring regions with different color distributions might still have similar disparities, and such regions should be merged for robust cost aggregation. [sent-156, score-0.194]
68 For joint segmentation, a rough disparity map 퐷 is computed with ST and non-local aggregation, as described in Section 2. [sent-158, score-0.373]
69 By re-running Algorithm 1 on the updated image graph, an enhanced segment-tree is constructed and serves as the final structure for cost aggregation and disparity estimation. [sent-163, score-0.975]
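For the second pass, the edge weights must reflect both cues. The paper only states that the weights are updated with the estimated disparities; the blend below is an illustrative assumption (a convex combination of the intensity and disparity differences), not the paper's exact formula:

```python
def joint_edge_weight(color_s, color_r, disp_s, disp_r, alpha=0.5):
    """Hypothetical color-depth edge weight for the joint segmentation pass.

    Blends the intensity difference of Equation (1) with the difference of
    the initial disparity estimates; alpha balances the two cues.
    """
    return alpha * abs(color_s - color_r) + (1.0 - alpha) * abs(disp_s - disp_r)

# Neighboring pixels with different colors but identical initial disparity
# get a smaller weight than under the color-only rule, so they are more
# likely to be merged in the second segmentation.
print(joint_edge_weight(100, 120, 5, 5))
```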
70 Note that both the segmentation parame- ter 푘 and the aggregation parameter 휎 are closely related to the edge weights. [sent-164, score-0.612]
71 By including an initial disparity map (Figure 3(d)), more consistent scene segmentation results can be achieved (Figure 3(c)), which in turn lead to much improved disparity estimation, especially around depth borders (Figure 3(e)). [sent-169, score-0.821]
72 (a) Reference Image (b) Color segmentation results (c) Color-depth joint segmentation results (d) Initial disparity results computed with ST (e) Final disparity results computed with the enhanced ST. [sent-177, score-0.908]
73 Parameter settings for (b) and (d): 휎1. For (d) and (e), pixels with erroneous disparities are marked in red. [sent-182, score-0.201]
74 Compared to the aggregation process (푂(푛 ⋅ 푙)), the ST construction process usually takes a very small part of the total running time, as shown in the experiments. [sent-188, score-0.448]
75 We first quantitatively evaluate the aggregation accuracy of the four methods. [sent-200, score-0.47]
76 For each method, the disparity results are computed with the aggregated cost volume and a WTA local strategy. [sent-201, score-0.537]
77 The disparity error rates in non-occlusion regions (non-occlusion errors) are used to evaluate the aggregation accuracy. [sent-203, score-0.828]
78 Quantitative evaluation of four aggregation methods (GF [14], MST [23], ST-1, ST-2) on 19 Middlebury data sets with error threshold 1. [sent-211, score-0.514]
79 The percentages of erroneous pixels in non-occlusion regions are used to evaluate the aggregation accuracy of the methods. [sent-212, score-0.603]
80 The WTA disparity results of Baby1 and Flowerpots data sets. [sent-221, score-0.346]
81 (a) disparity results computed with GF (b) disparity results computed with MST (c) disparity results computed with ST-1 (d) disparity results computed with ST-2. [sent-223, score-1.492]
82 The disparity results are refined with the same post-processing technique. [sent-242, score-0.346]
83 The quantitative results of ST-1 and ST-2 show that the non-local aggregation method benefits greatly from the segmentation information. [sent-252, score-0.532]
84 For visual comparison, we present the WTA disparity results (without post-processing) of Baby1 and Flowerpots data sets in Figure 4. [sent-253, score-0.39]
85 The complete disparity results can be found in the supplementary material. [sent-255, score-0.346]
86 We further evaluate the final disparity results of the four methods with the standard Middlebury benchmark [16]. [sent-256, score-0.393]
87 The disparity results of the four methods are presented in Figure 5. [sent-262, score-0.368]
88 The average tree construction time for MST and ST are 21 milliseconds and 34 milliseconds respectively, which take less than 10% of the total running time. [sent-271, score-0.325]
89 Final disparity results on the standard Middlebury benchmark [16]. [sent-273, score-0.371]
90 (a) disparity results computed with GF (b) disparity results computed with MST (c) disparity results computed with ST-1 (d) disparity results computed with ST-2. [sent-275, score-1.492]
91 Conclusion In this paper, we have presented a novel cost aggregation method for stereo matching. [sent-282, score-0.672]
92 Our approach is based on a new tree structure, which successfully integrates the segmentation information in a recently proposed non-local aggregation framework [23]. [sent-283, score-0.689]
93 We have also proposed a fast tree construction algorithm and an effective method to update the tree structure. [sent-284, score-0.43]
94 Preliminary results show that this method is very promising: it shows leading aggregation accuracy and speed performance for a number of Middlebury data sets. [sent-285, score-0.448]
95 (a) the reference frame (b) disparity results computed by GF (c) disparity results computed by MST (d) disparity results computed by ST-2. [sent-294, score-1.173]
96 For both examples, ST-2 produces more accurate disparity results than MST and GF near depth borders. [sent-295, score-0.39]
97 Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. [sent-349, score-0.186]
98 On building an accurate stereo matching system on graphics hardware. [sent-374, score-0.186]
99 Classification and evaluation of cost aggregation methods for stereo correspondence. [sent-425, score-0.672]
100 A region based stereo matching algorithm using cooperative optimization. [sent-441, score-0.186]
wordName wordTfidf (topN-words)
[('mst', 0.555), ('aggregation', 0.448), ('disparity', 0.346), ('gf', 0.244), ('tree', 0.18), ('middlebury', 0.167), ('stereo', 0.152), ('subtrees', 0.15), ('st', 0.099), ('aggregated', 0.092), ('subtree', 0.088), ('edge', 0.081), ('disparities', 0.079), ('costs', 0.077), ('segment', 0.073), ('cost', 0.072), ('construction', 0.07), ('wta', 0.064), ('guided', 0.062), ('segmentation', 0.061), ('edges', 0.055), ('reference', 0.054), ('weights', 0.053), ('mattoccia', 0.053), ('erroneous', 0.051), ('node', 0.046), ('pixels', 0.045), ('electronic', 0.044), ('sets', 0.044), ('depth', 0.044), ('amortized', 0.043), ('danod', 0.043), ('flowerpots', 0.043), ('ilkay', 0.043), ('linking', 0.042), ('enhanced', 0.04), ('pages', 0.037), ('filter', 0.036), ('pixel', 0.036), ('arrival', 0.035), ('samsung', 0.035), ('yoon', 0.035), ('grouping', 0.035), ('merged', 0.034), ('regions', 0.034), ('matching', 0.034), ('hosni', 0.033), ('graph', 0.033), ('tombari', 0.032), ('stage', 0.031), ('root', 0.031), ('scan', 0.03), ('ref', 0.029), ('spanning', 0.028), ('leaf', 0.028), ('bleyer', 0.028), ('kitti', 0.028), ('pass', 0.028), ('support', 0.027), ('merge', 0.027), ('computed', 0.027), ('snapshots', 0.026), ('marked', 0.026), ('milliseconds', 0.026), ('geodesic', 0.026), ('constructed', 0.025), ('covers', 0.025), ('benchmark', 0.025), ('conceptually', 0.025), ('percentages', 0.025), ('connectivity', 0.024), ('traversing', 0.024), ('book', 0.024), ('borders', 0.024), ('connected', 0.023), ('updated', 0.023), ('perceptually', 0.023), ('weight', 0.023), ('quantitative', 0.023), ('receives', 0.023), ('running', 0.023), ('closely', 0.022), ('runs', 0.022), ('four', 0.022), ('greedily', 0.022), ('connects', 0.022), ('yang', 0.022), ('children', 0.021), ('seconds', 0.021), ('undirected', 0.021), ('nearly', 0.021), ('connecting', 0.021), ('structure', 0.021), ('segments', 0.021), ('correspondence', 0.02), ('neighboring', 0.02), ('treated', 0.02), ('built', 0.02), ('step', 0.02), ('nodes', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
Author: Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, Xiaopeng Zhang
Abstract: This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, ”Segment-Tree ”, is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear to the number of the image pixels. Compared to MST where the graph connectivity is determined with local edge weights, our method introduces some ’non-local’ decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be first enforced in the tree construction process. The matching costs are then aggregated over the tree within two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.
2 0.28585866 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
Author: Ralf Haeusler, Rahul Nair, Daniel Kondermann
Abstract: With the aim to improve accuracy of stereo confidence measures, we apply the random decision forest framework to a large set of diverse stereo confidence measures. Learning and testing sets were drawnfrom the recently introduced KITTI dataset, which currently poses higher challenges to stereo solvers than other benchmarks with ground truth for stereo evaluation. We experiment with semi global matching stereo (SGM) and a census dataterm, which is the best performing realtime capable stereo method known to date. On KITTI images, SGM still produces a significant amount of error. We obtain consistently improved area under curve values of sparsification measures in comparison to best performing single stereo confidence measures where numbers of stereo errors are large. More specifically, our method performs best in all but one out of 194 frames of the KITTI dataset.
Author: Jiangbo Lu, Hongsheng Yang, Dongbo Min, Minh N. Do
Abstract: Though many tasks in computer vision can be formulated elegantly as pixel-labeling problems, a typical challenge discouraging such a discrete formulation is often due to computational efficiency. Recent studies on fast cost volume filtering based on efficient edge-aware filters have provided a fast alternative to solve discrete labeling problems, with the complexity independent of the support window size. However, these methods still have to step through the entire cost volume exhaustively, which makes the solution speed scale linearly with the label space size. When the label space is huge, which is often the case for (subpixelaccurate) stereo and optical flow estimation, their computational complexity becomes quickly unacceptable. Developed to search approximate nearest neighbors rapidly, the PatchMatch method can significantly reduce the complexity dependency on the search space size. But, its pixel-wise randomized search and fragmented data access within the 3D cost volume seriously hinder the application of efficient cost slice filtering. This paper presents a generic and fast computational framework for general multi-labeling problems called PatchMatch Filter (PMF). For the very first time, we explore effective and efficient strategies to weave together these two fundamental techniques developed in isolation, i.e., PatchMatch-based randomized search and efficient edge-aware image filtering. By decompositing an image into compact superpixels, we also propose superpixelbased novel search strategies that generalize and improve the original PatchMatch method. Focusing on dense correspondence field estimation in this paper, we demonstrate PMF’s applications in stereo and optical flow. Our PMF methods achieve state-of-the-art correspondence accuracy but run much faster than other competing methods, often giving over 10-times speedup for large label space cases.
4 0.21189629 431 cvpr-2013-The Variational Structure of Disparity and Regularization of 4D Light Fields
Author: Bastian Goldluecke, Sven Wanner
Abstract: Unlike traditional images which do not offer information for different directions of incident light, a light field is defined on ray space, and implicitly encodes scene geometry data in a rich structure which becomes visible on its epipolar plane images. In this work, we analyze regularization of light fields in variational frameworks and show that their variational structure is induced by disparity, which is in this context best understood as a vector field on epipolar plane image space. We derive differential constraints on this vector field to enable consistent disparity map regularization. Furthermore, we show how the disparity field is related to the regularization of more general vector-valued functions on the 4D ray space of the light field. This way, we derive an efficient variational framework with convex priors, which can serve as a fundament for a large class of inverse problems on ray space.
5 0.19704217 181 cvpr-2013-Fusing Depth from Defocus and Stereo with Coded Apertures
Author: Yuichi Takeda, Shinsaku Hiura, Kosuke Sato
Abstract: In this paper we propose a novel depth measurement method by fusing depth from defocus (DFD) and stereo. One of the problems of passive stereo method is the difficulty of finding correct correspondence between images when an object has a repetitive pattern or edges parallel to the epipolar line. On the other hand, the accuracy of DFD method is inherently limited by the effective diameter of the lens. Therefore, we propose the fusion of stereo method and DFD by giving different focus distances for left and right cameras of a stereo camera with coded apertures. Two types of depth cues, defocus and disparity, are naturally integrated by the magnification and phase shift of a single point spread function (PSF) per camera. In this paper we give the proof of the proportional relationship between the diameter of defocus and disparity which makes the calibration easy. We also show the outstanding performance of our method which has both advantages of two depth cues through simulation and actual experiments.
6 0.18122736 155 cvpr-2013-Exploiting the Power of Stereo Confidences
7 0.18002678 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
8 0.12969035 219 cvpr-2013-In Defense of 3D-Label Stereo
9 0.12951012 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
10 0.11200292 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
11 0.094596729 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
12 0.092468105 232 cvpr-2013-Joint Geodesic Upsampling of Depth Images
13 0.089630231 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
14 0.087928981 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
15 0.087140642 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images
16 0.082424864 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
17 0.079154886 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
18 0.075721115 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
19 0.075333714 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences
20 0.073947631 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
topicId topicWeight
[(0, 0.154), (1, 0.099), (2, 0.063), (3, 0.041), (4, 0.031), (5, -0.033), (6, 0.005), (7, 0.051), (8, -0.042), (9, 0.03), (10, 0.028), (11, 0.063), (12, 0.135), (13, 0.075), (14, -0.088), (15, 0.07), (16, -0.171), (17, -0.185), (18, 0.129), (19, -0.068), (20, -0.096), (21, 0.122), (22, 0.187), (23, 0.102), (24, -0.035), (25, 0.006), (26, 0.019), (27, -0.073), (28, 0.068), (29, 0.077), (30, -0.018), (31, 0.084), (32, -0.084), (33, -0.03), (34, 0.016), (35, 0.02), (36, -0.028), (37, 0.007), (38, 0.029), (39, -0.003), (40, 0.039), (41, 0.053), (42, -0.005), (43, 0.036), (44, -0.003), (45, 0.022), (46, 0.018), (47, 0.049), (48, -0.048), (49, -0.093)]
simIndex simValue paperId paperTitle
same-paper 1 0.936643 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
Author: Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, Xiaopeng Zhang
Abstract: This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, ”Segment-Tree ”, is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear to the number of the image pixels. Compared to MST where the graph connectivity is determined with local edge weights, our method introduces some ’non-local’ decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be first enforced in the tree construction process. The matching costs are then aggregated over the tree within two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.
2 0.90326935 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
Author: Ralf Haeusler, Rahul Nair, Daniel Kondermann
Abstract: With the aim to improve accuracy of stereo confidence measures, we apply the random decision forest framework to a large set of diverse stereo confidence measures. Learning and testing sets were drawnfrom the recently introduced KITTI dataset, which currently poses higher challenges to stereo solvers than other benchmarks with ground truth for stereo evaluation. We experiment with semi global matching stereo (SGM) and a census dataterm, which is the best performing realtime capable stereo method known to date. On KITTI images, SGM still produces a significant amount of error. We obtain consistently improved area under curve values of sparsification measures in comparison to best performing single stereo confidence measures where numbers of stereo errors are large. More specifically, our method performs best in all but one out of 194 frames of the KITTI dataset.
3 0.86072612 155 cvpr-2013-Exploiting the Power of Stereo Confidences
Author: David Pfeiffer, Stefan Gehrig, Nicolai Schneider
Abstract: Applications based on stereo vision are becoming increasingly common, ranging from gaming over robotics to driver assistance. While stereo algorithms have been investigated heavily both on the pixel and the application level, far less attention has been dedicated to the use of stereo confidence cues. Mostly, a threshold is applied to the confidence values for further processing, which is essentially a sparsified disparity map. This is straightforward but it does not take full advantage of the available information. In this paper, we make full use of the stereo confidence cues by propagating all confidence values along with the measured disparities in a Bayesian manner. Before using this information, a mapping from confidence values to disparity outlier probability rate is performed based on gathered disparity statistics from labeled video data. We present an extension of the so called Stixel World, a generic 3D intermediate representation that can serve as input for many of the applications mentioned above. This scheme is modified to directly exploit stereo confidence cues in the underlying sensor model during a maximum a poste- riori estimation process. The effectiveness of this step is verified in an in-depth evaluation on a large real-world traffic data base of which parts are made publicly available. We show that using stereo confidence cues allows both reducing the number of false object detections by a factor of six while keeping the detection rate at a near constant level.
4 0.72757363 181 cvpr-2013-Fusing Depth from Defocus and Stereo with Coded Apertures
Author: Yuichi Takeda, Shinsaku Hiura, Kosuke Sato
Abstract: In this paper we propose a novel depth measurement method that fuses depth from defocus (DFD) and stereo. One of the problems of passive stereo methods is the difficulty of finding correct correspondences between images when an object has a repetitive pattern or edges parallel to the epipolar line. On the other hand, the accuracy of the DFD method is inherently limited by the effective diameter of the lens. Therefore, we propose to fuse the stereo and DFD methods by giving different focus distances to the left and right cameras of a stereo camera with coded apertures. Two types of depth cues, defocus and disparity, are naturally integrated through the magnification and phase shift of a single point spread function (PSF) per camera. In this paper we prove the proportional relationship between the diameter of defocus and disparity, which makes calibration easy. We also show the outstanding performance of our method, which enjoys the advantages of both depth cues, through simulation and actual experiments.
5 0.6864357 219 cvpr-2013-In Defense of 3D-Label Stereo
Author: Carl Olsson, Johannes Ulén, Yuri Boykov
Abstract: It is commonly believed that higher order smoothness should be modeled using higher order interactions. For example, 2nd order derivatives for deformable (active) contours are represented by triple cliques. Similarly, the 2nd order regularization methods in stereo predominantly use MRF models with scalar (1D) disparity labels and triple clique interactions. In this paper we advocate a largely overlooked alternative approach to stereo where 2nd order surface smoothness is represented by pairwise interactions with 3D-labels, e.g. tangent planes. This general paradigm has been criticized due to perceived computational complexity of optimization in higher-dimensional label space. Contrary to popular beliefs, we demonstrate that representing 2nd order surface smoothness with 3D labels leads to simpler optimization problems with (nearly) submodular pairwise interactions. Our theoretical and experimental results demonstrate advantages over state-of-the-art methods for 2nd order smoothness stereo.
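The pairwise 3D-label interaction advocated above can be written down compactly: each pixel's plane is evaluated at its neighbor's location and the disparity discrepancies are summed, vanishing exactly when both pixels lie on one common plane. This is a generic sketch of this class of potentials, not necessarily the authors' exact energy.

```python
def plane_disparity(plane, x, y):
    """Disparity induced at pixel (x, y) by a 3D label (a, b, c): d = a*x + b*y + c."""
    a, b, c = plane
    return a * x + b * y + c

def pairwise_smoothness(plane_p, plane_q, p, q):
    """2nd-order smoothness between neighboring pixels p, q with plane labels.

    Each plane is evaluated at the other pixel; the cost is zero iff both
    pixels lie on a single common plane, so slanted surfaces incur no
    penalty and the fronto-parallel bias of scalar labels is avoided.
    """
    return (abs(plane_disparity(plane_p, *q) - plane_disparity(plane_q, *q)) +
            abs(plane_disparity(plane_q, *p) - plane_disparity(plane_p, *p)))
```

Note that a single slanted plane shared by both pixels costs nothing here, whereas any scalar-label pairwise term would penalize the disparity gradient it induces.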
7 0.58448565 431 cvpr-2013-The Variational Structure of Disparity and Regularization of 4D Light Fields
8 0.49669787 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
9 0.48691058 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
10 0.46671063 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
11 0.41617393 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
12 0.40749171 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences
13 0.36482149 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
14 0.35893887 112 cvpr-2013-Dense Segmentation-Aware Descriptors
15 0.3493689 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
16 0.34834471 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
17 0.34305388 39 cvpr-2013-Alternating Decision Forests
18 0.33403927 190 cvpr-2013-Graph-Based Optimization with Tubularity Markov Tree for 3D Vessel Segmentation
19 0.33157724 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
20 0.3186233 21 cvpr-2013-A New Perspective on Uncalibrated Photometric Stereo
topicId topicWeight
[(10, 0.115), (16, 0.119), (24, 0.166), (26, 0.042), (33, 0.245), (67, 0.052), (69, 0.043), (87, 0.109)]
simIndex simValue paperId paperTitle
1 0.90224481 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
Author: Shaodi You, Robby T. Tan, Rei Kawakami, Katsushi Ikeuchi
Abstract: Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatio-temporal derivatives of raindrops. First, it detects raindrops based on the motion and the intensity temporal derivatives of the input video. Second, relying on the observation that some areas of a raindrop completely occlude the scene while the remaining areas occlude it only partially, the method removes the two types of areas separately. For partially occluding areas, it restores them by retrieving as much information about the scene as possible, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity change. For completely occluding areas, it recovers them using a video completion technique. Experimental results on various real videos show the effectiveness of the proposed method.
2 0.86579227 27 cvpr-2013-A Theory of Refractive Photo-Light-Path Triangulation
Author: Visesh Chari, Peter Sturm
Abstract: 3D reconstruction of transparent refractive objects like a plastic bottle is challenging: they lack appearance-related visual cues and merely reflect and refract light from the surrounding environment. Amongst several approaches to reconstruct such objects, the seminal work of light-path triangulation [17] is highly popular because of its general applicability and analysis of minimal scenarios. A light-path is defined as the piece-wise linear path taken by a ray of light as it passes from the source, through the object and into the camera. Transparent refractive objects affect not only the geometric configuration of light-paths but also their radiometric properties. In this paper, we describe a method that combines both geometric and radiometric information for reconstruction. We show two major consequences of adding radiometric cues to the light-path setup. First, we extend the set of scenarios in which reconstruction is feasible while reducing the minimal requirements for a unique reconstruction. This is because the radiometric cue adds an additional known variable to the existing system of equations. Second, we present a simple reconstruction algorithm enabled by the nature of the radiometric cue. We present several synthetic experiments to validate our theory, and show high-quality reconstructions in challenging scenarios.
same-paper 3 0.8645997 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
Author: Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, Xiaopeng Zhang
Abstract: This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, "Segment-Tree", is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear in the number of image pixels. Compared to MST, where the graph connectivity is determined by local edge weights, our method introduces 'non-local' decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be enforced first in the tree construction process. The matching costs are then aggregated over the tree in two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.
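The two-pass aggregation mentioned above follows the standard non-local scheme (Yang, CVPR 2012) that both MST and segment-tree methods share: a leaf-to-root sweep accumulates subtree costs, and a root-to-leaf sweep distributes the rest of the tree's contribution. Below is a minimal sketch on an explicit parent array; the node ordering, variable names, and the scalar-cost simplification are ours.

```python
import numpy as np

def tree_aggregate(cost, parent, order, weight, sigma=0.1):
    """Two-pass non-local cost aggregation over a tree.

    cost[v]   : matching cost of node v (scalar here for simplicity)
    parent[v] : parent of v (the root's entry is unused)
    order     : a root-to-leaves topological order of the nodes
    weight[v] : edge weight between v and parent[v]
    The final value at v equals sum_u S(u, v) * cost[u], where S is the
    product of edge similarities exp(-w / sigma) along the tree path.
    """
    sim = np.exp(-np.asarray(weight, float) / sigma)   # edge similarities
    agg = np.array(cost, dtype=float)
    # pass 1: leaves -> root, accumulate subtree sums
    for v in reversed(order[1:]):
        agg[parent[v]] += sim[v] * agg[v]
    # pass 2: root -> leaves, add the contribution from outside the subtree
    for v in order[1:]:
        p = parent[v]
        agg[v] = sim[v] * (agg[p] - sim[v] * agg[v]) + agg[v]
    return agg
```

Both passes touch each edge once, which is why aggregation over any spanning tree (MST or segment-tree) runs in time linear in the number of pixels per disparity level.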
4 0.85992795 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring the subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected onto this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to those of an electrocardiogram device. Additionally, we were able to capture clinically relevant information about heart rate variability.
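The component-selection step can be illustrated in a few lines of NumPy: PCA via SVD on the centered trajectories, then picking the component with the largest fraction of spectral energy inside a plausible cardiac band. This is a simplified sketch; the band limits and names are our assumptions, and the paper's temporal filtering and outlier-rejection steps are omitted.

```python
import numpy as np

def dominant_periodic_component(trajectories, fps, band=(0.75, 2.0)):
    """Pick the PCA component of feature trajectories (T x N) whose spectral
    energy is most concentrated in the cardiac frequency band (in Hz)."""
    X = trajectories - trajectories.mean(axis=0)
    # PCA via SVD: columns of U*s are the component time series (scores)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * s
    freqs = np.fft.rfftfreq(X.shape[0], d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    best, best_ratio = 0, -1.0
    for k in range(scores.shape[1]):
        spec = np.abs(np.fft.rfft(scores[:, k])) ** 2
        # fraction of this component's power inside the cardiac band
        ratio = spec[mask].sum() / (spec.sum() + 1e-12)
        if ratio > best_ratio:
            best, best_ratio = k, ratio
    return scores[:, best], freqs
```

On synthetic trajectories mixing a large low-frequency drift with a small 1.2 Hz oscillation, the selector returns the oscillatory component even though it carries far less variance, which is exactly why the paper selects by spectrum rather than by eigenvalue.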
5 0.85203838 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection at different resolutions as different but related problems, and propose a multi-task model to jointly consider their commonness and differences. The model contains resolution-aware transformations to map pedestrians at different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate-descent procedure to learn the resolution-aware transformations and the deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles; therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms the previous state-of-the-art (71%).
6 0.84915233 271 cvpr-2013-Locally Aligned Feature Transforms across Views
7 0.84718519 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
8 0.8458069 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation
9 0.84489232 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
10 0.8433795 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
12 0.83918399 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
13 0.83531821 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction
14 0.83501399 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
15 0.83431196 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
16 0.83369911 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
17 0.83319741 303 cvpr-2013-Multi-view Photometric Stereo with Spatially Varying Isotropic Materials
18 0.83255017 155 cvpr-2013-Exploiting the Power of Stereo Confidences
19 0.83233505 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
20 0.8314625 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms