Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. it is dense, and its computational overhead is on the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.
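
The idea of downplaying measurements that likely fall outside the region of the descriptor's center can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the soft segmentation is available as a per-pixel embedding array, and it assumes an exponential weight on the embedding distance between each sample point and the center. The function names, the `lam` sharpness parameter, and the weighting function are all hypothetical choices made for illustration.

```python
import numpy as np


def segmentation_weights(embedding, center, offsets, lam=1.0):
    """Soft mask weights in (0, 1] for sample points around `center`.

    embedding: (H, W, K) array of per-pixel soft segmentation features
               (assumed input; how it is computed is outside this sketch).
    center:    (row, col) of the descriptor center.
    offsets:   (N, 2) integer array of (d_row, d_col) sample offsets.
    lam:       hypothetical sharpness parameter; larger suppresses more.
    """
    h, w, _ = embedding.shape
    points = np.asarray(center) + np.asarray(offsets)
    # Clamp sample points to the image so border descriptors stay valid.
    rows = np.clip(points[:, 0], 0, h - 1)
    cols = np.clip(points[:, 1], 0, w - 1)
    # Distance in embedding space between each sample and the center.
    diff = embedding[rows, cols] - embedding[center[0], center[1]]
    dist = np.linalg.norm(diff, axis=1)
    # Assumed weighting: close to 1 in the same region, small elsewhere.
    return np.exp(-lam * dist)


def mask_descriptor(features, weights):
    """Scale each sample's feature vector, shape (N, D), by its weight (N,)."""
    return features * weights[:, None]
```

Samples whose embedding matches that of the center keep a weight of 1, while samples across a segment boundary are attenuated, so background or occluder measurements contribute less to the final descriptor.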

Figure 10. Panels, left to right: Left image; Ground truth depth; Daisy, 1 iter.; Daisy, 5 iter.; SSID, 'Eigen'. First column: the images on the left (Images 5 and 7 of [31]) are matched against Image 3 of [31] (shown in Fig. 2, which serves as the 'Right image'), for an increasing baseline. Second column: ground truth depth maps of [31]. Third and fourth columns: first and fifth iteration of the Daisy stereo algorithm. Fifth column: single shot depth estimation with SSID and 'Eigen' embeddings. The occlusion estimates for the first Daisy iteration may seem aggressive, but allow the algorithm to converge. Higher occlusion costs induce errors in the initial estimate and degrade the final accuracy.

