cvpr cvpr2013 cvpr2013-112 knowledge-graph by maker-knowledge-mining

112 cvpr-2013-Dense Segmentation-Aware Descriptors


Source: pdf

Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is in the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Alberto Sanfeliu, Francesc Moreno-Noguer (Institut de Robòtica i Informàtica Industrial, Barcelona, Spain). Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. [sent-5, score-0.375]

2 For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. [sent-6, score-0.298]

3 We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. [sent-10, score-0.288]

4 We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements. [sent-11, score-0.14]

5 A different thread of works, such as Daisy [32], dense SIFT [34] or the dense Scale-Invariant Descriptors [13], has demonstrated that it is possible to efficiently compute descriptors densely, i.e. at every image pixel. [sent-14, score-0.4]

6 We exploit segmentation to construct appearance descriptors that are robust to background motion and/or occlusions. [sent-23, score-0.291]

7 (2) RGB encoding of the first three soft segmentation masks of [16]. [sent-25, score-0.357]

8 (3) Segmentation-based affinity between x and the whole image (as per the paper’s affinity equation). [sent-26, score-0.078]

9 (5) RGB encoding of first three principal components of dense SIFT. [sent-29, score-0.13]

10 (6) Same as (5), but using the affinity mask in (4). [sent-30, score-0.104]

11 We obtain similar results by applying this technique to the SID descriptors of [14]. [sent-32, score-0.14]

12 Some recent advances to address this problem include the treatment of scale- and/or rotation-invariance in [14, 13, 11, 27] as well as invariance to non-rigid deformations in [17, 22]. [sent-34, score-0.112]

13 Our main contribution in this work is a new approach to suppress background information during descriptor construction. [sent-39, score-0.165]

14 For this we use soft segmentation masks to compute the affinity of a point with its neighbors, and shun the information coming from regions which are likely to belong to other objects. [sent-40, score-0.499]

15 We extract soft segmentation masks before descriptor construction, using either Normalized Cut eigenvectors [28, 20], or the Global Boundary masks of [16], with the latter coming with a minimal computational overhead. [sent-41, score-0.707]

16 We combine this scheme with dense SIFT, and with the dense Scale- and Rotation-Invariant Descriptor (SID) extraction of [13], thereby constructing a descriptor that is dense, invariant to rotations and scaling, and robust to occlusions. [sent-42, score-0.432]

17 We demonstrate increased performance with respect to state-of-the-art appearance descriptors: dense SIFT, dense SID, and the dense Scale-Invariant descriptors of [11]. [sent-44, score-0.53]

18 A complementary research direction that started from Daisy [32] and dense SIFT [34] is to extract dense image descriptors. [sent-55, score-0.26]

19 This is motivated both by experimental evidence that dense sampling of descriptors yields better performance in Bag-of-Words classification systems [23], and by applications such as dense stereo matching, which require dense features. [sent-56, score-0.624]

20 Regarding scale, the standard approach to accommodating scale changes is scale selection [19], which, however, is only applicable to singular points where scale can be reliably estimated. [sent-59, score-0.084]

21 An alternative that allows computing scale-invariant descriptors densely is the Scale- and Rotation-Invariant Descriptor (SID) of [14], which exploits a combination of logarithmic sampling and multi-scale signal processing to obtain scale- and rotation-invariance. [sent-60, score-0.214]

22 To achieve this, the image is sampled over a log-polar grid, which turns image rotation and scaling into descriptor translations. [sent-61, score-0.179]
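
To make the log-polar construction concrete, here is a minimal NumPy/SciPy sketch that samples an image on such a grid; the grid dimensions, radius range, and bilinear interpolation are illustrative assumptions rather than the exact parameters of [14, 13].

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar_sample(image, cx, cy, n_rays=32, n_rings=28,
                     r_min=2.0, r_max=60.0):
    """Sample a grayscale image on a log-polar grid centered at (cx, cy).

    Returns an (n_rings, n_rays) array. An image rotation becomes a
    cyclic shift along the ray axis; a scaling becomes a shift along
    the ring axis, because the radii grow geometrically.
    """
    thetas = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1.0))
    rr, tt = np.meshgrid(radii, thetas, indexing='ij')  # (n_rings, n_rays)
    ys = cy + rr * np.sin(tt)
    xs = cx + rr * np.cos(tt)
    return map_coordinates(image, [ys, xs], order=1, mode='nearest')
```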

23 The principles of Daisy were recently used in [13] to efficiently compute dense SIDs. [sent-63, score-0.13]

24 A more recent work on scale-invariant descriptors is the Scale-Less SIFT (SLS) of Hassner et al. [11]. [sent-64, score-0.14]

25 Their approach is to compute a set of SIFT descriptors at different scales, and then project these into an invariant low-dimensional subspace that elicits the scale-invariant aspects of these descriptors. [sent-65, score-0.186]

26 This descriptor comes at an increased computational price, and is not rotation-invariant by design, but gives clearly better results than dense SIFT in the presence of scaling transformations. [sent-66, score-0.282]

27 We also include this state-of-the-art descriptor in our multi-layered motion benchmarks. [sent-67, score-0.162]

28 In [24], an implicit color segmentation of objects into foreground and background was used to augment histograms of gradients for people detection. [sent-69, score-0.115]

29 The Daisy paper of [32] demonstrated clear performance improvements in multi-view stereo from treating occlusion as a latent variable and enforcing spatial consistency with Graph Cuts [5]. [sent-70, score-0.187]

30 To deal with occlusions, a predefined set of binary masks was applied over the Daisy grid coordinates, effectively disabling half the grid at different orientations—the descriptor being a ‘half moon’ instead of the full circle. [sent-71, score-0.369]

31 These masks are applied iteratively, interleaved with successive rounds of stereo matching, yielding increasingly refined depth estimates. [sent-72, score-0.325]

32 Soft segmentations. Our goal is to construct appearance descriptors that are not only local, but also contained within a single surface/object (‘region’ from now on). [sent-79, score-0.173]

33 We therefore turn to algorithms that do not strongly commit to a single segmentation, but rather determine the affinity of a pixel to its neighbors in a soft manner. [sent-86, score-0.198]

34 This soft affinity information is then incorporated into descriptor construction. [sent-87, score-0.324]

35 We explore two different approaches to extracting such soft segmentations. [sent-88, score-0.12]

36 First, we use the approach of Maire et al. [20]; in brief, [20] combines multiple cues to estimate a probability-of-boundary cue Pbσ(x, y, θ), which is then used to estimate a boundary-based affinity using the ‘intervening contour’ technique of [28]. [sent-89, score-0.078]

37 The affinity of two pixels is computed as the Euclidean distance between their respective embeddings. [sent-92, score-0.078]

38 We also use the soft segmentation masks of Leordeanu et al. [16]—there the authors use local color models, constructed around each pixel, to build a large set of figure/ground segmentations. [sent-93, score-0.357]

39 For simplicity, we refer to the eigenvector embeddings of [20] as ‘Eigen’, and to the soft segmentation masks of [16] as ‘SoftMask’. [sent-96, score-0.681]

40 Fig. 2 shows the first three coordinates of the ‘Eigen’/‘SoftMask’ embeddings as an RGB image. [sent-98, score-0.324]
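
A visualization of this kind takes only a few lines; the sketch below assumes an (H, W, D) per-pixel embedding array, and the per-channel min-max normalization is an illustrative choice.

```python
import numpy as np

def embeddings_to_rgb(emb):
    """Render the first three channels of an (H, W, D) per-pixel
    embedding as an RGB image with values in [0, 1]."""
    rgb = emb[..., :3].astype(np.float64).copy()
    for c in range(3):  # min-max normalize each channel independently
        lo, hi = rgb[..., c].min(), rgb[..., c].max()
        rgb[..., c] = (rgb[..., c] - lo) / (hi - lo + 1e-12)
    return rgb  # ready for e.g. plt.imshow(rgb)
```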

41 Note that the embeddings from Gb have higher granularity. [Figure 2 panels: image, Pb embeddings, Gb embeddings.] [sent-99, score-0.972]

42 Soft segmentation cues: we show as RGB maps the first three coordinates of the ‘Eigen’ embeddings (middle column) and the ‘SoftMask’ embeddings (right column). [sent-100, score-0.648]

43 Descriptor construction. We now describe how the pixel embeddings described above can be used to render local descriptors robust to background changes and/or occlusions. [sent-104, score-0.534]

44 Our technique is equally well applicable to dense SIFT [34], Daisy [32] and dense SID descriptors [13]; we focus on SID, as it allows us to also achieve scale- and rotation-invariance, but later on we will report results for dense SIFT as well. [sent-105, score-0.53]

45 We start with a brief introduction of the SID descriptor: the log-polar sampling technique of [14, 13] allows us to densely compute scale- and rotation-invariant features through the Fourier Transform Modulus / Fourier-Mellin transform technique. [sent-107, score-0.118]
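
The property underlying the Fourier Transform Modulus is that the magnitude of the DFT is invariant to cyclic shifts of its input; since rotation and (within the sampled range) scaling act as shifts on the log-polar grid, the modulus yields a scale- and rotation-invariant signature. A toy check of the shift-invariance property:

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((28, 32))  # a (rings, rays) log-polar sample
shifted = np.roll(np.roll(patch, 5, axis=0), 9, axis=1)

# |FFT| is identical for the patch and any cyclic shift of it, which is
# what makes the descriptor invariant once shifts encode rotation/scale.
assert np.allclose(np.abs(np.fft.fft2(patch)),
                   np.abs(np.fft.fft2(shifted)))
```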

46 The measurements at those points are obtained after smoothing the image with Gaussian filters whose scale σn increases for larger radii; we extract image derivatives at 4 orientations and two polarities, using steering to align the measurements with the angular directions. [sent-109, score-0.141]
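
A minimal sketch of the radius-dependent smoothing, assuming σn grows linearly with the ring radius; the proportionality constant is a placeholder, and the steered directional derivatives of the full pipeline are only indicated in the comments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_pyramid(image, radii, sigma_ratio=0.2):
    """One smoothed copy of `image` per ring radius, with the Gaussian
    scale growing with the radius (sigma_ratio is an assumed constant).
    The full pipeline would then take steered directional derivatives
    of each copy and sample ring n from the n-th copy."""
    return [gaussian_filter(image, sigma=sigma_ratio * r) for r in radii]
```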

47 We will refer to the scale- and rotation-invariant descriptor as SID, and to the scale-invariant but rotation-sensitive descriptor as SID-Rot. [sent-114, score-0.252]

48 Having provided the outline of SID, we now proceed to describe how we combine soft segmentation masks with it. [sent-115, score-0.357]

49 Using the embeddings described in the previous subsection, we have an embedding of every pixel into a space where Euclidean distances indicate how likely it is that two pixels belong to the same region. [sent-116, score-0.354]

50 When constructing a descriptor around a point x, we construct the affinity w[i] between x and every other point on its grid, G[i](x), i = 1, …, N, where N is the number of grid points. [sent-117, score-0.204]

51 We then multiply these weights w[i] ∈ [0, 1] with the measurements extracted around each grid point: D̂[i] = w[i] · D[i]. [sent-129, score-0.079]
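
The sketch below illustrates this weighting step. The exponential mapping from embedding distance to a weight in [0, 1], with a bandwidth λ, is an assumed stand-in for the paper's Eq. (2), and the array layouts are likewise assumptions.

```python
import numpy as np

def affinity_weights(emb_center, emb_grid, lam=1.0):
    """Weights w[i] in (0, 1] from Euclidean embedding distances.
    emb_center: (D,) embedding of the descriptor center x.
    emb_grid:   (N, D) embeddings of the N grid points G[i](x).
    The exp(-lam * d) form is an assumed stand-in for Eq. (2)."""
    d = np.linalg.norm(emb_grid - emb_center[None, :], axis=1)
    return np.exp(-lam * d)

def weighted_measurements(measurements, weights):
    """Downweight measurements from likely-background grid points:
    D_hat[i] = w[i] * D[i], with `measurements` of shape (N, F)."""
    return measurements * weights[:, None]
```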

52 Multiplying by these weights effectively shuns measurements which come from the background (occluders, background planes, other objects). [sent-134, score-0.116]

53 As such, the descriptor extracted around a point is affected only by points belonging to the same region, and remains robust to background changes. [sent-135, score-0.165]

54 As our results indicate, this particularly simple modification yields noticeable improvements in performance. [sent-136, score-0.092]

55 We explore the use of both of the embeddings described in Sec. 3.1. [sent-139, score-0.324]

56 We evaluate several dense descriptors: SID, Segmentation-aware SID, dense SIFT, Segmentation-aware dense SIFT, and SLS. [sent-140, score-0.53]

57 For SID construction we use the implementation of [13], which adapts Daisy to compute dense features. [sent-142, score-0.161]

58 We exploit the symmetry of the FTM to discard two quadrants, as well as the DC component, which is affected by additive lighting changes, and we normalize the resulting descriptor to have unit L2 norm. [sent-146, score-0.126]
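
These operations can be sketched as follows; np.fft.rfft2 keeps only one half of the conjugate-symmetric spectrum (the discarded quadrants), the DC bin is zeroed, and the result is L2-normalized. The exact spectrum layout used by SID differs; this only shows the shape of the computation.

```python
import numpy as np

def ftm_postprocess(log_polar_measurements):
    """Fourier Transform Modulus with redundancy removed: keep half of
    the conjugate-symmetric spectrum, zero the DC bin (sensitive to
    additive lighting changes), and L2-normalize the result."""
    spec = np.abs(np.fft.rfft2(log_polar_measurements))
    spec[0, 0] = 0.0              # discard the DC component
    v = spec.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```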

59 The size of the descriptor is 3328 for SID and 3360 for SID-Rot. [sent-147, score-0.126]

60 This dataset contains 10 sequences of outdoor traffic taken with a handheld camera, three sequences of people in movement, and 13 sequences from the TV series Miss Marple. [sent-152, score-0.119]

61 The dataset provides ground truth segmentation masks for a subset of frames in every sequence, roughly one in ten frames. [sent-154, score-0.237]

62 We evaluate SSID with ‘Eigen’ and ‘SoftMask’ embeddings against: Dense SIFT (DSIFT) [34], SLS and SID. [sent-155, score-0.324]

63 We use SLS both in its original form and in a PCA variant made publicly available by the authors; we refer to them as SLS-paper and SLS-PCA—an SLS descriptor has size 8256, whereas its PCA variant has size 528. [sent-156, score-0.118]

64 For all the SID-based descriptors we also consider the rotation-sensitive version SID-Rot. [sent-157, score-0.14]

65 We use the 10 traffic sequences, pairing the first frame with all successive frames for which we have ground truth segmentation masks, which yields 31 frame pairs. [sent-159, score-0.248]

66 To evaluate each descriptor we use the flow estimates to warp the segmentation mask for the second frame over the first, and compute its overlap with the ground truth using the Dice coefficient [9]. [sent-162, score-0.405]
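
A sketch of this evaluation step, assuming a flow field of shape (H, W, 2) holding per-pixel (dx, dy) displacements and binary masks; nearest-neighbor interpolation keeps the warped mask binary.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_mask(mask, flow):
    """Backward-warp a binary mask with a flow field (H, W, 2) = (dx, dy)."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = [ys + flow[..., 1], xs + flow[..., 0]]
    warped = map_coordinates(mask.astype(np.float64), coords,
                             order=0, mode='constant')
    return warped > 0.5

def dice(a, b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-12)
```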

67 We observe that the rotation-sensitive variants do better, which is to be expected since the foreground elements do not contain many rotations, and discarding rotations improves performance. [Figure 3: overlap (accumulated) for the SID/SSID descriptors vs. frame difference.] [sent-169, score-0.199]

68 The results are accumulated, so the first bin includes all frame pairs, and the second bin includes frame pairs with a displacement of 20 or more frames. [sent-171, score-0.208]

69 Each bin shows the average overlap between all the frame pairs under consideration. [sent-172, score-0.151]

70 Fig. 4 shows the best results obtained with our approach against the other dense descriptors. [sent-177, score-0.13]

71 The best overall results are obtained by SSID-Rot with ‘SoftMask’ embeddings, followed by the same descriptor with ‘Eigen’ embeddings—note that the ‘SoftMask’ variant does better despite its drastically reduced computational cost. [sent-178, score-0.171]

72 Segmentation-aware SIFT. The application of soft segmentation masks to SIFT is of particular interest because it alleviates SID’s main shortcoming—the requirement of a large patch. [sent-184, score-0.357]

73 We extend the formulation to SIFT’s 4×4 grid, using the ‘SoftMask’ embeddings, which give us consistently better results with SSID, and repeat the experiments over the Moseg dataset. [sent-186, score-0.324]

74 The gains are systematic, but as expected the optimal λ is strongly correlated to the descriptor size. [sent-189, score-0.126]

75 [Figure 4: overlap (accumulated) for all descriptors vs. frame difference.] [sent-193, score-0.166]

76 Overlap results over the Moseg dataset for all the dense descriptors considered. [sent-194, score-0.27]

77 We compute depth maps using this stereo algorithm and evaluate the error on every visible pixel using the ground truth visibility maps from [31]—note that this does not account for occlusion. [sent-207, score-0.14]
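
A sketch of this scoring, assuming dense depth maps and a binary visibility mask; the bad-pixel threshold is an illustrative choice, not the benchmark's.

```python
import numpy as np

def depth_error(depth, gt_depth, visible, thresh=1.0):
    """Mean absolute depth error and fraction of bad pixels, computed
    only on pixels marked visible (occluded pixels are not scored)."""
    err = np.abs(depth - gt_depth)[visible.astype(bool)]
    return float(err.mean()), float((err > thresh).mean())
```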

78 Note that for descriptors other than SID we align the descriptors with the epipolar lines, to enforce rotation invariance [32]. [sent-209, score-0.353]

79 Our segmentation-aware descriptors outperform the others except for SLS—but again, we do not need to rotate the patch. [sent-213, score-0.216]

80 Most of Daisy’s performance gains on wide-baseline stereo stem from its handling of occlusions, which are not taken into account in the previous experiment. [sent-214, score-0.094]

81 The Daisy stereo algorithm introduces an additional depth layer with a fixed cost, to account for occlusions. [Figure 5 columns: first image, second image, DSIFT, SLS-PCA, SID-Rot, SSID-Rot ‘SoftMask’.] [sent-215, score-0.14]

82 The ground truth segmentation masks of image 1 are overlaid in red (a good registration should bring the object into alignment with the segmentation mask). [sent-218, score-0.313]

83 It also exploits binary masks in an iterative process to refine the depth estimate (see Sec. 2). [sent-223, score-0.207]

84 Note that the occlusion cost is a nuisance parameter: it can vary from one image set to another, or across different baselines of the same set, and it has a drastic effect on the number of pixels marked as occluded. [sent-224, score-0.098]

85 Increase in average overlap using our approach on DSIFT, from white (no difference in overlap) to red (largest increase in overlap, which is 0. [sent-226, score-0.119]

86 We run the Daisy stereo algorithm for 5 iterations, and plot the results on Fig. [sent-231, score-0.094]

87 For this figure in particular we do not consider an occlusion layer, and do not use masks for Daisy. [sent-235, score-0.212]

88 The performance of SSID with ‘Eigen’ embeddings is comparable or superior to that of Daisy on most baselines—we achieve this in a single step, and without relying on the calibration data to rotate the patch. [sent-239, score-0.374]

89 We tune λ in Eq. (2) on the motion experiments and do not retune it for the stereo experiments. [sent-241, score-0.13]

90 Figure 10 displays the depth estimates at two different baselines (image pairs 5-3 and 7-3)—the reference frame (3) is that on the last row of Fig. [sent-242, score-0.154]

91 Computational requirements. The cost of computing DSIFT descriptors [34] for an image of size 320×240 is under 1 second (MATLAB/C++ code). [sent-246, score-0.14]

92 Note that for all the experiments in this paper we compute the ‘Eigen’/‘SoftMask’ embeddings at the original resolution. [sent-250, score-0.324]

93 The ‘SoftMask’ embeddings (MATLAB) require ∼7 seconds, and the ‘Eigen’ embeddings (MATLAB/C hybrid) ∼280 seconds. [sent-254, score-0.648]

94 Conclusions and future work. This paper presents a novel strategy for dealing with background motion and occlusions by incorporating soft segmentations into the construction of appearance descriptors. [sent-257, score-0.3]

95 We have applied this idea to different dense descriptors, and with different methods of computing the soft segmentations, demonstrating clear improvements in all cases. [sent-258, score-0.292]

96 The effect of inter-class variability on our soft segmentations, however, remains an open question. [sent-266, score-0.153]

97 Third and fourth columns: first and fifth iteration of the Daisy stereo algorithm. [sent-300, score-0.12]

98 On benchmarking camera calibration and multi-view stereo for high resolution imagery. [sent-470, score-0.118]

99 Daisy: An efficient dense descriptor applied to wide-baseline stereo. [sent-476, score-0.256]

100 A benchmark for the comparison of 3-d motion segmentation algorithms. [sent-482, score-0.112]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sid', 0.375), ('daisy', 0.353), ('embeddings', 0.324), ('ssid', 0.309), ('softmask', 0.274), ('eigen', 0.182), ('sls', 0.177), ('masks', 0.161), ('descriptors', 0.14), ('dsift', 0.133), ('dense', 0.13), ('descriptor', 0.126), ('sdsift', 0.125), ('soft', 0.12), ('sift', 0.119), ('moseg', 0.111), ('stereo', 0.094), ('affinity', 0.078), ('segmentation', 0.076), ('kokkinos', 0.064), ('fourier', 0.063), ('overlap', 0.061), ('frame', 0.061), ('occlusion', 0.051), ('ftm', 0.05), ('iasonas', 0.05), ('particularily', 0.05), ('segmentationaware', 0.05), ('trulls', 0.05), ('strecha', 0.049), ('baselines', 0.047), ('invariant', 0.046), ('invariance', 0.046), ('depth', 0.046), ('variant', 0.045), ('atica', 0.044), ('centrale', 0.044), ('bronstein', 0.044), ('improvements', 0.042), ('brief', 0.042), ('accumulated', 0.042), ('occlusions', 0.041), ('grid', 0.041), ('lepetit', 0.04), ('transform', 0.039), ('background', 0.039), ('tica', 0.039), ('measurements', 0.038), ('densely', 0.037), ('scaleinvariant', 0.037), ('treatment', 0.037), ('derivatives', 0.037), ('motion', 0.036), ('unaffected', 0.035), ('orb', 0.035), ('rob', 0.034), ('ecole', 0.034), ('institut', 0.034), ('coming', 0.034), ('segmentations', 0.033), ('inform', 0.033), ('calonder', 0.033), ('rotations', 0.033), ('increase', 0.032), ('sequences', 0.031), ('barcelona', 0.031), ('spain', 0.031), ('construction', 0.031), ('belong', 0.03), ('amenable', 0.03), ('hassner', 0.03), ('deformations', 0.029), ('leordeanu', 0.029), ('eigenmaps', 0.029), ('bin', 0.029), ('eigenvectors', 0.029), ('industrial', 0.028), ('pca', 0.028), ('displacement', 0.028), ('flow', 0.028), ('scale', 0.028), ('publicly', 0.028), ('vlfeat', 0.027), ('france', 0.027), ('warp', 0.027), ('rotation', 0.027), ('difference', 0.026), ('scaling', 0.026), ('mask', 0.026), ('rotate', 0.026), ('cuts', 0.026), ('traffic', 0.026), ('fifth', 0.026), ('translations', 0.025), ('overhead', 0.025), ('de', 0.025), ('seconds', 0.025), ('successive', 0.024), ('calibration', 0.024), ('closer', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 112 cvpr-2013-Dense Segmentation-Aware Descriptors

Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is in the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.

2 0.11219945 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching

Author: Mayank Bansal, Kostas Daniilidis

Abstract: We address the problem of matching images with disparate appearance arising from factors like dramatic illumination (day vs. night), age (historic vs. new) and rendering style differences. The lack of local intensity or gradient patterns in these images makes the application of pixellevel descriptors like SIFT infeasible. We propose a novel formulation for detecting and matching persistent features between such images by analyzing the eigen-spectrum of the joint image graph constructed from all the pixels in the two images. We show experimental results of our approach on a public dataset of challenging image pairs and demonstrate significant performance improvements over state-of-the-art.

3 0.11101211 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Author: Jaechul Kim, Ce Liu, Fei Sha, Kristen Grauman

Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents—ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the “deformable ” aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.

4 0.10406834 241 cvpr-2013-Label-Embedding for Attribute-Based Classification

Author: Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

Abstract: Attributes are an intermediate representation, which enables parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function which measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.

5 0.10302795 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

Author: Karl Pauwels, Leonardo Rubio, Javier Díaz, Eduardo Ros

Abstract: We propose a novel model-based method for estimating and tracking the six-degrees-of-freedom (6DOF) pose of rigid objects of arbitrary shapes in real-time. By combining dense motion and stereo cues with sparse keypoint correspondences, and by feeding back information from the model to the cue extraction level, the method is both highly accurate and robust to noise and occlusions. A tight integration of the graphical and computational capability of Graphics Processing Units (GPUs) results in pose updates at framerates exceeding 60 Hz. Since a benchmark dataset that enables the evaluation of stereo-vision-based pose estimators in complex scenarios is currently missing in the literature, we have introduced a novel synthetic benchmark dataset with varying objects, background motion, noise and occlusions. Using this dataset and a novel evaluation methodology, we show that the proposed method greatly outperforms state-of-the-art methods. Finally, we demonstrate excellent performance on challenging real-world sequences involving object manipulation.

6 0.093143255 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition

7 0.08951401 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition

8 0.08353176 205 cvpr-2013-Hollywood 3D: Recognizing Actions in 3D Natural Scenes

9 0.078800596 69 cvpr-2013-Boosting Binary Keypoint Descriptors

10 0.077479891 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

11 0.077389888 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision

12 0.077381663 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera

13 0.076084279 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos

14 0.075596236 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors

15 0.074410476 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video

16 0.073577464 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation

17 0.072668314 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification

18 0.072433077 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras

19 0.070614517 334 cvpr-2013-Pose from Flow and Flow from Pose

20 0.070528485 130 cvpr-2013-Discriminative Color Descriptors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.181), (1, 0.067), (2, 0.021), (3, 0.001), (4, 0.011), (5, -0.019), (6, -0.005), (7, -0.014), (8, -0.055), (9, -0.001), (10, 0.004), (11, 0.009), (12, 0.076), (13, 0.017), (14, 0.099), (15, -0.032), (16, -0.049), (17, -0.061), (18, 0.03), (19, 0.041), (20, 0.045), (21, -0.007), (22, 0.055), (23, -0.017), (24, 0.019), (25, 0.012), (26, 0.016), (27, 0.049), (28, -0.03), (29, -0.045), (30, 0.039), (31, 0.02), (32, -0.011), (33, -0.021), (34, -0.035), (35, 0.027), (36, 0.035), (37, 0.086), (38, 0.062), (39, 0.067), (40, -0.026), (41, 0.055), (42, 0.001), (43, -0.027), (44, -0.033), (45, -0.058), (46, -0.055), (47, 0.074), (48, -0.006), (49, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95415491 112 cvpr-2013-Dense Segmentation-Aware Descriptors

Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is in the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.

2 0.69933796 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching

Author: Mayank Bansal, Kostas Daniilidis

Abstract: We address the problem of matching images with disparate appearance arising from factors like dramatic illumination (day vs. night), age (historic vs. new) and rendering style differences. The lack of local intensity or gradient patterns in these images makes the application of pixellevel descriptors like SIFT infeasible. We propose a novel formulation for detecting and matching persistent features between such images by analyzing the eigen-spectrum of the joint image graph constructed from all the pixels in the two images. We show experimental results of our approach on a public dataset of challenging image pairs and demonstrate significant performance improvements over state-of-the-art.

3 0.69569665 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs

Author: Armand Joulin, Sing Bing Kang

Abstract: An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses. We propose a technique to reconstruct the original color stereo pair given such an anaglyph. We modified SIFT-Flow and use it to initially match the different color channels across the two views. Our technique then iteratively refines the matches, selects the good matches (which defines the “anchor” colors), and propagates the anchor colors. We use a diffusion-based technique for the color propagation, and added a step to suppress unwanted colors. Results on a variety of inputs demonstrate the robustness of our technique. We also extended our method to anaglyph videos by using optic flow between time frames.

4 0.63091016 130 cvpr-2013-Discriminative Color Descriptors

Author: Rahat Khan, Joost van_de_Weijer, Fahad Shahbaz Khan, Damien Muselet, Christophe Ducottet, Cecile Barat

Abstract: Color description is a challenging task because of large variations in RGB values which occur due to scene accidental events, such as shadows, shading, specularities, illuminant color changes, and changes in viewing geometry. Traditionally, this challenge has been addressed by capturing the variations in physics-based models, and deriving invariants for the undesired variations. The drawback of this approach is that sets of distinguishable colors in the original color space are mapped to the same value in the photometric invariant space. This results in a drop of discriminative power of the color description. In this paper we take an information theoretic approach to color description. We cluster color values together based on their discriminative power in a classification problem. The clustering has the explicit objective to minimize the drop of mutual information of the final representation. We show that such a color description automatically learns a certain degree of photometric invariance. We also show that a universal color representation, which is based on other data sets than the one at hand, can obtain competing performance. Experiments show that the proposed descriptor outperforms existing photometric invariants. Furthermore, we show that combined with shape description these color descriptors obtain excellent results on four challenging datasets, namely, PASCAL VOC 2007, Flowers-102, Stanford dogs-120 and Birds-200.

5 0.61899799 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition

Author: Ivo Everts, Jan C. van_Gemert, Theo Gevers

Abstract: This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. Because of this, these approaches are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is neglected by discarding chromaticity from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.

6 0.61639428 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences

7 0.60789192 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos

8 0.59902912 69 cvpr-2013-Boosting Binary Keypoint Descriptors

9 0.57719731 210 cvpr-2013-Illumination Estimation Based on Bilayer Sparse Coding

10 0.57612759 140 cvpr-2013-Efficient Color Boundary Detection with Color-Opponent Mechanisms

11 0.57585925 437 cvpr-2013-Towards Fast and Accurate Segmentation

12 0.57152468 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation

13 0.57108003 38 cvpr-2013-All About VLAD

14 0.57003182 10 cvpr-2013-A Fully-Connected Layered Model of Foreground and Background Flow

15 0.56144714 369 cvpr-2013-Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination

16 0.55730343 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization

17 0.55710894 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition

18 0.55146432 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition

19 0.54598159 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

20 0.54535681 162 cvpr-2013-FasT-Match: Fast Affine Template Matching


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.119), (16, 0.031), (26, 0.056), (33, 0.255), (59, 0.263), (67, 0.047), (69, 0.041), (77, 0.011), (87, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90755606 11 cvpr-2013-A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles

Author: Dror Sholomon, Omid David, Nathan S. Netanyahu

Abstract: In this paper we propose the first effective automated, genetic algorithm (GA)-based jigsaw puzzle solver. We introduce a novel procedure of merging two “parent” solutions into an improved “child” solution by detecting, extracting, and combining correctly assembled puzzle segments. The proposed solver exhibits state-of-the-art performance, solving previously attempted puzzles faster and far more accurately, and also puzzles of sizes never before attempted. Other contributions include the creation of a benchmark of large images, previously unavailable. We share the data sets and all of our results for future testing and comparative evaluation of jigsaw puzzle solvers.

2 0.84550196 276 cvpr-2013-MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification

Author: Amr Bakry, Ahmed Elgammal

Abstract: Visual speech recognition is a challenging problem, due to confusion between visual speech features. The speaker identification problem is usually coupled with speech recognition. Moreover, speaker identification is important to several applications, such as automatic access control, biometrics, authentication, and personal privacy issues. In this paper, we propose a novel approach for lipreading and speaker identification. We propose a new approach for manifold parameterization in a low-dimensional latent space, where each manifold is represented as a point in that space. We initially parameterize each instance manifold using a nonlinear mapping from a unified manifold representation. We then factorize the parameter space using Kernel Partial Least Squares (KPLS) to achieve a low-dimension manifold latent space. We use two-way projections to achieve two manifold latent spaces, one for the speech content and one for the speaker. We apply our approach on two public databases: AVLetters and OuluVS. We show the results for three different settings of lipreading: speaker independent, speaker dependent, and speaker semi-dependent. Our approach outperforms the baseline by at least 15% in the speaker semi-dependent setting, and competes in the other two settings.

3 0.82888842 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy

Author: Wenbin Li, Darren Cosker, Matthew Brown, Rui Tang

Abstract: In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. The algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves the state-of-the-art performance in all trials on the Garg et al. dataset, and top tier performance on the Middlebury evaluation.

4 0.82702047 298 cvpr-2013-Multi-scale Curve Detection on Surfaces

Author: Michael Kolomenkin, Ilan Shimshoni, Ayellet Tal

Abstract: This paper extends to surfaces the multi-scale approach of edge detection on images. The common practice for detecting curves on surfaces requires the user to first select the scale of the features, apply an appropriate smoothing, and detect the edges on the smoothed surface. This approach suffers from two drawbacks. First, it relies on a hidden assumption that all the features on the surface are of the same scale. Second, manual user intervention is required. In this paper, we propose a general framework for automatically detecting the optimal scale for each point on the surface. We smooth the surface at each point according to this optimal scale and run the curve detection algorithm on the resulting surface. Our multi-scale algorithm solves the two disadvantages of the single-scale approach mentioned above. We demonstrate how to realize our approach on two commonly-used special cases: ridges & valleys and relief edges. In each case, the optimal scale is found in accordance with the mathematical definition of the curve.

same-paper 5 0.82505095 112 cvpr-2013-Dense Segmentation-Aware Descriptors

Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor’s center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is in the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.

6 0.78244191 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

7 0.78054982 456 cvpr-2013-Visual Place Recognition with Repetitive Structures

8 0.75586057 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve

9 0.75420654 297 cvpr-2013-Multi-resolution Shape Analysis via Non-Euclidean Wavelets: Applications to Mesh Segmentation and Surface Alignment Problems

10 0.75009608 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

11 0.74939013 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

12 0.74761993 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

13 0.74706614 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

14 0.74654257 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path

15 0.74639058 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image

16 0.74605137 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

17 0.74558163 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

18 0.74557745 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

19 0.74551964 143 cvpr-2013-Efficient Large-Scale Structured Learning

20 0.74506152 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis