iccv iccv2013 iccv2013-57 knowledge-graph by maker-knowledge-mining

57 iccv-2013-BOLD Features to Detect Texture-less Objects


Source: pdf

Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano

Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. [sent-7, score-0.437]

2 We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. [sent-8, score-0.474]

3 robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects. [sent-11, score-0.443]

4 Currently, the established paradigm to accomplish detection of textured objects relies on matching descriptors, i.e. … [sent-14, score-0.299]

5 One fundamental requirement for the above techniques to behave effectively is the presence of enough information on the object surface to anchor feature detection and description. [sent-20, score-0.148]

6 SIFT behaves nicely on textured objects but performance drops dramatically when the objects sought for lack enough texture details on their surface. [sent-28, score-0.302]

7 Our proposal (referred to as BOLD) can advance the state-of-the-art in texture-less object detection (compare BOLD to LINE-2D [11]). [sent-29, score-0.236]

8 Hence, texture-less object detection is relevant to foster deployment of computer vision in both established as well as emerging scenarios. [sent-31, score-0.148]

9 Given the aforementioned limitations of descriptor-based object detectors, state-of-the-art proposals tackle the texture-less object detection problem by means of edge-based template matching [11, 12, 24, 25]. [sent-32, score-0.376]

10 One major merit of edge-based template matching is the ability to detect seamlessly both textured as well as texture-less objects. [sent-33, score-0.229]

11 It suffers from other limitations though, in particular related to the ability to withstand significant occlusion and clutter as well as to the scalability with respect to the size of the model library. [sent-34, score-0.341]

12 Indeed, a large model library is searched efficiently by storing all descriptors belonging to sought objects within a fast indexing structure (e.g. …). [sent-40, score-0.188]
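
As an illustration of the indexing idea in sentence 12, here is a minimal, hedged sketch of descriptor-level scalability in Python: descriptors from all model objects are pooled into a single kd-tree, so query cost grows sub-linearly with library size. scipy's cKDTree and the Lowe-style ratio test are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_index(model_descriptors):
    """model_descriptors: list of (n_i, d) arrays, one array per model object.
    Pooling all descriptors into one kd-tree makes query time grow
    sub-linearly with the size of the model library."""
    labels = np.concatenate([np.full(len(desc), obj_id)
                             for obj_id, desc in enumerate(model_descriptors)])
    return cKDTree(np.vstack(model_descriptors)), labels

def match(index, labels, scene_descriptors, ratio=0.8):
    """Generic Lowe-style ratio test on the two nearest neighbors
    (an illustrative matching criterion, not necessarily the paper's)."""
    dist, idx = index.query(np.asarray(scene_descriptors), k=2)
    keep = dist[:, 0] < ratio * dist[:, 1]
    return idx[keep, 0], labels[idx[keep, 0]]
```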

13 Accordingly, in this paper we propose novel features that can be injected seamlessly into a standard SIFT-like object detection pipeline so as to provide notable performance improvements with respect to state-of-the-art edge-based template matching (see again Fig. [sent-45, score-0.397]

14 Purposely, we exploit groups of neighboring line segments to build up a representation of object parts which we term Bunch Of Lines Descriptor (BOLD). [sent-47, score-0.493]

15 The cues deployed in our descriptor are peculiarly encoded into a compact two-dimensional histogram and include relative orientations and displacements between pairs of segments as well as contrast polarity. [sent-48, score-0.601]

16 Related work. The state-of-the-art in edge-based template matching for texture-less object detection is likely represented by LINE [11], which has been proposed both for 2D (LINE-2D) as well as RGB-D images (LINE-MOD). [sent-50, score-0.263]

17 Thus, 3D object detection can be achieved by matching in real-time thousands of templates gathered during the training stage by looking at the object from different vantage points and distances. [sent-53, score-0.319]

18 Another recent relevant template matching approach for texture-less object detection is proposed in [25], which, however, unlike BOLD, requires full-3D object models to carry out the training stage. [sent-55, score-0.32]

19 One of the first methods to describe object contours is the “cubist” approach by Nelson and Selinger [19], whereby the object representation is simplified by means of a loosely structured combination of local context regions keyed by distinctive boundary fragments called Key Curves. [sent-57, score-0.155]

20 [10] introduced a new family of scale-invariant local shape features aimed at object categorization which are based on chains of k-connected, roughly straight contour segments called k-Adjacent Segments (kAS). [sent-64, score-0.324]

21 Each kAS is described as a signature including distances between segment pairs, segment absolute orientations and lengths. [sent-65, score-0.401]

22 [6] match sequences of short line segments called constellations of edgelets, i.e. [sent-69, score-0.418]

23 a sequence of angles that defines the direction of the tracing vectors that connect a subset of object edges. [sent-71, score-0.151]

24 These poses are then ranked by the average distance between the 10-Nearest Neighbor segments on the model transformed according to the current pose hypothesis and the respective scene segments. [sent-75, score-0.267]

25 This method does not include any feature descriptor proposal as it relies on geometric verification only. [sent-76, score-0.206]

26 BOLD features. As discussed in the previous section, several approaches aimed at texture-less object detection or recognition rely on edges and segments, mainly extracted from objects’ contours, as the basic trait underpinning the semantic perception process. [sent-79, score-0.148]

27 Edges and segments are also the starting point of our method. [sent-80, score-0.267]

28 In particular, we propose a descriptor for line segments. (Figure 2 caption: The geometrical primitives deployed by BOLD are the relative orientations, represented by the angles α and β in the figure, between pairs of oriented line segments.) [sent-81, score-0.768]

29 Line segments can be extracted by means of a variety of approaches, such as polygonal approximation of the output of an edge detector [8, 22] or a specific line detection algorithm [16, 21, 26]. [sent-82, score-0.373]

30 Additionally, further pruning may be enforced to improve the repeatability of extracted line segments, e.g. … [sent-83, score-0.173]

31 Assuming a set of repeatable line segments, S, has been extracted from the image, for each segment we compute a BOLD descriptor, which aggregates geometrical cues related to neighboring segments. [sent-86, score-0.472]

32 Geometric primitives. The BOLD descriptor aggregates geometric primitives computed over pairs of neighboring segments. [sent-89, score-0.76]

33 These primitives should yield invariance to rotation, translation and scale, and at the same time be robust to noise and efficient to compute. [sent-90, score-0.272]

34 As also depicted in Figure 2, let us denote vectors in boldface and consider a segment pair s_i, s_j ∈ S, with m_i, m_j representing their respective midpoints. [sent-91, score-0.281]

35 We then refer to the segment connecting m_i and m_j as t, the midpoint segment. [sent-93, score-0.406]

36 First of all, to define our primitive, each line segment has to be associated with a canonical orientation. [sent-96, score-0.342]

37 Specifically, we define a canonically oriented line segment s_i as follows: sign(s_i) = … [sent-98, score-0.405]

38 Based on the previous definition, the proposed geometric primitive consists of the two angles shown in Figure 2, which can be uniquely associated with a pair of oriented segments: α measures the clockwise rotation which aligns s_i to t_ij, β the clockwise rotation which aligns s_j to t_ji. [sent-104, score-0.384]
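
A minimal numeric sketch of this primitive follows. It assumes each segment is stored as a start/end point pair whose start-to-end direction is already the canonical one (the sign() convention of sentence 37 is elided in this extraction, so orientation is taken as given) and that angles are measured clockwise in image coordinates.

```python
import numpy as np

def clockwise_angle(u, v):
    """Rotation in [0, 2*pi) that aligns vector u with vector v.
    With image coordinates (y axis pointing down), the atan2-based
    angle difference is clockwise as seen on screen."""
    ang = np.arctan2(v[1], v[0]) - np.arctan2(u[1], u[0])
    return ang % (2.0 * np.pi)

def bold_primitive(s_i, s_j):
    """Return (alpha, beta) for two oriented segments.
    Each segment is a (2, 2) array: row 0 = start, row 1 = end, with the
    start-to-end direction assumed canonical (an assumption of this sketch)."""
    s_i, s_j = np.asarray(s_i, float), np.asarray(s_j, float)
    m_i, m_j = s_i.mean(axis=0), s_j.mean(axis=0)   # segment midpoints
    t_ij = m_j - m_i                                # midpoint segment, i -> j
    t_ji = -t_ij                                    # same segment, j -> i
    alpha = clockwise_angle(s_i[1] - s_i[0], t_ij)  # rotation aligning s_i to t_ij
    beta = clockwise_angle(s_j[1] - s_j[0], t_ji)   # rotation aligning s_j to t_ji
    return alpha, beta
```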

39 Fig. 4 illustrates how the disambiguated angles can detect unlikely transformations such as simultaneous mirroring and contrast polarity inversion. [sent-128, score-0.201]

40 Usually, higher distinctiveness comes at a price in terms of robustness: we will show later in this section that the chosen angles (α, β) are consistently more effective than (α∗, β∗). (Figure 3 caption: Comparison of pairwise geometric primitives; FPR on the horizontal axis.) [sent-129, score-0.211]

41 It is also important to point out that (α, β) depend not only on the relative orientation between the two segments but also on their relative spatial displacement. [sent-132, score-0.401]

42 Overall, they thus represent a compact geometric primitive encoding relative orientation and position as well as, due to segments being oriented, contrast polarity. [sent-133, score-0.488]

43 To the best of our knowledge, the proposed geometric primitive has not been deployed by any previous work. [sent-134, score-0.21]

44 As already mentioned, we carried out an in-depth experimental analysis to help devise the most effective geometrical primitives to be deployed within BOLD. [sent-136, score-0.406]

45 An excerpt from the results is shown in Figure 3, where we compare (α, β) with other commonly deployed primitives [6, 10, 15] such as relative orientation between segments, normalized length and normalized midpoint distance. [sent-137, score-0.582]

46 In this experiment, all primitives are accumulated into histograms, which is the way pairwise geometrical primitives are aggregated in BOLD (see Section 3). [sent-138, score-0.636]

47 As (α, β) yield 2D histograms while the other considered primitives yield 1D histograms, we also compare our proposal with 2D histograms built by jointly using multiple primitives. [sent-140, score-0.36]

48 As anticipated, we also evaluated using the smaller angles between vectors (α∗, β∗), as well as measuring such angles without canonically orienting the segments, which results in always taking the smallest possible angle between vectors (referred to here as (α, β) unoriented). [sent-141, score-0.305]

49 By building histograms out of different primitives we attain different descriptors that can be plugged seamlessly into the object detection pipeline described in Section 4 and thereby evaluated comparatively as depicted in Figure 3. [sent-142, score-0.599]

50 Results show the overall superiority of angle-based primitives with respect to distances or lengths. [sent-143, score-0.272]

51 We ascribe this mainly to the former turning out to be more robust with respect to the potential fragility of the segment extraction process. [sent-144, score-0.267]

52 Figure 4: The disambiguated angles defined according to (7) and (8) highlight potentially invalid transformations such as simultaneous mirroring and contrast polarity inversion: in (b) α∗ and β∗ take the same values as in (a), whilst α and β take different values. [sent-165, score-0.201]

53 The figure also demonstrates the effectiveness of relying on canonically oriented segments (α∗, β∗ vs. …). [sent-167, score-0.342]

54 As for computational efficiency, all the considered primitives appear approximately equivalent in terms of their impact on overall detection time. [sent-170, score-0.363]

55 Aggregation of geometric primitives. For each line segment s_i, the BOLD descriptor is built by aggregating the (α, β) primitives computed for the set of neighboring segments (referred to as a bunch) given by the k nearest neighbor (kNN) segments of s_i, k being a parameter of the method. [sent-173, score-1.365]

56 Purposely, a distance between line segments has to be defined. [sent-176, score-0.378]

57 In our proposal we simply compute the distance between midpoints, although other approaches, such as sampling uniformly along segments and computing the closest distance between sampled points [7], may be deployed. [sent-177, score-0.355]

58 Subsequently, for each pair formed by s_i and one of the k segments in its bunch, the geometric primitives (α, β) are computed and aggregated. [sent-178, score-0.695]
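
The aggregation step can be sketched as follows, reusing bold_primitive from the earlier snippet; the midpoint distance is used for the kNN search as described in sentence 57, while the bin count and the L1 normalization are illustrative choices not taken from the paper.

```python
import numpy as np

def bold_descriptor(segments, i, k=10, n_bins=12):
    """2D (alpha, beta) histogram over the bunch of segment i.
    segments: list of (2, 2) arrays of canonically oriented segments."""
    mids = np.array([0.5 * (s[0] + s[1]) for s in segments])
    dist = np.linalg.norm(mids - mids[i], axis=1)
    bunch = np.argsort(dist)[1:k + 1]        # k nearest neighbors, self excluded
    hist = np.zeros((n_bins, n_bins))
    for j in bunch:
        a, b = bold_primitive(segments[i], segments[j])
        ia = min(int(a / (2 * np.pi) * n_bins), n_bins - 1)
        ib = min(int(b / (2 * np.pi) * n_bins), n_bins - 1)
        hist[ia, ib] += 1.0                  # accumulate the pairwise primitive
    return hist.ravel() / max(hist.sum(), 1e-9)
```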

59 According to the former (the signature approach), a signature of the primitives is computed by ordering the neighboring segments of a bunch by their distance to the central segment, then building the descriptor as the ordered chain of primitives associated with each segment. [sent-180, score-0.493]

60 The histogram approach turned out to notably outperform the signature approach, due to its higher robustness with regard to clutter and occlusion, as in a signature a single segment missing from the bunch tends to disrupt the description. [sent-184, score-0.648]

61 The histogram representation also inherently provides good robustness to inaccuracies in segment localization. [sent-186, score-0.189]

62 Deploying multiple bunches. The number of neighboring segments, k, is a key parameter of the BOLD descriptor. [sent-194, score-0.227]

63 A high number of segments tends to increase the distinctiveness of BOLDs, since there are fewer ambiguities due to similar bunches arising from non-corresponding object parts. [sent-195, score-0.56]

64 On the other hand, a high value of k tends to include, within the same bunch, neighboring segments that may belong to clutter, thus leading to somewhat corrupted histograms (see the example in Figure 5). [sent-196, score-0.325]

65 Accumulating primitives over histograms helps increase robustness up to a certain extent, i.e. [sent-197, score-0.31]

66 as long as the number of clutter elements does not exceed that of object elements. [sent-199, score-0.203]

67 Moreover, a good choice for k depends also on the type of objects to be detected: simple shapes made out of a few segments call for a small k, so as not to incorporate clutter, whereas for more complex objects a higher k is usually beneficial. [sent-200, score-0.357]

68 Instead of trying to tune this parameter based on specific scenarios, we propose to simultaneously deploy multiple k values to describe each line segment s_i. [sent-202, score-0.305]
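
For instance, a multi-bunch description could simply attach one histogram per k value to the same central segment (the k values below are illustrative; whether the per-k descriptors are matched independently or jointly is a design detail this summary does not fix):

```python
def bold_multi_bunch(segments, i, ks=(5, 10, 15)):
    """One BOLD histogram per k value for the central segment i,
    reusing bold_descriptor from the previous snippet."""
    return [bold_descriptor(segments, i, k=k) for k in ks]
```

One plausible reading of the result reported next is that per-k descriptors fail somewhat independently, so a bunch corrupted at one k does not necessarily spoil the description at the others.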

69 Figure 6 reports object detection results attained by a single-bunch approach with different k as well as by deploying multiple bunches. (Figure 6 caption: BOLD descriptors employing single vs. multiple bunches; FPR on the horizontal axis.) [sent-205, score-0.414]

70 The best single-bunch configuration turns out to be k = 10, but remarkably improved performance can be attained by using multiple bunches altogether, without slowing down the overall process too notably. [sent-207, score-0.311]

71 Object detection pipeline. In this section we describe our object detection approach, which deploys BOLD within a standard SIFT-like pipeline [17] where the detection and description stages are modified to deal with texture-less objects. [sent-210, score-0.554]

72 Object contours can change notably at different scales, and sometimes edges can completely disappear if either the object is blurred or a significant scale variation occurs. [sent-211, score-0.144]

73 For this reason, the first step of our pipeline is represented by multi-scale extraction of line segments. [sent-212, score-0.195]

74 In particular, we build a scale space by rescaling the input image at different resolutions, then extract line segments at each level of the pyramid. [sent-213, score-0.378]

75 The scale of each segment is retained so that, in the next step, the BOLD descriptor for each segment is computed taking into account only the neighbors found at the same scale. [sent-214, score-0.37]

76 This counteracts the issue of missing segments due to large scale variations. [sent-215, score-0.267]
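
Below is a hedged sketch of this multi-scale extraction step, assuming OpenCV's LSD wrapper (cv2.createLineSegmentDetector; availability depends on the OpenCV build) and an illustrative four-level pyramid; each segment is tagged with its level so that descriptors can later be computed only among same-scale neighbors, as stated above.

```python
import cv2

def extract_multiscale_segments(gray, n_levels=4, scale_step=0.5 ** 0.5):
    """Detect line segments at each pyramid level of a grayscale image.
    Returns (level, ((x1, y1), (x2, y2))) tuples in original-image coordinates."""
    lsd = cv2.createLineSegmentDetector()
    segments = []
    img = gray
    for level in range(n_levels):
        lines = lsd.detect(img)[0]              # (n, 1, 4) array: x1, y1, x2, y2
        if lines is not None:
            back = (1.0 / scale_step) ** level  # map back to original resolution
            for x1, y1, x2, y2 in lines.reshape(-1, 4):
                segments.append((level, ((x1 * back, y1 * back),
                                         (x2 * back, y2 * back))))
        img = cv2.resize(img, None, fx=scale_step, fy=scale_step,
                         interpolation=cv2.INTER_AREA)
    return segments
```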

77 Although we have evaluated matching measures specifically conceived for histogram data, such as the Histogram Intersection, we have found that the Euclidean distance yields good results without sacrificing efficiency. [sent-217, score-0.146]
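
The two measures mentioned in sentence 77 can be compared directly on flattened, L1-normalized histograms; the data below is random and purely illustrative (Histogram Intersection is a similarity to be maximized, Euclidean distance a dissimilarity to be minimized).

```python
import numpy as np

def euclidean(h1, h2):
    return float(np.linalg.norm(h1 - h2))

def histogram_intersection(h1, h2):
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(0)
h1 = rng.random(12 * 12); h1 /= h1.sum()  # hypothetical 12x12 BOLD histograms
h2 = rng.random(12 * 12); h2 /= h2.sum()
print(euclidean(h1, h2), histogram_intersection(h1, h2))
```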

78 Experimental evaluation. We compare here the BOLD pipeline for texture-less object detection to two prominent edge-based template matching approaches, i.e. … [sent-223, score-0.298]

79 Moreover, we include in our comparison descriptor-based methods for textured object detection such as SIFT [17], SURF [2] and ORB [23]. [sent-226, score-0.262]

80 To extract the line segments needed to compute BOLD we use the LSD algorithm [26]. [sent-227, score-0.378]

81 Although BOLD is in principle independent of a specific line segment detector, we found that LSD provides enough repeatability to enable effective object detection. [sent-228, score-0.381]

82 As for SIFT and SURF we simply plug their specific detection/description stages into the reference object detection pipeline described in Section 4, while for the ORB pipeline we employed LSH in the matching stage as suggested in [23]. [sent-231, score-0.316]

83 For object detection withstanding clutter and occlusions, we have acquired our own dataset, referred to as D-Textureless. [sent-246, score-0.4]

84 This dataset has been acquired with a webcam, comes with hand-labeled ground-truth and includes 9 texture-less models and 55 scenes with clutter and occlusions. [sent-247, score-0.146]

85 To complement our comparison, we evaluate BOLD also on a textured dataset built from publicly available images and referred to as Caltech Covers. [sent-250, score-0.164]

86 D-Textureless and Caltech Covers are referred to, respectively, as the texture-less and the textured dataset in Fig. … [sent-253, score-0.164]

87 As shown in Fig. 8b, SIFT is clearly the best performer when dealing with textured objects, neatly surpassing SURF and then BOLD. [sent-270, score-0.189]

88 The authors incorporate their model into LINE-2D according to three different variants and show improved performance on their dataset, which consists of 8 common household texture-less objects sought in 800 single-view and 800 multi-view cluttered scenes with various levels of occlusion (see Fig. …). [sent-273, score-0.322]

89 In single-view experiments the object is seen in the scene from the same vantage point as in the single training image, while multi-view experiments focus on variations of the elevation angle, the training set comprising 25 views of each object. [sent-275, score-0.164]

90 Following [13], in Figure 9 we provide the results attained by the BOLD object detection pipeline in terms of recall (i.e. …). [sent-276, score-0.283]

91 Figure 9 also shows two failure cases and one successful detection enabled by quite impressive matches dealing with segments on the mug handle occluded by a semi-transparent plastic bag. [sent-285, score-0.358]

92 Finally, to analyze the scalability of the considered algorithms, in Figure 10 we report the measured execution times versus the number of sought models for the D-Textureless dataset. [sent-286, score-0.156]

93 Concluding remarks. BOLD features allow leveraging a fairly standard descriptor-based pipeline to effectively detect texture-less objects as well, thereby achieving state-of-the-art robustness to clutter and occlusion and unprecedented scalability with respect to the size of the model database. [sent-291, score-0.467]

94 (Figure 9 caption: CMU-KO8: quantitative results, plotted against fppi, for BOLD, LINE-2D and the three methods proposed in [13] in (a) single view and (b) multi view, plus qualitative BOLD results.) One limitation of our proposal deals with the detection of highly curvilinear (e.g. …) objects. [sent-296, score-0.388]

95 Such objects show just a few repeatable BOLDs: if some get corrupted due to occlusion or clutter, the object can hardly be detected. [sent-301, score-0.247]

96 Object recognition in high clutter images using line features. [sent-343, score-0.257]

97 3D object detection and localization using multimodal point pair features. [sent-360, score-0.148]

98 Preliminary development of a line feature-based object recognition system for textureless indoor objects. [sent-403, score-0.222]

99 A parameterless line segment and elliptical arc detector with enhanced ellipse fitting. [sent-435, score-0.262]

100 LSD: a fast line segment detector with a false detection control. [sent-473, score-0.353]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bold', 0.387), ('primitives', 0.272), ('segments', 0.267), ('halcon', 0.197), ('bunches', 0.169), ('segment', 0.151), ('clutter', 0.146), ('midpoint', 0.139), ('orb', 0.13), ('surf', 0.123), ('bunch', 0.12), ('textured', 0.114), ('line', 0.111), ('caltech', 0.108), ('sought', 0.098), ('angles', 0.094), ('detection', 0.091), ('covers', 0.089), ('proposal', 0.088), ('occlusion', 0.087), ('pipeline', 0.084), ('lsd', 0.083), ('deployed', 0.08), ('primitive', 0.08), ('fppi', 0.08), ('canonically', 0.075), ('neatly', 0.075), ('polarity', 0.069), ('descriptor', 0.068), ('mj', 0.068), ('si', 0.068), ('distinctiveness', 0.067), ('template', 0.066), ('disambiguation', 0.065), ('vantage', 0.065), ('repeatability', 0.062), ('sj', 0.062), ('polygonal', 0.06), ('scalability', 0.058), ('neighboring', 0.058), ('repeatable', 0.058), ('object', 0.057), ('bolds', 0.056), ('conceived', 0.056), ('cubist', 0.056), ('datalogic', 0.056), ('deploys', 0.056), ('descriptiveness', 0.056), ('descriptorbased', 0.056), ('franchi', 0.056), ('luigi', 0.056), ('nelson', 0.056), ('peculiarly', 0.056), ('unibo', 0.056), ('unoriented', 0.056), ('withstanding', 0.056), ('textureless', 0.054), ('geometrical', 0.054), ('signature', 0.053), ('attained', 0.051), ('geometric', 0.05), ('seamlessly', 0.05), ('bologna', 0.05), ('damen', 0.05), ('deploying', 0.05), ('hinterstoisser', 0.05), ('household', 0.05), ('hsiao', 0.05), ('kas', 0.05), ('withstand', 0.05), ('sift', 0.05), ('referred', 0.05), ('matching', 0.049), ('mi', 0.048), ('orientation', 0.048), ('former', 0.048), ('eft', 0.046), ('purposely', 0.046), ('orientations', 0.046), ('notably', 0.046), ('descriptors', 0.045), ('objects', 0.045), ('turns', 0.045), ('ilic', 0.043), ('deploy', 0.043), ('relative', 0.043), ('multi', 0.042), ('hebert', 0.042), ('angle', 0.042), ('histogram', 0.041), ('contours', 0.041), ('knn', 0.04), ('aggregates', 0.04), ('clockwise', 0.04), ('constellation', 0.04), ('disi', 0.04), ('fpr', 0.038), ('mirroring', 0.038), ('aggregated', 0.038), ('robustness', 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 57 iccv-2013-BOLD Features to Detect Texture-less Objects

Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano

Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.

2 0.27000734 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

Author: David F. Fouhey, Abhinav Gupta, Martial Hebert

Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.

3 0.19912186 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff

Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.

4 0.18404968 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg

Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.

5 0.15813395 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

Author: Reyes Rios-Cabrera, Tinne Tuytelaars

Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.

6 0.12498509 317 iccv-2013-Piecewise Rigid Scene Flow

7 0.11854736 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects

8 0.10677916 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs

9 0.10285305 379 iccv-2013-Semantic Segmentation without Annotating Segments

10 0.10280171 74 iccv-2013-Co-segmentation by Composition

11 0.097585082 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees

12 0.096962452 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

13 0.087544419 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

14 0.086862601 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors

15 0.086778015 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers

16 0.084854357 90 iccv-2013-Content-Aware Rotation

17 0.084453337 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation

18 0.083907567 190 iccv-2013-Handling Occlusions with Franken-Classifiers

19 0.079745211 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image

20 0.078646675 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.226), (1, -0.056), (2, 0.018), (3, 0.008), (4, 0.062), (5, 0.01), (6, -0.022), (7, -0.006), (8, -0.054), (9, -0.052), (10, 0.038), (11, 0.046), (12, 0.014), (13, 0.016), (14, 0.026), (15, -0.033), (16, 0.057), (17, 0.08), (18, 0.061), (19, -0.037), (20, 0.009), (21, -0.018), (22, -0.006), (23, -0.055), (24, 0.002), (25, -0.002), (26, -0.054), (27, 0.052), (28, -0.124), (29, -0.038), (30, -0.081), (31, 0.076), (32, -0.098), (33, -0.015), (34, -0.126), (35, 0.069), (36, 0.009), (37, -0.102), (38, 0.047), (39, -0.183), (40, 0.017), (41, 0.185), (42, -0.043), (43, -0.001), (44, -0.015), (45, 0.155), (46, -0.082), (47, 0.084), (48, -0.131), (49, 0.137)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94657081 57 iccv-2013-BOLD Features to Detect Texture-less Objects

Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano

Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.

2 0.67865807 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg

Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.

3 0.67174828 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding

Author: Daniel M. Steinberg, Oscar Pizarro, Stefan B. Williams

Abstract: With the advent of cheap, high fidelity, digital imaging systems, the quantity and rate of generation of visual data can dramatically outpace a human's ability to label or annotate it. In these situations there is scope for the use of unsupervised approaches that can model these datasets and automatically summarise their content. To this end, we present a totally unsupervised, and annotation-less, model for scene understanding. This model can simultaneously cluster whole-image and segment descriptors, thereby forming an unsupervised model of scenes and objects. We show that this model outperforms other unsupervised models that can only cluster one source of information (image or segment) at once. We are able to compare unsupervised and supervised techniques using standard measures derived from confusion matrices and contingency tables. This shows that our unsupervised model is competitive with current supervised and weakly-supervised models for scene understanding on standard datasets. We also demonstrate our model operating on a dataset with more than 100,000 images collected by an autonomous underwater vehicle.

4 0.64694667 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition

Author: Shahriar Shariat, Vladimir Pavlovic

Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.

5 0.61255276 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects

Author: Daniel Wesierski, Patrick Horain

Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance online, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.

6 0.60895014 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

7 0.60390246 74 iccv-2013-Co-segmentation by Composition

8 0.6011216 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

9 0.59690869 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers

10 0.55571204 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

11 0.55071092 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image

12 0.54623783 90 iccv-2013-Content-Aware Rotation

13 0.53180104 288 iccv-2013-Nested Shape Descriptors

14 0.50306553 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees

15 0.48445141 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

16 0.48312277 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings

17 0.46037546 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

18 0.45844257 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model

19 0.45806882 112 iccv-2013-Detecting Irregular Curvilinear Structures in Gray Scale and Color Imagery Using Multi-directional Oriented Flux

20 0.45729771 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.075), (6, 0.019), (7, 0.03), (12, 0.011), (24, 0.188), (26, 0.082), (31, 0.04), (35, 0.011), (40, 0.015), (42, 0.109), (48, 0.015), (64, 0.066), (73, 0.046), (89, 0.182), (98, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8725006 153 iccv-2013-Face Recognition Using Face Patch Networks

Author: Chaochao Lu, Deli Zhao, Xiaoou Tang

Abstract: When face images are taken in the wild, the large variations in facial pose, illumination, and expression make face recognition challenging. The most fundamental problem for face recognition is to measure the similarity between faces. The traditional measurements such as various mathematical norms, Hausdorff distance, and approximate geodesic distance cannot accurately capture the structural information between faces in such complex circumstances. To address this issue, we develop a novel face patch network, based on which we define a new similarity measure called the random path (RP) measure. The RP measure is derived from the collective similarity of paths by performing random walks in the network. It can globally characterize the contextual and curved structures of the face space. To apply the RP measure, we construct two kinds of networks: the in-face network and the out-face network. The in-face network is drawn from any two face images and captures the local structural information. The out-face network is constructed from all the training face patches, thereby modeling the global structures of face space. The two face networks are structurally complementary and can be combined together to improve the recognition performance. Experiments on the Multi-PIE and LFW benchmarks show that the RP measure outperforms most of the state-of-the-art algorithms for face recognition.

2 0.86498821 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors

Author: Nakamasa Inoue, Koichi Shinoda

Abstract: Assigning a visual code to a low-level image descriptor, which we call code assignment, is the most computationally expensive part of image classification algorithms based on the bag of visual word (BoW) framework. This paper proposes a fast computation method, Neighbor-to-Neighbor (NTN) search, for this code assignment. Based on the fact that image features from an adjacent region are usually similar to each other, this algorithm effectively reduces the cost of calculating the distance between a codeword and a feature vector. This method can be applied not only to a hard codebook constructed by vector quantization (NTN-VQ), but also to a soft codebook, a Gaussian mixture model (NTN-GMM). We evaluated this method on the PASCAL VOC 2007 classification challenge task. NTN-VQ reduced the assignment cost by 77.4% in super-vector coding, and NTN-GMM reduced it by 89.3% in Fisher-vector coding, without any significant degradation in classification performance.

3 0.85561264 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

Author: Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori

Abstract: The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on an object-level label (e.g., “car”). We postulate that having a richer set of labelings (at different levels of granularity) for an object, including finer-grained subcategories, consistent in appearance and view, and higher-order composites – contextual groupings of objects consistent in their spatial layout and appearance, can significantly alleviate these problems. However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is simply not feasible. We propose a weakly-supervised framework for object detection where we discover subcategories and the composites automatically with only traditional object-level category labels as input. To this end, we first propose an exemplar-SVM-based clustering approach, with latent SVM refinement, that discovers a variable length set of discriminative subcategories for each object class. We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites. We show that this model produces state-of-the-art performance on the UIUC phrase object detection benchmark.

same-paper 4 0.85522527 57 iccv-2013-BOLD Features to Detect Texture-less Objects

Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano

Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.

5 0.79262894 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

Author: Reyes Rios-Cabrera, Tinne Tuytelaars

Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.

6 0.79012513 338 iccv-2013-Randomized Ensemble Tracking

7 0.78998697 150 iccv-2013-Exemplar Cut

8 0.78877258 379 iccv-2013-Semantic Segmentation without Annotating Segments

9 0.78780901 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning

10 0.78777075 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps

11 0.78722334 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

12 0.78685153 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

13 0.78638911 349 iccv-2013-Regionlets for Generic Object Detection

14 0.78632605 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning

15 0.78533459 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation

16 0.78532374 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

17 0.78334242 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

18 0.78294861 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

19 0.78291535 180 iccv-2013-From Where and How to What We See

20 0.78279644 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model