iccv iccv2013 iccv2013-269 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu
Abstract: Occlusion presents a challenge for detecting objects in real-world applications. To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at the semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). This paper focuses on car detection on the street. Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. The model parameters are learned from real images under the latent structural SVM (LSSVM) framework. In inference, an efficient dynamic programming (DP) algorithm is utilized. In experiments, we test our method on both car detection and car view estimation. Experimental results show that (i) our CAD simulation strategy is capable of generating occlusion patterns for real scenarios, (ii) the proposed AND-OR structure model is effective for modeling occlusions, and outperforms the deformable part-based model (DPM) [6, 10] in car detection on both our self-collected street parking dataset and the Pascal VOC 2007 car dataset [4], and (iii) the learned model is on par with the state-of-the-art methods on car view estimation tested on two public datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at the semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). [sent-4, score-1.55]
2 Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. [sent-8, score-0.708]
3 In experiments, we test our method on both car detection and car view estimation. [sent-11, score-0.868]
4 Introduction Handling occlusion is a challenging problem in object detection since occlusion increases the intra-class variations of an object category significantly. [sent-14, score-0.938]
5 Taking the car-to-car occlusion as an example illustrated in the left of Fig. [sent-15, score-0.395]
6 This poses a difficult problem on learning an object model, as the model needs to encode a large number of occlusion configurations. [sent-19, score-0.448]
7 It organizes object parts into consistently visible parts and optional part clusters, and then represents an object with the consistently visible parts (i. [sent-25, score-0.983]
8 Our objective is to extend these models so that their parts can be re-configured to represent objects with different occlusion configurations. [sent-31, score-0.52]
9 Inspired by the expressive power of AND-OR graph [35], we propose a simple AND-OR structure (which is a directed and acyclic graph) to model the occlusion configurations effectively. [sent-32, score-0.646]
10 In this structure, a valid occlusion configuration can be generated by composing (AND) the consistently visible parts together with one of (OR) the optional part clusters. [sent-35, score-0.827]
11 This structural representation is more compact than simply memorizing each individual configuration, yet it effectively constrains the space of occlusion configurations, since no unobserved configurations are allowed. [sent-36, score-0.787]
12 Because manually labelling views, parts and part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure using a large set of occlusion configurations generated by car CAD models and a graphics rendering engine. [sent-37, score-1.608]
13 By directly incorporating the appearance and deformation formulations from the DPM [6] model, the parameters of this AND-OR structure can be discriminatively trained with real images under the latent structural SVM (LSSVM) framework [32]. [sent-38, score-0.298]
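The AND/OR composition described in the sentences above can be illustrated with a minimal Python sketch. The part names and the three optional clusters are hypothetical placeholders, not taken from the paper; the point is only that each valid configuration is the always-visible parts combined (AND) with exactly one optional cluster (OR), so any visibility pattern outside that set is excluded.

```python
def valid_configurations(consistent_parts, optional_clusters):
    """Compose (AND) the always-visible parts with exactly one (OR) cluster."""
    base = frozenset(consistent_parts)
    return [base | frozenset(cluster) for cluster in optional_clusters]


# Hypothetical part names, for illustration only.
consistent = {"roof", "windshield"}
clusters = [set(), {"left_door"}, {"left_door", "left_wheel"}]

configs = valid_configurations(consistent, clusters)
for config in configs:
    print(sorted(config))
```

With three optional clusters the structure admits exactly three configurations, while the shared parts are stored once rather than once per configuration.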
14 Because the parts are shared by multiple occlusion configurations, we can use images with different occlusion configurations collectively to train them, which would be more efficient and robust than training them individually. [sent-39, score-1.051]
15 In experiments, we test our method on both car detection and car view estimation. [sent-41, score-0.868]
16 (ii) The learned model is on-par with the state-of-the-art methods on car view estimation. [sent-44, score-0.488]
17 [9] used a more flexible grammar model to infer both the occluder and visible parts of an occluded person. [sent-54, score-0.382]
18 We utilize the fact that parts are not occluded at random, and model the overall structures or regularities in the visibility of all object parts. [sent-55, score-0.33]
19 While the idea of modelling occlusion structure is similar to [16], our paper is aimed at modelling a strong occlusion prior for an entire object class, instead of a generic prior suitable for regular objects on a table. [sent-56, score-1.037]
20 For our application, the latter one will be less informative as it averages out the contribution of the object shape information to the occlusion structure. [sent-57, score-0.448]
21 In the literature, [17] employed an AND-OR Tree (AOT) for 3D car modelling, but not on occlusion. [sent-60, score-0.368]
22 The proposed approach uses CAD models to synthesize the occlusion patterns under different object distances and views. [sent-61, score-0.448]
23 While several recent methods use the synthesized images to learn part geometry [24] or detailed part appearance [20], our method only uses them to learn the configuration structure of occlusion, where the appearance terms are still trained using real images. [sent-62, score-0.424]
24 Finally, as this paper mainly concentrates on occlusion modeling, we directly incorporate the ingredients of DPM [6] for modeling object part appearance and deformation. [sent-63, score-0.545]
25 The main contributions of this paper include: i) We propose a generic AND-OR structure to capture the structure of various occlusion configurations by hierarchically composing a small number of object parts. [sent-64, score-0.807]
26 iii) We introduce a street parking car dataset emphasizing detection of cars with occlusions, as a benchmark for our method and future methods. [sent-66, score-0.98]
27 2 gives an overview of the learned full AND-OR structure for the car category. [sent-71, score-0.481]
28 The structure is a DAG which encodes the patterns of visible part combinations observed in occluded car images. [sent-72, score-0.728]
29 b) the occlusion OR nodes, which connect to different optional part clusters that are visible or occluded together in different occlusion configurations. [sent-76, score-1.168]
30 Each OR node $O \in V_{\mathrm{OR}}$ has a branching variable denoted by $\omega(O)$, indicating which one of its child nodes is selected, and $\omega(O)$ will be inferred on-the-fly in detection. [sent-77, score-0.395]
31 According to their semantics, these AND nodes are also organized into two groups: a) Object level AND nodes, which collect all the parts in an occlusion configuration. [sent-80, score-0.642]
32 the ‘root’) as a generalized part of [...] node represents an object category (car) OR-node (dashed circle), which has a set of child AND-nodes (solid circle) representing different viewpoints. [sent-83, score-0.391]
33 Each viewpoint AND-node consists of a small number of consistently visible parts (as terminal nodes plotted by rectangles) and an occlusion OR-node. [sent-84, score-0.996]
34 The occlusion OR-node represents optional parts subject to different occlusion patterns. [sent-85, score-1.016]
35 Each of these AND nodes collects a subset of object parts that will become visible or invisible together in an occlusion configuration. [sent-88, score-0.845]
36 Each terminal node t has its own location, which will also be inferred during detection. [sent-93, score-0.323]
37 The inferred locations of the nodes represent positions of the object and its visible parts. [sent-94, score-0.353]
38 This term balances the templates for different occlusion configurations such that their scores are comparable. [sent-99, score-0.557]
39 Therefore, our AND-OR structure model is defined as a 4-tuple: $\mathrm{AOT} = (V_{\mathrm{AND}}, V_{\mathrm{OR}}, V_T, \Theta)$ (1) Our model could be transformed into a big mixtures-of-DPM by removing part sharing and moving all OR nodes to the root, which has more complexity and less robustness, as there are no shared nodes (i. [sent-103, score-0.305]
40 Scoring Functions of the AND-OR structure Given an image $I$ and its corresponding image feature pyramid $H$ such as HOG [2], the score $S(\cdot, \cdot)$ of a node in the lattice $\Lambda$ of the pyramid $H$ is defined as follows: i) For a terminal node $t \in V_T$ w. [sent-108, score-0.53]
41 T(A) where O is the child OR node of A (if it has one), the function T(A) retrieves all the child terminal nodes of A, and $\theta_A^b$ is the corresponding bias. [sent-121, score-0.613]
42 iii) For an OR node O placed at u: $S(O, u) = \max_{A \in ch(O)} S(A, u)$ (4) where A denotes a child AND node of O, and the function ch(O) retrieves all the child AND nodes of O. [sent-122, score-0.58]
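As a rough illustration of the OR-node rule (a maximum over child AND nodes at each placement u), the following NumPy sketch combines per-location score maps. The AND-node form used here (sum of terminal score maps, plus the child OR score if present, plus a bias) is a simplifying assumption: it leaves out the deformation and displacement handling of the full model, and the function names are hypothetical.

```python
import numpy as np


def score_and(terminal_scores, bias, child_or_score=None):
    """Assumed AND rule: sum terminal score maps, add the bias and the
    child OR-node score map if the AND node has one."""
    total = np.sum(terminal_scores, axis=0) + bias
    if child_or_score is not None:
        total = total + child_or_score
    return total


def score_or(child_and_scores):
    """OR rule: element-wise max over child AND-node score maps.
    Also returns the index of the winning branch at each location."""
    stacked = np.stack(child_and_scores)
    return stacked.max(axis=0), stacked.argmax(axis=0)


# Toy 2x2 score maps for two terminal nodes (made-up values).
t1 = np.array([[1.0, 0.0], [0.5, 2.0]])
t2 = np.array([[0.0, 1.0], [0.5, -1.0]])

branch_a = score_and([t1], bias=0.0)        # configuration with one part
branch_b = score_and([t1, t2], bias=-0.5)   # configuration with two parts
best, omega = score_or([branch_a, branch_b])
print(best)
print(omega)
```

The `omega` map plays the role of the branching variable: it records, per location, which child AND node was selected.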
43 When generating images (shown in the second column), we use the color coded model for the car instances in the center, and regular CAD models for the rest. [sent-125, score-0.43]
44 Each of these images then contributes one row to the occlusion data matrix shown on the right, where each column denotes if a part is visible (white) or not (gray). [sent-127, score-0.581]
45 Given the data matrix D and the initial AOT, which plainly remembers each occlusion configuration as a row in D, the algorithm iteratively pursues big blocks of 1s (e. [sent-129, score-0.583]
46 AND-OR Structure Learning We propose to learn the AND-OR structure automatically from a large number of occlusion configurations. [sent-134, score-0.478]
47 Because of the ambiguity of views and the relatively large number of parts (17 in our case), manually labelling views and parts is time-consuming and error-prone. [sent-135, score-0.398]
48 Note that the synthetic data is only used to learn the occlusion structure, while the appearance and geometry parameters are still learned from real data. [sent-137, score-0.625]
49 Taking the number of views as the only parameter, this structure learning process can be divided into 3 steps: (i) generating occlusion configurations, (ii) constructing the data matrix for an initial AND-OR Tree (AOT), and (iii) refining the initial AOT structure. [sent-138, score-0.639]
50 Generating occlusion configurations We choose to put 3 cars in generating each occlusion image, as this is a basic unit that can be used to further compose general car-to-car occlusions. [sent-141, score-1.194]
51 For each set of the position triplet, we randomly choose values for a few factors controlling the occlusion, and then extract an occlusion configuration from the generated images. [sent-143, score-0.465]
52 Sample images generated using a pool of 40 car CAD models1 for the scene of street parking are shown in Fig. [sent-144, score-0.625]
53 We assume that the occlusion configurations are affected by the following factors: car type t, orientation ρ, relative position r and camera view Π. [sent-146, score-0.989]
54 To generate a configuration, we randomly choose corresponding values for these factors, where for each car with type i, ρi ∈ {frontal,rear}, 1 from www. [sent-147, score-0.368]
55 com and Google 3D Warehouse. $r_i = r_i^{(0)} + dr$, where $r_i^{(0)}$ is the nominated position for the i-th car on the 3 × 3 grid, and $dr = (dx, dy)$ is the relative distance (along the x axis and y axis) between the sampled position and the nominated position of the i-th car. [sent-149, score-0.452]
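The sampling of occlusion factors described above might be sketched as follows. This is only an assumed illustration: the grid units, the maximum shift, the choice of which three grid cells hold cars, and all function names are hypothetical, and the camera view factor is omitted. Only the 3 × 3 grid, the three cars per image, the pool of 40 CAD models and the frontal/rear orientations come from the text.

```python
import random

# Nominal positions on a 3x3 grid, in arbitrary units (an assumption).
GRID = [(x, y) for y in range(3) for x in range(3)]


def sample_car(cell, num_types=40, max_shift=0.3, rng=random):
    """Sample type, orientation and a jittered position for one car."""
    x0, y0 = GRID[cell]
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return {
        "type": rng.randrange(num_types),
        "orientation": rng.choice(["frontal", "rear"]),
        "position": (x0 + dx, y0 + dy),  # nominal position plus offset
    }


def sample_scene(cells=(3, 4, 5), rng=random):
    """Sample three cars; the default cells (middle row) are hypothetical."""
    return [sample_car(cell, rng=rng) for cell in cells]


scene = sample_scene(rng=random.Random(0))
for car in scene:
    print(car)
```

Passing a seeded `random.Random` instance keeps the generated scenes reproducible, which is convenient when the rendered images have to be regenerated.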
56 By changing values of these parameters, we can generate many different occlusion images for further processing. [sent-151, score-0.395]
57 3(a), we manually segment a car into 17 parts (considering symmetry and simplicity), which are coded with different colors. [sent-153, score-0.524]
58 3(a), and the other using only the color coded car in the center, which is shown immediately below. [sent-155, score-0.399]
59 Constructing initial AOT With the part-level visibility information, we could get two vectors for each occlusion configuration. [sent-160, score-0.452]
60 Denoting M as the dimension of the vector v, and by stacking v for N occlusion configurations, we can get an $N \times M$ occlusion matrix D, where the first few rows of this matrix for $B = 8$ are shown in the last column of Fig. [sent-163, score-0.79]
61 Note that we have partitioned the view space into B views, so for each row, the visible parts always concentrate in a segment of the vector representing that view. [sent-165, score-0.34]
62 In particular, each subtree consists of an AND node as root and a set of terminal nodes as its children. [sent-167, score-0.53]
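One way to picture the occlusion matrix is the NumPy sketch below. It assumes that each row has one segment per view and that each segment has one entry per part, so that M = B × 17; that layout is a guess from the description and the exact encoding in the paper may differ. The function names and the toy configurations are hypothetical.

```python
import numpy as np


def visibility_vector(view, visible_parts, num_views=8, num_parts=17):
    """Binary vector with 1s only inside the segment of the given view."""
    v = np.zeros(num_views * num_parts, dtype=np.uint8)
    for part in visible_parts:
        v[view * num_parts + part] = 1
    return v


def occlusion_matrix(configs, num_views=8, num_parts=17):
    """Stack one visibility vector per (view, visible_parts) configuration."""
    rows = [visibility_vector(view, parts, num_views, num_parts)
            for view, parts in configs]
    return np.stack(rows)


# Two toy configurations: (view index, indices of visible parts).
D = occlusion_matrix([(0, [0, 1]), (2, [16])])
print(D.shape)
```

Under this layout every row is zero outside the segment of its own view, which matches the statement that visible parts concentrate in one segment per row.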
63 Here each row of D represents an occlusion configuration, . [sent-170, score-0.395]
64 Refining AOT Structure The initial AOT can be very large and redundant, since it has many duplicated occlusion configurations (i. [sent-176, score-0.598]
65 In each view, we assume the number of occlusion branches is not greater than K(= 4). [sent-184, score-0.482]
66 As there are consistently visible parts for each view, the algorithm will quickly converge to the structure similar to Fig. [sent-192, score-0.378]
67 With the refined AND-OR structure, we could get occlusion configurations (i. [sent-194, score-0.531]
68 , the consistently visible parts and optional occluded parts) in each view. [sent-196, score-0.487]
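The outcome of the refinement for a single view can be approximated with a much simpler procedure than the block pursuit used in the paper. The sketch below is a stand-in under stated assumptions, not the authors' algorithm: it removes duplicated rows, treats parts visible in every remaining row as consistently visible, and reports the distinct patterns over the other parts as candidate occlusion branches. It does not enforce the limit of K branches per view, and `summarize_view` is a hypothetical name.

```python
import numpy as np


def summarize_view(rows):
    """rows: (n, P) binary visibility matrix for one view."""
    patterns = np.unique(rows, axis=0)      # drop duplicated configurations
    consistent = patterns.all(axis=0)       # parts visible in every pattern
    branches = np.unique(patterns[:, ~consistent], axis=0)
    return np.flatnonzero(consistent), branches


# Toy data: 4 observed configurations over 4 parts.
rows = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])
consistent_parts, branches = summarize_view(rows)
print(consistent_parts)
print(branches)
```

In the toy data, parts 0 and 1 are always visible, and the remaining two parts occur in three distinct patterns, which would become three branches of the occlusion OR node.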
69 Besides, the bounding box sizes and nominal positions of each terminal node w. [sent-197, score-0.423]
70 The latent variables in our model and the way to initialize them are listed as follows: The view and occlusion configuration of each object bounding box, which is related to the branching variable V of the root OR node in layer 0 and the branching variable ω(O) for OR nodes in layer 2 (see Fig. [sent-206, score-1.208]
71 (ii) then we use the temporary model to “infer” the view and occlusion configuration of each training positive on real data. [sent-209, score-0.634]
72 The location and bounding box for each visible part under corresponding occlusion configuration. [sent-211, score-0.678]
73 The bottom-up pass places our model at all positions and scales of the image to compute appearance and geometry score maps for every terminal node, as well as those for the AND and OR nodes. [sent-222, score-0.307]
74 A Street Parking Car Dataset There are several datasets featuring a large amount of car images [18, 26, 23, 4, 5], but they are not suitable for evaluating occlusion handling, as the proportion of (moderately or heavily) occluded cars is marginal. [sent-229, score-1.091]
75 To evaluate our model on occlusion handling, we developed a large scale car dataset emphasizing street parking cars with a large amount of occlusions and diverse viewpoint changes (see Fig. [sent-231, score-1.401]
76 The dataset is composed of 881 images, most of which were collected by searching the internet and capturing cars on the streets around our campus. We also collected and annotated some street scene images from [25, 4]. [sent-233, score-0.396]
77 4 shows the bounding box overlap distribution and average number of cars per image on our dataset. [sent-235, score-0.334]
78 These two statistics can be viewed as the summary of car occlusion distribution. [sent-236, score-0.763]
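The text does not define how bounding box overlap is measured. As an assumed illustration, the sketch below uses intersection-over-union and, for each annotated car, records its largest overlap with any other car in the same image; a histogram of these values over the dataset would give one possible overlap distribution. The box format and function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def max_overlaps(boxes):
    """For each box, the largest IoU with any other box in the image."""
    return [
        max((iou(a, b) for j, b in enumerate(boxes) if j != i), default=0.0)
        for i, a in enumerate(boxes)
    ]


# Toy annotation: two overlapping cars and one isolated car.
boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (10, 10, 12, 12)]
overlaps = max_overlaps(boxes)
print(overlaps)
```

The average number of cars per image is then simply the mean of `len(boxes)` over all annotated images.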
79 For image annotation, we adopt the weak annotation strategy, and just label the bounding boxes of cars in each image. [sent-237, score-0.356]
80 Detection To evaluate our model on car detection task, we train and test our model on three datasets. [sent-242, score-0.41]
81 Synthetic Dataset: In the first experiment, we test the effectiveness of our AND-OR structure in representing different part occlusion configurations. [sent-250, score-0.539]
82 For this purpose, we generate a synthetic dataset using 5040 3-car synthetic images as our training data, and a mixture of 3000 3-car and 7-car (we generate the 7 cars in a 1 × 7 grid) synthetic images as our testing data. [sent-251, score-1.255]
83 On this dataset, the best DPM has 16 components and the best AND-OR structure has 8 views with 19 occlusion branches, 5 layers and 111 nodes. [sent-255, score-0.552]
84 This also suggests that the occlusion configurations in the synthetic data match the real occlusion cases. [sent-273, score-1.058]
85 7(b) shows some examples of car detection results by our model. [sent-277, score-0.41]
86 The red rectangle shows the successful cases, the blue bounding boxes show the missing detections (we omit the cars smaller than 1000 pixels), and the green bounding box shows the false alarms. [sent-278, score-0.429]
87 From these examples, it can be seen our model is able to detect the cars with small, moderate occlusions as well as a considerable amount of cars with severe occlusions. [sent-279, score-0.542]
88 The failure cases are mainly caused by severe occlusions (greater than 60% of the car area is occluded), other occluders (e. [sent-280, score-0.436]
89 , bounding boxes include more than one car or only a part of one car). [sent-285, score-0.524]
90 So we mainly use this dataset to show that our model can also be used in the general car detection task. [sent-288, score-0.456]
91 To approximate the occlusion configurations observed on this dataset, we generate synthetic images with car-to-car occlusion as well as with only car self-occlusions. [sent-289, score-1.381]
92 Car detection performance comparisons in terms of Precision-Recall curves on synthetic dataset, street parking dataset and VOC 2007 car dataset [4]. [sent-293, score-0.846]
93 structure has 6 views with 10 occlusion branches, 5 layers and 109 nodes. [sent-294, score-0.552]
94 View Estimation To verify the capability of our model on view estimation, we report the mean precision in pose estimation (MPPE) on both Pascal VOC 2006 car dataset [5] and 3D Car dataset [26] following the protocol in [12] and [26], respectively. [sent-301, score-0.55]
95 The AND-OR structure is a DAG with each subtree consisting of consistently visible parts and optional part clusters. [sent-307, score-0.601]
96 Experiments show that our CAD simulation strategy is effective and our model is better than the state-of-the-art model [6, 10] in terms of car detection and view estimation. [sent-309, score-0.531]
97 As our model uses weakly semantic parts from synthetic training images, the trained model can also be used to estimate the view and visible parts, even though such supervision is not provided in the training set. [sent-312, score-0.427]
98 Here, string x-y means the car is in view x with occlusion configuration y, corresponding to [sent-315, score-0.923]
99 For each predicted window, our algorithm is also capable of estimating the view and occlusion configuration (left), and localizing object parts (right). [sent-330, score-0.733]
100 the x-th AND node in layer 1 and y-th AND node within the corresponding subtree in layer 3 in our AND-OR structure. [sent-334, score-0.369]
wordName wordTfidf (topN-words)
[('occlusion', 0.395), ('car', 0.368), ('aot', 0.359), ('cars', 0.237), ('cad', 0.204), ('terminal', 0.184), ('parking', 0.176), ('dpm', 0.163), ('configurations', 0.136), ('visible', 0.125), ('parts', 0.125), ('nodes', 0.122), ('node', 0.114), ('optional', 0.101), ('occluded', 0.091), ('view', 0.09), ('synthetic', 0.087), ('branches', 0.087), ('lssvm', 0.085), ('structure', 0.083), ('street', 0.081), ('voc', 0.075), ('views', 0.074), ('pascal', 0.073), ('child', 0.073), ('configuration', 0.07), ('occlusions', 0.068), ('dag', 0.067), ('eao', 0.063), ('tg', 0.063), ('bounding', 0.062), ('subtree', 0.061), ('branching', 0.061), ('part', 0.061), ('app', 0.059), ('visibilities', 0.059), ('vor', 0.055), ('object', 0.053), ('latent', 0.051), ('root', 0.049), ('remembering', 0.048), ('tapp', 0.048), ('retrieves', 0.047), ('parent', 0.047), ('dataset', 0.046), ('real', 0.045), ('consistently', 0.045), ('detection', 0.042), ('modelling', 0.042), ('liebelt', 0.042), ('plainly', 0.042), ('nominated', 0.042), ('grammar', 0.041), ('layer', 0.04), ('duplicated', 0.039), ('vand', 0.039), ('placed', 0.037), ('ii', 0.037), ('appearance', 0.036), ('iii', 0.036), ('geo', 0.035), ('thh', 0.035), ('box', 0.035), ('dy', 0.034), ('temporary', 0.034), ('boxes', 0.033), ('acyclic', 0.032), ('kitti', 0.032), ('regularities', 0.032), ('streets', 0.032), ('handling', 0.032), ('geometry', 0.032), ('zhu', 0.032), ('deformable', 0.031), ('cccp', 0.031), ('dp', 0.031), ('simulation', 0.031), ('generating', 0.031), ('coded', 0.031), ('structural', 0.03), ('composing', 0.03), ('emphasizing', 0.03), ('learned', 0.03), ('bias', 0.029), ('savarese', 0.029), ('neighbouring', 0.029), ('visibility', 0.029), ('denoting', 0.029), ('initial', 0.028), ('positions', 0.028), ('grid', 0.028), ('correspondingly', 0.028), ('valued', 0.027), ('generic', 0.027), ('pass', 0.027), ('dx', 0.026), ('templates', 0.026), ('inferred', 0.025), ('invisible', 0.025), ('annotation', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
2 0.24608856 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
3 0.22785595 190 iccv-2013-Handling Occlusions with Franken-Classifiers
Author: Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van_Gool
Abstract: Detecting partially occluded pedestrians is challenging. A common practice to maximize detection quality is to train a set of occlusion-specific classifiers, each for a certain amount and type of occlusion. Since training classifiers is expensive, only a handful are typically trained. We show that by using many occlusion-specific classifiers, we outperform previous approaches on three pedestrian datasets; INRIA, ETH, and Caltech USA. We present a new approach to train such classifiers. By reusing computations among different training stages, 16 occlusion-specific classifiers can be trained at only one tenth the cost of one full training. We show that also test time cost grows sub-linearly.
4 0.16222952 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Author: Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraints permit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
5 0.16052657 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
6 0.15869473 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
7 0.14101215 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
8 0.13347279 81 iccv-2013-Combining the Right Features for Complex Event Recognition
9 0.12597442 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
10 0.11527868 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
11 0.11522391 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
12 0.11509255 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
13 0.11264028 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
14 0.11263606 379 iccv-2013-Semantic Segmentation without Annotating Segments
15 0.11060391 317 iccv-2013-Piecewise Rigid Scene Flow
16 0.11024243 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
17 0.1071303 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
18 0.10672771 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
19 0.10478946 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
20 0.1045586 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
topicId topicWeight
[(0, 0.228), (1, -0.0), (2, 0.001), (3, -0.025), (4, 0.145), (5, -0.076), (6, -0.077), (7, 0.066), (8, -0.062), (9, -0.054), (10, 0.005), (11, -0.013), (12, -0.015), (13, -0.059), (14, 0.011), (15, 0.015), (16, 0.07), (17, 0.095), (18, 0.138), (19, 0.093), (20, -0.088), (21, 0.069), (22, -0.006), (23, -0.061), (24, 0.069), (25, -0.044), (26, -0.1), (27, -0.068), (28, -0.003), (29, -0.09), (30, 0.023), (31, -0.005), (32, -0.025), (33, 0.003), (34, -0.019), (35, 0.028), (36, 0.076), (37, 0.094), (38, -0.056), (39, -0.019), (40, 0.007), (41, -0.144), (42, -0.0), (43, 0.005), (44, 0.053), (45, 0.052), (46, 0.068), (47, 0.069), (48, -0.056), (49, -0.113)]
simIndex simValue paperId paperTitle
same-paper 1 0.96283942 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
2 0.74362707 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
3 0.69350761 190 iccv-2013-Handling Occlusions with Franken-Classifiers
Author: Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van Gool
Abstract: Detecting partially occluded pedestrians is challenging. A common practice to maximize detection quality is to train a set of occlusion-specific classifiers, each for a certain amount and type of occlusion. Since training classifiers is expensive, only a handful are typically trained. We show that by using many occlusion-specific classifiers, we outperform previous approaches on three pedestrian datasets: INRIA, ETH, and Caltech USA. We present a new approach to train such classifiers. By reusing computations among different training stages, 16 occlusion-specific classifiers can be trained at only one tenth the cost of one full training. We also show that test-time cost grows sub-linearly.
4 0.63471681 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
Author: Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on an object-level label (e.g., “car”). We postulate that having a richer set of labelings (at different levels of granularity) for an object, including finer-grained subcategories, consistent in appearance and view, and higher-order composites (contextual groupings of objects consistent in their spatial layout and appearance), can significantly alleviate these problems. However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is simply not feasible. We propose a weakly-supervised framework for object detection where we discover subcategories and the composites automatically with only traditional object-level category labels as input. To this end, we first propose an exemplar-SVM-based clustering approach, with latent SVM refinement, that discovers a variable length set of discriminative subcategories for each object class. We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites. We show that this model produces state-of-the-art performance on the UIUC phrase object detection benchmark.
5 0.62293631 349 iccv-2013-Regionlets for Generic Object Detection
Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposal in selective search from segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves a detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.
6 0.62281877 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
7 0.60917187 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
8 0.59970832 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
9 0.59643042 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
10 0.58713251 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
11 0.58051038 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
12 0.57829618 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
13 0.57515603 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
14 0.57320309 189 iccv-2013-HOGgles: Visualizing Object Detection Features
15 0.56089091 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
16 0.54569852 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
17 0.53952175 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
18 0.53569907 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
19 0.53526276 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
20 0.52981234 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
topicId topicWeight
[(2, 0.06), (12, 0.012), (13, 0.016), (26, 0.054), (31, 0.335), (42, 0.089), (64, 0.085), (73, 0.031), (78, 0.011), (89, 0.199)]
simIndex simValue paperId paperTitle
1 0.97321987 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
2 0.94828653 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
Author: Carlos Fernandez-Granda, Emmanuel J. Candès
Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows us to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent super-resolution methods, which tend to focus on hallucinating high-frequency textures.
3 0.93822283 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes, that is, the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
4 0.93299532 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level “acton” representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.
5 0.92143285 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
Author: Deyu Meng, Fernando De la Torre
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from high-dimensional visual data. Factorization approaches to low-rank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic and real-world experiments including structure from motion, face modeling and background subtraction.
6 0.87081873 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
same-paper 7 0.84655195 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
8 0.78572547 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
9 0.78176159 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.76792228 210 iccv-2013-Image Retrieval Using Textual Cues
11 0.76507103 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
12 0.76085711 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
13 0.73656923 180 iccv-2013-From Where and How to What We See
14 0.7364763 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
15 0.7263453 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
16 0.72041154 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
17 0.7162717 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
18 0.71540296 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
19 0.71505719 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
20 0.71477139 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow