cvpr cvpr2013 cvpr2013-264 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carlos Arteta, Victor Lempitsky, J. Alison Noble, Andrew Zisserman
Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. The instances may be partially overlapping and clustered, and hence quite challenging for traditional detectors, which aim at localizing individual instances. Our approach is to propose a set of candidate regions, and then select regions based on optimizing a global classification score, subject to the constraint that the selected regions are non-overlapping. Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. For example, our detector can pick a region containing two or three object instances, while assigning such region an appropriate label. We show that this formulation can be learned within the structured output SVM framework, and that the inference in such model can be accomplished using dynamic programming on a tree structured region graph. Furthermore, the learning only requires weak annotations – a dot on each instance. The improvement resulting from the addition of the capability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians.
Reference: text
sentIndex sentText sentNum sentScore
1 Alison Noble¹, Andrew Zisserman¹ (¹Department of Engineering Science, University of Oxford, UK; ²Skolkovo Institute of Science and Technology, Russia). Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. [sent-2, score-0.297]
2 The instances may be partially overlapping and clustered, and hence quite challenging for traditional detectors, which aim at localizing individual instances. [sent-3, score-0.255]
3 Our approach is to propose a set of candidate regions, and then select regions based on optimizing a global classification score, subject to the constraint that the selected regions are non-overlapping. [sent-4, score-0.394]
4 Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. [sent-5, score-0.436]
5 For example, our detector can pick a region containing two or three object instances, while assigning such region an appropriate label. [sent-6, score-0.392]
6 We show that this formulation can be learned within the structured output SVM framework, and that the inference in such model can be accomplished using dynamic programming on a tree structured region graph. [sent-7, score-0.448]
7 The improvement resulting from the addition of the capability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians. [sent-9, score-0.555]
8 crowds of pedestrians, or animal and plant populations) and within the microscopy domain (cells of in-vitro cultures and developing embryos, blood samples, histopathology images, etc. [sent-13, score-0.183]
9 Such detection can be based on a sliding window or Hough transform, followed by an appropriate non-maxima suppression procedure [3, 8, 14], stochastic fitting of interacting particles or object models [9, 10, 24], or region-based detection [2, 18, 19]. [sent-17, score-0.222]
10 The second class contains the methods that avoid the detection of individual instances but instead perform analysis based on local or global texture and appearance descriptors, e. [sent-18, score-0.353]
11 by recovering the overall real-valued count of objects in the scene [5, 12, 16, 22] or by estimating the local real-valued density of the objects in each location of interest [11, 15]. [sent-20, score-0.263]
12 For the high-density images, however, detection-based analysis may fail badly, especially when the amount of overlap and inter-occlusion between objects makes the detection of individual instances hard or impossible even for human experts. [sent-26, score-0.35]
13 The analysis in this case is essentially reduced to texture matching between the test image and the training set, which may be feasible even when individual instances are not distinguishable. [sent-28, score-0.207]
14 an image from a surveillance camera may contain multiple individual pedestrians but also a few groups of people which are hard to segment from each other [7]. [sent-33, score-0.277]
15 Likewise, a microscopy image may contain both regions of low and high cell density (sometimes corresponding to different morphological parts or different tissues). [sent-34, score-0.541]
16 The learning in our model is performed based on weak annotation (red dots) and is driven by an instance count loss. [sent-40, score-0.191]
17 Similarly to our initial approach [2], the parsing process is based on an efficient and exact inference procedure that detects a set of non-overlapping extremal regions delivering a maximum to the parsing functional. [sent-47, score-0.672]
18 The learning is performed in a structured SVM framework and optimizes the (convex upper bound on the) counting loss. [sent-48, score-0.249]
19 to choose the groups of the smallest size whenever objects are discernable, as this strategy tends to provide the highest counting accuracy. [sent-51, score-0.279]
20 We conduct a set of experiments with real and synthetic fluorescence microscopy images, as well as surveillance data from the UCSD pedestrian dataset. [sent-52, score-0.51]
21 For all datasets, the proposed method outperformed other detection methods, including a considerable improvement over the baseline [2], and is comparable with the methods that are trained to count (and do not perform detection). [sent-54, score-0.31]
22 Our main contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. [sent-57, score-0.436]
23 Thus, rather than trying to reason about the boundary and part assignment between several tightly overlapping regions [3, 8, 14, 21], tuples of objects are detected as a whole, making the object detection process more resilient to strong object overlap. [sent-58, score-0.505]
24 Instead, we follow the observation [18] that good object support hypotheses can be provided by extremal regions of the image, for example MSER [17] (Figure 1-b). [sent-65, score-0.69]
25 These regions are well suited to biomedical data [2] and text detection [19]. [sent-66, score-0.324]
26 As an additional contribution, we extend the applicability of this approach by using extremal regions of a derived image (rather than the input image itself). [sent-67, score-0.627]
27 For example, we use the extremal regions of a soft background difference image to generate detection hypotheses for a surveillance image stream (whereas extremal regions of the input images themselves would provide a poor hypotheses set). [sent-68, score-1.476]
28 Our computational model is based on our previous work [2] that also used non-overlapping extremal regions. [sent-73, score-0.446]
29 Whilst that initial model achieves good results on those biomedical datasets where objects are clearly discernable from each other as extremal regions, it struggles to achieve high recall when that is not the case (i. [sent-74, score-0.627]
30 when for some object X, any extremal region containing X also contains another overlapping object Y; in this case [2] has no hope of detecting both X and Y as they have to be detected as separate extremal regions). [sent-76, score-1.183]
31 The Model For an input image I containing multiple instances of an object class (some of which may be overlapping) we want to automatically detect the instances and provide an estimate of their location. [sent-82, score-0.369]
32 We start by generating a pool of N nested regions, such that for each pair of regions Ri and Rj in the pool, these regions are either nested (i. [sent-83, score-0.679]
33 In the simplest case, a pool can comprise extremal regions of the input image (i. [sent-86, score-0.3]
34 ways, creating a new map I where higher-value regions correspond to higher probabilities of an object’s presence. [sent-90, score-0.181]
35 The pool of candidate regions can then be generated as a set of extremal regions in the transformed image I. [sent-91, score-0.959]
36 Once the pool of nested regions is [sent-92, score-0.3]
37 generated, each region is scored using a set of classifiers that evaluate the similarity of such region to each of D classes, where each class signifies the integer number of instances of the object that the region contains (i. [sent-93, score-0.696]
38 Given the scores of the classifiers, an inference procedure selects a non-overlapping subset of regions, and assigns each selected region in the subset a class label, thus indicating the number of objects that our system believes this region represents. [sent-97, score-0.44]
39 The choice of the region subset and the class labels are driven by the optimization process that simply maximizes the total classifier score corresponding to selected regions and class labels subject to the non-overlap constraint. [sent-98, score-0.463]
40 More specifically, let Vi (d) denote the classifier score of a region Ri for class d (the higher the score, the more this region looks like a typical region containing d object centroids). [sent-99, score-0.558]
41 N}, where yi = 0 means that the region Ri is not selected. [sent-104, score-0.189]
42 y ∈ Y (1) This maximization of (1) can be performed exactly and efficiently using dynamic programming (since the region pool has a tree structure; this follows from the nestedness property of the regions). [sent-123, score-0.564]
43 Learning the model The model for the evaluation of the regions can learn from weak annotations, i. [sent-127, score-0.181]
44 Such learning is driven by an instance count loss (IC-loss) (2) that penalizes all deviations from the one-to-one correspondences between annotation dots and the selected regions (Figure 1). [sent-130, score-0.511]
45 Suppose now we have M training images. Let dij be the number of dots contained in the region Rij, and Nj be the total number of dots in Ij. [sent-132, score-0.237]
46 Here, the first term penalizes the deviations between the assigned class label of the selected regions and the true number of dots inside them. [sent-141, score-0.376]
47 The last term counts the uncovered dots for the yj configuration under the non-overlap constraint, and thus penalizes false negatives (missed detections). [sent-144, score-0.602]
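The two terms of the instance count loss described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact form of equation (2), which is not recoverable from this extract: the function name `instance_count_loss` and its argument layout are hypothetical, the selection is assumed to be non-overlapping already, and the re-scaled penalization for higher-order classes is left out.

```python
def instance_count_loss(labels, dots_in_region, total_dots):
    """Unscaled sketch of the instance count loss for one training image.

    labels[i]         -- class assigned to region i (0 means "not selected")
    dots_in_region[i] -- number of annotation dots inside region i
    total_dots        -- number of annotation dots in the whole image
    The selected regions are assumed to be mutually non-overlapping.
    """
    selected = [(y, d) for y, d in zip(labels, dots_in_region) if y > 0]
    # First term: mismatch between the assigned class and the true dot count.
    label_mismatch = sum(abs(y - d) for y, d in selected)
    # Last term: dots that no selected region covers (missed detections).
    uncovered = total_dots - sum(d for _, d in selected)
    return label_mismatch + uncovered
```

A configuration in which every selected region carries a label equal to its dot count and every dot is covered gets zero loss under this sketch.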
48 Assuming that the properties of each region Rij are characterized by the feature vector fij, we set the classification scores to be linear functions of these feature vectors: Vij(d) = (wd · fij), where wd is the parameter vector for the dth class, and has the same dimensionality as the feature vector. [sent-145, score-0.205]
49 However, when considering the possibility of regions containing multiple objects, we must take into account the increasing intraclass variability (e. [sent-181, score-0.244]
50 of region shape) for higher-order classes that would bias the labels assigned to the regions towards low-order classes. [sent-183, score-0.378]
51 In order to counterbalance such an effect, we use a re-scaled penalization based on the true number of dots inside the region. [sent-184, score-0.393]
52 Intuitively, assigning a class 7 to a region that contains 6 instances is not as bad as assigning a class 3 to a region with 2 instances, thus it is not penalized as heavily. [sent-185, score-0.602]
53 zero-loss) region configurations can be consistent with such annotation (Figure 1c,e). [sent-191, score-0.186]
54 The maximization of (1) can be performed exactly and efficiently by exploiting the nestedness property of the region pool. [sent-219, score-0.345]
55 Indeed, one can consider a tree-structured model, where each node corresponds to a region and where parent-child links correspond to the nestedness property. [sent-220, score-0.29]
56 Namely, the node Rj becomes a parent of the node Ri if Rj is the smallest region in the pool that Ri strictly belongs to. [sent-221, score-0.267]
57 In this way, because of the nestedness, the region pool can be organized into a forest. [sent-222, score-0.267]
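The parent rule stated above (the parent of a region is the smallest pool region that strictly contains it) can be sketched as follows. This is an illustrative, quadratic-time sketch: the name `build_forest`, the representation of regions as sets of pixel ids, and the use of -1 (rather than 0) to mark root regions are assumptions made here, not details from the paper.

```python
def build_forest(regions):
    """Return parent[i] for every region, or -1 when region i is a root.

    regions -- list of sets of pixel ids; any two regions are assumed to be
               either nested or disjoint (the nestedness property).
    """
    parent = []
    for region in regions:
        best = -1
        for j, other in enumerate(regions):
            # `region < other` is a strict-subset test on Python sets.
            if region < other and (best == -1 or len(other) < len(regions[best])):
                best = j
        parent.append(best)
    return parent
```

For a pool produced by an extremal-region detector the hierarchy is usually available directly from the detector, so a search like this would only be needed when the regions arrive as an unordered list.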
58 where p(i) maps region Ri to the index of its parent region (p(i) = 0 for root regions in the forest), Wi(d, d) = 0, Wi(d, 0) = Vi(d), Wi(0, d > 0) = −∞, and Wi(d1, d2 ≠ d1) = −∞ as long as d2 > 0. [sent-230, score-0.477]
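The tree-structured inference can be illustrated with a short Python sketch. It uses a simplified recursion instead of the pairwise potentials Wi given above: for each region, either the region is selected with its best-scoring class, or the best selections of its children are combined. The function name `select_regions`, the parent list with -1 for roots, and the score layout are assumptions made for illustration.

```python
def select_regions(parent, scores):
    """Pick non-overlapping regions and class labels maximizing the total score.

    parent[i]    -- index of the parent of region i, or -1 for a root
    scores[i][k] -- classifier score of region i for class k + 1
    Returns (total_score, {region_index: class_label}).
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for i, p in enumerate(parent):
        if p < 0:
            roots.append(i)
        else:
            children[p].append(i)

    # Depth-first order in which every parent precedes its children.
    order, stack = [], list(roots)
    while stack:
        i = stack.pop()
        order.append(i)
        stack.extend(children[i])

    best = [0.0] * n      # best achievable score inside the subtree of i
    take = [False] * n    # True if selecting i itself beats its descendants
    label = [0] * n       # best class for region i if it is selected
    for i in reversed(order):  # children are processed before parents
        k = max(range(len(scores[i])), key=scores[i].__getitem__)
        own = scores[i][k]
        below = sum(best[c] for c in children[i])
        label[i] = k + 1
        take[i] = own > below
        best[i] = max(own, below)

    # Walk down from the roots, stopping at the first selected region.
    selected, stack = {}, list(roots)
    while stack:
        i = stack.pop()
        if take[i]:
            selected[i] = label[i]
        else:
            stack.extend(children[i])
    return sum(best[r] for r in roots), selected
```

Because a selected region blocks all of its descendants, the returned selection is non-overlapping by construction, and each region is visited a constant number of times.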
59 For each selected region Ri we run k-means with k = yi on the image coordinates of all pixels in that region, thus obtaining an estimate for the set of centroids of individual objects. [sent-237, score-0.32]
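The post-processing step above can be sketched with a plain Lloyd-style k-means on pixel coordinates. This is a minimal sketch under assumptions: the name `split_region`, the random initialization from the region's own pixels, and the fixed iteration count are illustrative choices, and the extract does not say which k-means variant was used.

```python
import random


def split_region(pixels, k, n_iter=20, seed=0):
    """Estimate k object centroids inside one selected region.

    pixels -- list of distinct (x, y) pixel coordinates of the region
    k      -- class label assigned to the region (number of objects)
    """
    pts = [(float(x), float(y)) for x, y in pixels]
    centers = random.Random(seed).sample(pts, k)  # initial centers
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in pts:
            # Assign each pixel to its nearest current center.
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            groups[nearest].append(p)
        # Move each center to the mean of its pixels (keep it if the group is empty).
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers
```

With k = 1 the result is simply the centroid of the region, which matches the single-object case.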
60 The positive training examples for the binary classifier wd consist of all regions in the training images that contain d dots. [sent-240, score-0.238]
61 Experiments and Results To show the performance and generality of the method presented, results are reported for two different tasks: cell detection in microscopy images (Figure 2) and pedestrians detection in surveillance videos (Figure 3). [sent-244, score-0.674]
62 Our primary metric is mean absolute counting error, which measures the absolute mismatch in the number of objects in an image between the output and the GT. [sent-246, score-0.241]
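As a small worked example of this metric, the following Python sketch averages the absolute per-image difference between predicted and ground-truth counts. The function name `mean_absolute_counting_error` is chosen here for illustration.

```python
def mean_absolute_counting_error(predicted_counts, true_counts):
    """Average of |predicted - true| object counts over a set of images."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("one predicted count is needed per ground-truth count")
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)
```

For a detector of the kind described here, the predicted count of an image would be the sum of the class labels of its selected regions.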
63 Cell Detection Detecting cells in microscopy images is a challenging task in many real applications. [sent-253, score-0.261]
64 We have selected two datasets to show the applicability of our method for this scenario: a synthetic and a real dataset of fluorescence microscopy. [sent-255, score-0.206]
65 Table 1: Accuracy for the synthetic cell dataset and components evaluation. [sent-303, score-0.19]
66 The high cell confluency in the synthetic cell dataset [15] poses a difficult challenge for detection algorithms due to very high cell overlap. [sent-304, score-0.547]
67 Therefore, it is expected that counting algorithms such as [11, 15] would outperform detection methods. [sent-305, score-0.288]
68 Nonetheless, our method is able to produce a comparable mean counting error (MCE), while providing estimates of object localization evaluated with precision and recall. [sent-306, score-0.225]
69 regions without nested regions in the pool) nested within a given region. [sent-314, score-0.56]
70 This last descriptor often indicates the presence of individual objects existing inside the region being encoded. [sent-315, score-0.32]
71 The synthetic dataset of fluorescence microscopy from [15] consists of 200 images generated with [13], divided in half for testing and training, with an average number of 171 ± 64 cells per image. Figure 2: (best viewed in color) Results for our method on fluorescence microscopy datasets. [sent-317, score-0.735]
72 The output images show the selected regions, colour-coded according to the estimated number of objects inside them (green=1, blue=2, purple=3, yellow=4, cyan=5, red=7), also indicated with digits, omitting class 1 for clarity. [sent-319, score-0.188]
73 Moreover, we compare to the counting methods [11, 15] and the detection method [3]. [sent-326, score-0.288]
74 As expected, the counting algorithms can outperform the detection methods in cases of very high object overlap such as this synthetic cells dataset. [sent-327, score-0.495]
75 The baseline [2], restricted to one object per extremal region, cannot cope with the level of object clustering in this dataset and thus performs poorly. [sent-329, score-0.574]
76 The proposed method outperforms the two previous methods both in terms of the detection accuracy and the counting accuracy. [sent-342, score-0.288]
77 In general, the proposed method outperformed both competitors, both in terms of detection accuracy and, more substantially, in terms of the counting error. [sent-344, score-0.288]
78 Pedestrian detection We apply our method to detect and count pedestrians in the UCSD surveillance camera dataset [6]. [sent-348, score-0.416]
79 Extremal regions are collected from (c) the soft background difference image (see text), and a portion of those regions is shown over the original image (d). [sent-352, score-0.362]
80 The method selects non-overlapping regions (e) and estimates the number of instances of the object that the region contains, which allows the prediction of the location of the individual instances. [sent-353, score-0.568]
81 Digits indicate the estimated number of instances inside the region, and green regions correspond to single objects. [sent-354, score-0.374]
82 The pedestrians frequently occlude each other and are imaged at a very low resolution (the furthest pedestrians are just a few pixels tall). [sent-356, score-0.21]
83 All this makes detection very hard for this dataset, and although a number of counting methods have been evaluated on it, to the best of our knowledge, we are the first to run detection algorithms. [sent-357, score-0.383]
84 As pedestrians can correspond to both dark and bright regions, we cannot use the extremal regions of the input images. [sent-358, score-0.732]
85 Instead, to generate the tree of regions for this data, we computed the background image using a simple median filtering of a sparsely sampled set. [sent-359, score-0.211]
86 For each frame, we then simply compute the absolute value of the difference with the background and look for extremal regions in this difference image. [sent-360, score-0.627]
87 To reduce the number of candidate regions to a few hundred, we applied a mild Gaussian smoothing to the difference image (σ = 1 pixel). [sent-361, score-0.213]
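The construction of the difference image described above (median background, absolute difference, mild Gaussian smoothing) can be sketched with NumPy. This is a rough sketch under assumptions: the name `soft_background_difference`, the kernel truncation at three standard deviations, and the zero padding at the borders are illustrative choices, and the extraction of extremal regions from the result is not shown.

```python
import numpy as np


def soft_background_difference(sampled_frames, frame, sigma=1.0):
    """Return a smoothed absolute difference between `frame` and a median background."""
    # Median over a sparse sample of frames approximates the static background.
    background = np.median(np.asarray(sampled_frames, dtype=float), axis=0)
    diff = np.abs(np.asarray(frame, dtype=float) - background)

    # Separable Gaussian kernel truncated at three standard deviations.
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    # Smooth along each axis in turn (zero padding at the borders; each image
    # side is assumed to be at least as long as the kernel).
    for axis in (0, 1):
        diff = np.apply_along_axis(np.convolve, axis, diff, kernel, mode="same")
    return diff
```

Taking the absolute value means that pedestrians darker and brighter than the background both appear as high-value regions in the difference image.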
88 The counting accuracy of our detection method is comparable with the accuracy of methods that are trained to count and are not able to estimate the locations of individual pedestrians (even for singletons). [sent-368, score-0.514]
89 For this dataset, we have observed that the method produced classes 1 to 5, indicating that discerning individual instances was harder than in the case of the real cell images. [sent-369, score-0.387]
90 In terms of the detection accuracy, the proposed method has also achieved an improvement over the baseline [2] (Table 4). [sent-370, score-0.189]
91 Table 3: Mean absolute errors for people counting in the surveillance video [6]. [sent-398, score-0.258]
92 Our detection method approaches the counting accuracy of the counting methods, while outperforming the baseline detection [2] in all splits. [sent-400, score-0.64]
93 Depending on the difficulty of the detection task, the model has the flexibility to choose groups of variable sizes (including individual instances if the task is easy). [sent-413, score-0.34]
94 The use of the model is particularly attractive for biomedical images, where it considerably outperforms the baseline [2] that can only predict individual instances all the time. [sent-418, score-0.319]
95 Thanks to the presented generalization of the region pool generation process, we could also apply the model to object detection in surveillance imagery, obtaining good detection accuracy despite low resolution. [sent-419, score-0.554]
96 One of the limitations of the proposed method appears when the instances become even denser than in the considered datasets and a higher number of classes is needed to parse such images. [sent-421, score-0.187]
97 Finally, it is worth noting that all that is required of the candidate regions is that they are nested. [sent-427, score-0.213]
98 Thus, although we have used extremal regions for candidates, they could instead be generated by hierarchical image segmentation, e. [sent-428, score-0.627]
99 On the detection of multiple object instances using Hough transforms. [sent-454, score-0.265]
100 Computational framework for simulating fluorescence microscope images with cell populations. [sent-525, score-0.278]
wordName wordTfidf (topN-words)
[('extremal', 0.446), ('yij', 0.231), ('counting', 0.193), ('microscopy', 0.183), ('regions', 0.181), ('region', 0.148), ('fluorescence', 0.147), ('nestedness', 0.142), ('instances', 0.138), ('cell', 0.131), ('yj', 0.129), ('count', 0.121), ('pool', 0.119), ('tuples', 0.117), ('dij', 0.111), ('pedestrians', 0.105), ('penalization', 0.101), ('ucsd', 0.101), ('nested', 0.099), ('detection', 0.095), ('hj', 0.095), ('rj', 0.095), ('ri', 0.094), ('dji', 0.093), ('dots', 0.089), ('discernable', 0.085), ('flourescence', 0.085), ('singletons', 0.085), ('crowd', 0.084), ('fij', 0.083), ('cells', 0.078), ('mcj', 0.076), ('individual', 0.069), ('surveillance', 0.065), ('baseline', 0.064), ('nj', 0.063), ('rij', 0.063), ('centroids', 0.062), ('synthetic', 0.059), ('splits', 0.057), ('wd', 0.057), ('arteta', 0.057), ('ayxj', 0.057), ('bernardis', 0.057), ('downscale', 0.057), ('fiaschi', 0.057), ('lapping', 0.057), ('tthhiiss', 0.057), ('wwoorrkk', 0.057), ('pedestrian', 0.056), ('structured', 0.056), ('inside', 0.055), ('maximization', 0.055), ('vij', 0.052), ('class', 0.051), ('loss', 0.05), ('classes', 0.049), ('overlapping', 0.048), ('biomedical', 0.048), ('objects', 0.048), ('wi', 0.047), ('postprocessing', 0.047), ('upscale', 0.047), ('density', 0.046), ('inference', 0.045), ('dilation', 0.044), ('mser', 0.044), ('accomplished', 0.043), ('configuration', 0.042), ('yi', 0.041), ('hough', 0.04), ('barinova', 0.039), ('annotation', 0.038), ('overlap', 0.038), ('programming', 0.038), ('groups', 0.038), ('fj', 0.038), ('lempitsky', 0.037), ('vi', 0.037), ('badly', 0.037), ('intensities', 0.036), ('dot', 0.035), ('auxiliary', 0.034), ('digits', 0.034), ('assigning', 0.033), ('annotations', 0.033), ('dynamic', 0.032), ('possibility', 0.032), ('object', 0.032), ('driven', 0.032), ('purple', 0.032), ('candidate', 0.032), ('containing', 0.031), ('classifiers', 0.031), ('hundred', 0.031), ('hypotheses', 0.031), ('monitoring', 0.031), ('detect', 0.03), ('improvement', 0.03), ('tree', 
0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
2 0.17482588 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
3 0.15949681 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
4 0.13940483 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
5 0.11548999 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
Author: Byung-Woo Hong, Zhaojin Lu, Ganesh Sundaramoorthi
Abstract: In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and are easy to implement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multi-label Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables (the labels and the functions) are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the state-of-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.
6 0.10668455 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
7 0.10542698 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
8 0.10510716 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.10202605 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
10 0.10159196 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching
11 0.099377714 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
12 0.095935427 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
13 0.094872423 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
14 0.092196755 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
15 0.08876095 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
16 0.086277343 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
17 0.085821129 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs
18 0.08578717 187 cvpr-2013-Geometric Context from Videos
19 0.085135505 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
20 0.084858432 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization
topicId topicWeight
[(0, 0.241), (1, -0.033), (2, 0.03), (3, -0.042), (4, 0.079), (5, 0.026), (6, 0.052), (7, 0.037), (8, -0.019), (9, 0.011), (10, -0.014), (11, -0.036), (12, 0.042), (13, -0.086), (14, 0.023), (15, 0.006), (16, -0.025), (17, 0.06), (18, 0.058), (19, -0.029), (20, -0.022), (21, 0.077), (22, -0.126), (23, 0.031), (24, 0.019), (25, -0.039), (26, -0.059), (27, -0.034), (28, 0.062), (29, -0.044), (30, 0.038), (31, 0.062), (32, -0.026), (33, 0.108), (34, 0.013), (35, -0.118), (36, 0.06), (37, -0.039), (38, 0.074), (39, -0.056), (40, -0.08), (41, 0.097), (42, -0.103), (43, -0.001), (44, -0.03), (45, -0.002), (46, 0.09), (47, 0.023), (48, 0.043), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.90028024 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
2 0.80468971 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
3 0.78195143 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
4 0.749008 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor.
Author: Alessandro Perina, Nebojsa Jojic
Abstract: Recently, the Counting Grid (CG) model [5] was developed to represent each input image as a point in a large grid of feature counts. This latent point is a corner of a window of grid points which are all uniformly combined to match the (normalized) feature counts in the image. Being a bag of word model with spatial layout in the latent space, the CG model has superior handling of field of view changes in comparison to other bag of word models, but with the price of being essentially a mixture, mapping each scene to a single window in the grid. In this paper we introduce a family of componential models, dubbed the Componential Counting Grid, whose members represent each input image by multiple latent locations, rather than just one. In this way, we make a substantially more flexible admixture model which captures layers or parts of images and maps them to separate windows in a Counting Grid. We tested the models on scene and place classification where their componential nature helped to extract objects, to capture parallax effects, thus better fitting the data and outperforming Counting Grids and Latent Dirichlet Allocation, especially on sequences taken with wearable cameras.
6 0.63548756 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
7 0.58415854 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
8 0.58300263 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
9 0.58122659 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
10 0.55708957 171 cvpr-2013-Fast Trust Region for Segmentation
11 0.5517841 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation
12 0.54321319 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
13 0.54064214 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
14 0.53745317 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
15 0.53316367 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation
16 0.52697337 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
17 0.51658338 263 cvpr-2013-Learning the Change for Automatic Image Cropping
18 0.5104472 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
19 0.50940001 383 cvpr-2013-Seeking the Strongest Rigid Detector
20 0.50860685 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
topicId topicWeight
[(10, 0.117), (16, 0.032), (26, 0.046), (28, 0.016), (33, 0.249), (37, 0.227), (67, 0.077), (69, 0.057), (87, 0.091)]
simIndex simValue paperId paperTitle
same-paper 1 0.83925378 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
Author: Carlos Arteta, Victor Lempitsky, J. Alison Noble, Andrew Zisserman
Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. The instances may be partially overlapping and clustered, and hence quite challenging for traditional detectors, which aim at localizing individual instances. Our approach is to propose a set of candidate regions, and then select regions based on optimizing a global classification score, subject to the constraint that the selected regions are non-overlapping. Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. For example, our detector can pick a region containing two or three object instances, while assigning such region an appropriate label. We show that this formulation can be learned within the structured output SVM framework, and that the inference in such model can be accomplished using dynamic programming on a tree structured region graph. Furthermore, the learning only requires weak annotations – a dot on each instance. The improvement resulting from the addition of the capability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians.
2 0.81361985 334 cvpr-2013-Pose from Flow and Flow from Pose
Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi
Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body parts recognition, and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucial for extracting hard to detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose labeled body segments. The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.
3 0.79664111 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
4 0.79437172 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
Author: Guang Shu, Afshin Dehghan, Mubarak Shah
Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained object detectors is usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video specific features like consistent color patterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using Conditional Random Field (CRF) along with our superpixel-based BoW model, we develop an algorithm to segment the object from the background. Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and significantly improves the state-of-art detector by 5-16% in average precision.
5 0.79351604 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
6 0.78907567 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
7 0.78787225 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
8 0.78712326 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
9 0.7870934 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
10 0.78667468 414 cvpr-2013-Structure Preserving Object Tracking
11 0.78654194 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
12 0.78590852 325 cvpr-2013-Part Discovery from Partial Correspondence
13 0.78585523 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
14 0.78485793 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
15 0.78480881 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
16 0.78476149 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
18 0.78455794 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
19 0.78452438 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
20 0.78410691 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
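The listings above rank papers by a simValue computed from sparse (topicId, topicWeight) vectors. The page does not state how that score is computed, so the following is only a minimal sketch under the assumption that a cosine similarity between sparse vectors is a reasonable stand-in. The function name `cosine_similarity` and the `other_paper` vector are hypothetical and introduced purely for illustration; the listed same-paper scores are below 1.0, which suggests the page's actual computation differs from a plain cosine of a vector with itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as (index, weight) pairs.

    Assumption: this is an illustrative measure, not necessarily the one
    used to produce the simValue column in the listing.
    """
    wa, wb = dict(a), dict(b)
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(w * wb.get(i, 0.0) for i, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_a * norm_b)


# Topic weights copied from the topicId/topicWeight row for paper 264.
paper_264 = [(10, 0.117), (16, 0.032), (26, 0.046), (28, 0.016), (33, 0.249),
             (37, 0.227), (67, 0.077), (69, 0.057), (87, 0.091)]

# Hypothetical vector for some other paper, made up for this example.
other_paper = [(33, 0.3), (37, 0.1), (50, 0.2)]

print(cosine_similarity(paper_264, other_paper))
```

Sorting candidate papers by this score in descending order would give a ranked list of the same shape as the one above, though not necessarily with the same values.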