cvpr cvpr2013 cvpr2013-299 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. [sent-3, score-0.873]
2 Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. [sent-4, score-0.297]
3 Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. [sent-5, score-0.404]
4 Secondly, we employ a global consistency constraint on counts using Markov Random Field. [sent-6, score-0.52]
5 This caters for disparity in counts in local neighborhoods and across scales. [sent-7, score-0.529]
6 We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. [sent-8, score-1.179]
7 Introduction. The problem of counting the number of objects, specifically people, in images and videos arises in several real-world applications including crowd management, design and analysis of buildings and spaces, and safety and security. [sent-12, score-0.817]
8 In certain scenarios, obtaining the people count is of direct importance, e. [sent-13, score-0.286]
9 The manual counting of individuals in very dense crowds is an extremely laborious task, but is performed nonetheless by experienced personnel when needed [18]. [sent-16, score-0.624]
10 Computer vision research in the area of crowd analysis has resulted in several automated and semi-automated solutions for density estimation and counting. [sent-17, score-0.689]
11 On average, each image in the crowd counting dataset contains around 1280 humans. [sent-21, score-0.755]
12 1) rather than a few tens of individuals [4, 5]; and (2) reliance on temporal constraints in crowd videos [20], which are not applicable to the more prevalent still images. [sent-24, score-0.767]
13 Some methods proposed in literature for crowd detection perform image segmentation without actual counting or localization [1], while others simply estimate the coarse density range within local regions [24]. [sent-26, score-1.036]
14 In terms of experimental data, most of the existing algorithms for exact counting have been tested on low to medium density crowds, e. [sent-27, score-0.395]
15 , UCSD dataset with density of 11 − 46 people per frame [4], Mall dataset with density of 13 − 53 individuals per frame [5], and PETS dataset containing 3 − 40 people per frame [9]. [sent-29, score-0.563]
16 The proposed approach is motivated by the fact that in extremely dense crowds of people, no single feature or detection method is reliable enough to provide an accurate count due to low resolution, severe occlusion, foreshortening, and perspective. [sent-33, score-0.446]
17 We observe however that densely packed crowds of individuals can be treated as a texture, albeit irregular and inhomogeneous at a coarse scale. [sent-35, score-0.342]
18 Furthermore, there does exist a spatial relationship that is expected to constrain the counting estimates in neighboring local image regions in terms of similarity of counts. [sent-37, score-0.297]
19 This observation has been used successfully for crowd detection in [1], although not for counting or localization. [sent-40, score-0.782]
20 Another main contribution of the proposed framework is the use of frequency-domain analysis in crowd counting. [sent-42, score-0.535]
21 Fourier transform has been used extensively in texture analysis [2], and specifically in crowd analysis [17]. [sent-43, score-0.617]
22 Given geometrically arranged texture elements, the Fourier transform can provide reliable estimates of the texton counts [14]. [sent-44, score-0.638]
23 In the domain of crowd counting however, the application of frequency analysis is severely limited due to two main reasons: (1) the spatial arrangement of texture elements is very irregular; and (2) the Fourier transform is not useful in localizing the repeating elements. [sent-45, score-0.898]
24 First, we employ Fourier analysis along with head detections and interest-point based counts in local neighborhoods on multiple scales to avoid the problem of irregularity in the perceived textures emanating from images of dense crowds. [sent-47, score-0.759]
25 The count estimates from this localized multi-scale analysis are then aggregated subject to global consistency constraints. [sent-48, score-0.312]
26 , Fourier, interest points and head detection, with their respective confidences, we compute counts at localized patches independently, which are then globally constrained to get an estimate of count for the entire image. [sent-55, score-0.987]
27 We propose a solution to obtain counts from multi-scale grid MRF which infers the solution simultaneously at all scales while enforcing the count consistency constraint. [sent-57, score-0.743]
28 This category of methods however is not useful for the kind of images we deal with, because human, or even head and face detection in these images is difficult due to severe occlusion and clutter, low resolution, and few pixels per individual due to foreshortening. [sent-64, score-0.352]
29 We demonstrate this fact by reporting quantitative results of detection on our crowd image dataset. [sent-65, score-0.541]
30 Computation of such patterns of motion was also proposed in [22, 23, 12], but not with explicit application to the problem of crowd counting. [sent-67, score-0.514]
31 These algorithms require video frames as input, with reasonably high frame rate for reliable motion estimation, but are not suitable to still images of crowds, or even videos if the individuals in the crowd show nominal or no motion, e. [sent-68, score-0.742]
32 Another category of techniques proposed for crowd counting relies on estimation of direct relationships between low level or local features and counts, by learning regression functions. [sent-71, score-0.791]
33 This assumption is largely invalid in most real world scenarios due to perspective, changes in viewpoint, and changes in crowd density. [sent-74, score-0.514]
34 Chen et al. [5] have recently proposed that information sharing among regions should allow more accurate and robust crowd counting. [sent-79, score-0.534]
35 They propose a single multi-output model for joint localized crowd counting based on ridge regression. [sent-80, score-0.783]
36 Their proposed framework employs interdependent local features from local spatial regions as input and people count from individual regions as multidimensional structured output. [sent-81, score-0.347]
37 We also collected, annotated, and tested on a large dataset of real world crowd images. [sent-84, score-0.514]
38 This variation in density may be inherent to the scene that the image captures (different distribution of individuals in different parts of the scene) or it may arise due to the viewpoint and perspective effects of the camera. [sent-90, score-0.369]
39 Thus, the proposed framework begins by counting individuals in small patches uniformly sampled over the image. [sent-92, score-0.573]
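The uniform patch sampling described in this sentence can be illustrated with a minimal Python sketch. The function name `grid_patches` and the `patch` and `stride` parameters are hypothetical and chosen for illustration; the source does not state the patch size or spacing actually used.

```python
def grid_patches(width, height, patch, stride):
    """Return (x, y, w, h) boxes laid out on a regular grid over an image.

    Hypothetical helper: patch size and stride are illustrative assumptions,
    not values taken from the paper.
    """
    boxes = []
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            boxes.append((x, y, patch, patch))
    return boxes


# Toy 8x4 "image" covered by non-overlapping 2x2 patches.
boxes = grid_patches(width=8, height=4, patch=2, stride=2)
```

Setting `stride` smaller than `patch` would give overlapping patches, which is the kind of dense sampling the fusion step mentions for training.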
40 But, even though the density varies across the image, it does so smoothly, suggesting the density in adjacent patches should be similar. [sent-93, score-0.428]
41 When counting people in patches, we assume the density is uniform but implicitly assume that the number of people in each patch is independent of adjacent patches. [sent-95, score-0.89]
42 Once we estimate density or counts in each patch, we remove the independence assumption and place them in a multi-scale Markov Random Field to model the dependence in counts among nearby patches. [sent-98, score-1.206]
43 Counting in Patches. Given a patch P, we estimate the counts from three different and complementary sources, alongside confidences for those counts. [sent-101, score-0.762]
44 The three sources are later combined to obtain a single estimate of count for that patch using the individual counts and confidences. [sent-102, score-0.997]
45 HOG based Head Detections. The simplest approach to estimate counts is through human detections. [sent-105, score-0.528]
46 However, a quick glance at images of dense crowds reveals that the bodies are almost entirely occluded, leaving only heads for counting and analysis. [sent-106, score-0.441]
47 The consistency in scale and confidence is a measure of how reliable head detections are in that patch. [sent-113, score-0.272]
48 Fourier Analysis. When a crowd image contains thousands of individuals, with each individual occupying only tens of pixels, especially those far away from the camera in an image with perspective distortion, histograms of gradients do not impart any useful information. [sent-116, score-0.609]
49 However, a crowd is inherently repetitive in nature, since all humans appear the same from a distance. [sent-117, score-0.514]
50 The positive correlation is evident from the number of local maxima in the reconstructed patch, and the ground truth counts shown at the bottom. [sent-119, score-0.659]
51 , crowd density in the patch is uniform, can be captured by Fourier Transform, f(ξ), where the periodic occurrence of heads shows as peaks in the frequency domain. [sent-122, score-0.965]
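The idea that periodic repetition shows up as a peak in the frequency domain can be sketched in a few lines of Python. This is a 1-D stand-in under stated assumptions, not the paper's 2-D pipeline: the helper names `dft_magnitudes` and `dominant_frequency` are hypothetical, the input is a synthetic row of intensities, and no filtering or patch reconstruction is attempted.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) form)."""
    n = len(signal)
    mags = []
    for k in range(n):
        total = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(total))
    return mags


def dominant_frequency(signal):
    """Index of the strongest non-DC frequency in the lower half-spectrum."""
    mags = dft_magnitudes(signal)
    half = len(signal) // 2
    return max(range(1, half + 1), key=lambda k: mags[k])


# Synthetic row: a pattern that repeats 4 times across 32 samples.
n = 32
row = [1.0 + math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
repeats = dominant_frequency(row)
```

For this idealized signal the dominant frequency index equals the number of repetitions across the row. Real crowd patches are far less regular, which is why the text restricts Fourier analysis to local neighborhoods where density is roughly uniform.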
52 Interest Points based Counting. We use interest points not only to estimate counts but also to get a confidence of whether the patch represents crowd or not. [sent-131, score-1.277]
53 2) and Fourier Analysis is crowd-blind, it is important to discard counts from such patches. [sent-133, score-0.515]
54 In order to obtain counts or densities using sparse SIFT features, we use Support Vector Regression using the counts computed at each patch from ground truth. [sent-135, score-1.219]
55 From the perspective of statistics, the number of individuals in a particular patch can be seen as a spatial Poisson counting process with parameter λ (which corresponds to density), i. [sent-136, score-0.385]
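For readers unfamiliar with the Poisson model named here, the following Python sketch evaluates the textbook Poisson probability mass function for a patch with expected count λ. It is only an illustration of the standard distribution; the paper's own equation is truncated in this extract, so the helper names `poisson_pmf` and `prob_nonempty` and the example values are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Textbook Poisson probability of observing exactly k items at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_nonempty(lam):
    """Probability that at least one item is present: 1 - P(k = 0)."""
    return 1.0 - math.exp(-lam)


# Illustrative rate: on average 3 individuals expected in the patch.
p_exactly_three = poisson_pmf(3, 3.0)
p_at_least_one = prob_nonempty(3.0)
```

Under this model a larger λ (denser patch) pushes the probability of an empty patch, `exp(-lam)`, toward zero.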
56 Images on the left have confidence of crowd likelihood obtained through Eq. [sent-155, score-0.579]
57 In the top image, the gap between stadium tiers gets low confidence of crowd presence. [sent-157, score-0.579]
58 Given a set of positive (+) and negative (−) examples, the relative densities (frequencies normalized by area) of the features vary in positive and negative images, and can be used to identify crowd patches from non-crowd ones. [sent-177, score-0.634]
59 Assuming independence among features, the log-likelihood ϕ(P) of the ratio of patch containing crowd to non-crowd is [1]: log(γ1, γ2, . [sent-178, score-0.713]
60 The above equation (Eq. 2) gives us a confidence for the presence of crowd in a patch. [sent-190, score-0.6]
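Because the equation itself is truncated in this extract, the sketch below shows only the generic form that an independence assumption leads to: the log of a product of per-feature likelihood ratios becomes a sum of per-feature log ratios. The function name and the example likelihood values are hypothetical, and this is not claimed to match the paper's Eq. 2 term by term.

```python
import math


def log_likelihood_ratio(p_given_crowd, p_given_noncrowd):
    """Sum of per-feature log ratios, valid when features are independent.

    Both arguments are equal-length lists of likelihoods, one entry per
    observed feature. Positive output favors 'crowd', negative favors
    'non-crowd'. Generic naive-Bayes form, assumed for illustration.
    """
    return sum(
        math.log(pc / pn) for pc, pn in zip(p_given_crowd, p_given_noncrowd)
    )


# Hypothetical likelihoods for two features observed in one patch.
phi = log_likelihood_ratio([0.8, 0.5], [0.2, 0.5])
```

In this toy case the first feature is four times more likely under the crowd model and the second is uninformative, so the score equals log 4.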
61 Fusion of Three Sources. For learning and fusion at the patch level, we densely sample overlapping patches from the training images and [Figure 5: The multi-scale Markov Random Field for inferring counts for the entire image.] [sent-195, score-0.785]
62 using the annotation, obtain counts for the corresponding patches. [sent-197, score-0.495]
63 Computing counts and confidences from the three sources, we scale individual features and regress using ? [sent-198, score-0.58]
64 Counting in Images. In order to impose smoothness among counts from different patches, we place them in an MRF framework with a grid structure. [sent-202, score-0.495]
65 Then, the beliefs in the groups of 2 × 2 are added, giving the beliefs for the intermediate nodes above the bottom layer. [sent-256, score-0.291]
66 The sum of labels (counts) at the bottom layer gives the count for the image. [sent-273, score-0.281]
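The relation between layers can be sketched in Python as plain summation over non-overlapping 2 × 2 groups. This shows only the count-consistency structure of the multi-scale grid, assuming a toy 4 × 4 bottom layer with made-up counts; it does not perform the MRF inference or belief computation described in the paper, and `coarser_layer` is a hypothetical helper name.

```python
def coarser_layer(grid):
    """Sum each non-overlapping 2x2 block of a grid with even side lengths."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Made-up patch counts at the bottom (finest) layer.
bottom = [
    [3, 5, 2, 4],
    [6, 4, 1, 3],
    [7, 8, 5, 5],
    [9, 6, 4, 6],
]
middle = coarser_layer(bottom)  # 2x2 intermediate layer
top = coarser_layer(middle)     # 1x1 root
image_count = sum(sum(row) for row in bottom)
```

By construction the root value equals the sum of the bottom-layer counts, which is the consistency across scales that the inference is said to enforce.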
67 6 shows three instances where the estimated count of patch was improved based on neighbors (both spatial and layer). [sent-275, score-0.416]
68 In all cases, the patch under consideration lies in the center of a 3 × 3 patch set. [sent-276, score-0.34]
69 In the first two columns, the overestimated counts are reduced, becoming closer to ground truth. [sent-278, score-0.534]
70 The patch in the middle had a much lower count than its neighbors, which increased after inference, becoming similar to them. [sent-280, score-0.42]
71 Although the new estimate is closer to ground truth, the increase is not necessarily correct since the lower count was due to the presence of a non-human object (an ambulance). [sent-281, score-0.274]
72 The second row shows the ground truth counts, and the estimated counts before and after MRF inference are shown in third and fourth rows, respectively. [sent-286, score-0.592]
73 consists of 50 images with counts ranging between 94 and 4543 with an average of 1280 individuals per image. [sent-288, score-0.711]
74 One of the images is a painting while another is an abstract depiction of a crowd (the one with the least count, shown in Fig. [sent-290, score-0.514]
75 Some examples of images with the associated ground truth counts can be seen in Fig. [sent-293, score-0.57]
76 We used two simple measures to quantify the results: mean and deviation of Absolute Difference (AD), and mean and deviation of Normalized Absolute Difference (NAD), which was obtained by normalizing the absolute difference with the actual count for each image. [sent-296, score-0.352]
77 The first row in Table 1 shows the results of using counts from Fourier Analysis only, giving AD of 703. [sent-299, score-0.539]
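The two measures, Absolute Difference and Normalized Absolute Difference, are simple enough to state as a short Python sketch. The counts below are invented for illustration and are not results from the paper; using the population standard deviation (`statistics.pstdev`) is an assumption, since the source does not say which deviation estimator was used.

```python
import statistics


def absolute_differences(estimated, actual):
    """Per-image |estimated - actual|."""
    return [abs(e - a) for e, a in zip(estimated, actual)]


def normalized_absolute_differences(estimated, actual):
    """Per-image |estimated - actual| divided by the actual count."""
    return [abs(e - a) / a for e, a in zip(estimated, actual)]


def mean_and_deviation(values):
    """Mean and population standard deviation (estimator is an assumption)."""
    return statistics.mean(values), statistics.pstdev(values)


# Invented counts for three images.
actual = [100, 1000, 4000]
estimated = [90, 1100, 3600]
ad_mean, ad_dev = mean_and_deviation(absolute_differences(estimated, actual))
nad_mean, nad_dev = mean_and_deviation(
    normalized_absolute_differences(estimated, actual)
)
```

The toy numbers show why both measures are reported: the absolute error grows with crowd size while the normalized error stays at 10% for every image.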
78 Including counts from head detections improves AD marginally to 510. [sent-307, score-0.656]
79 Adding counts from regression on sparse SIFT features reduces error in both measures, giving values of 468. [sent-309, score-0.575]
80 Finally, inferring counts for complete images using counts from patches through multi-scale MRF further improves AD taking it to 419. [sent-312, score-1.11]
81 8a show average of actual counts per patch in that image. [sent-318, score-0.741]
82 For easier analysis, the x-axis shows images sorted with respect to actual counts in both plots. [sent-319, score-0.57]
83 It can be seen that AD per patch increases as the actual count increases, except for the images in the range 25 to 45 with corresponding actual counts in the range of 1000 − 2500 per image. [sent-320, score-1.312]
84 A NDot, but lowest deviations as well, which means the approach consistently predicts correct counts for patches in this range. [sent-322, score-0.642]
85 The reason for better performance in the middle range is obvious: the counts range from 94 − 4543, so the largest count is a tremendous 4832% of the smallest count. [sent-323, score-0.723]
86 Figure 8 (panels (a) and (b), x-axis: image number): This figure shows analysis of patch estimates in terms of absolute and normalized absolute differences. [sent-327, score-0.345]
87 Means are shown with black asterisks, standard deviations with red bars, and ground truth counts with olive dots. [sent-329, score-0.637]
88 [20], and Lempitsky and Zisserman [13], which were suitable for this dataset since other methods for crowd counting mostly deal with videos or use human detection, and cannot be used for testing on this dataset. [sent-332, score-0.775]
89 The x-axis shows the average counts of each of the 10 groups. [sent-342, score-0.495]
90 Density aware person detection [20] performs best around counts of 1000, but its error increases as we move away. [sent-343, score-0.559]
91 The reason becomes obvious when we look at the absolute counts output by the method in Fig. [sent-344, score-0.554]
92 The reason lies in the algorithm itself, as it is designed to minimize the maximum AD across images when training, and since images with higher counts tend to have higher AD, the learning focuses on such images. [sent-348, score-0.495]
93 The learner gets biased towards high density images, thus, producing a lower AD overall, but overestimating at lower counts (Fig. [sent-349, score-0.672]
94 8a reveals that patch density increases super-linearly for this group, which otherwise is linear for the first nine groups. [sent-356, score-0.324]
95 At very high density, the relative frequencies across patches with different densities may become similar, resulting in a loss of discriminative power. [sent-359, score-0.274]
96 Conclusion. We presented an approach to count the number of individuals in extremely dense crowds, on a scale not tackled before. [sent-361, score-0.452]
97 We fuse information from three sources in terms of counts, confidences and different measures at the patch level, and then enforce a smoothness constraint on nearby patches to improve estimates of incorrect patches, thereby [sent-362, score-0.466]
98 Possible improvements include explicit preprocessed estimation of crowd density, and making regression an explicit function of density so that it better adapts to various crowd sizes. [sent-375, score-1.218]
99 Privacy preserving crowd monitoring: Counting people without people models or tracking. [sent-404, score-0.682]
100 A neural-based crowd estimation by hybrid global learning algorithm. [sent-417, score-0.514]
wordName wordTfidf (topN-words)
[('crowd', 0.514), ('counts', 0.495), ('counting', 0.241), ('count', 0.202), ('individuals', 0.187), ('patch', 0.17), ('density', 0.154), ('crowds', 0.133), ('nad', 0.131), ('fourier', 0.125), ('patches', 0.12), ('head', 0.109), ('beliefs', 0.101), ('people', 0.084), ('ad', 0.083), ('sources', 0.076), ('confidence', 0.065), ('confidences', 0.064), ('cody', 0.061), ('fifty', 0.061), ('absolute', 0.059), ('layer', 0.058), ('crowded', 0.056), ('detections', 0.052), ('peaks', 0.049), ('mrf', 0.048), ('actual', 0.047), ('repetitions', 0.047), ('tens', 0.046), ('haroon', 0.046), ('imran', 0.046), ('marathons', 0.046), ('giving', 0.044), ('textons', 0.042), ('maximas', 0.04), ('olive', 0.04), ('heads', 0.04), ('ground', 0.039), ('frequency', 0.038), ('person', 0.037), ('estimates', 0.036), ('regression', 0.036), ('gt', 0.036), ('extremely', 0.036), ('truth', 0.036), ('underestimate', 0.035), ('texture', 0.035), ('neighborhoods', 0.034), ('bars', 0.033), ('estimate', 0.033), ('steady', 0.032), ('sift', 0.032), ('dp', 0.032), ('repeats', 0.031), ('per', 0.029), ('independence', 0.029), ('brostow', 0.029), ('localized', 0.028), ('reconstructed', 0.028), ('perspective', 0.028), ('sorted', 0.028), ('dense', 0.027), ('rodriguez', 0.027), ('periods', 0.027), ('florida', 0.027), ('pn', 0.027), ('detection', 0.027), ('deviations', 0.027), ('middle', 0.026), ('transform', 0.026), ('groups', 0.025), ('consistency', 0.025), ('texton', 0.025), ('army', 0.025), ('begins', 0.025), ('ends', 0.024), ('positives', 0.024), ('producing', 0.023), ('repeating', 0.023), ('rabaud', 0.023), ('irregular', 0.022), ('poisson', 0.022), ('cybernetics', 0.022), ('deviation', 0.022), ('estimated', 0.022), ('neighbors', 0.022), ('belief', 0.021), ('evident', 0.021), ('individual', 0.021), ('buildings', 0.021), ('log', 0.021), ('gives', 0.021), ('analysis', 0.021), ('reliable', 0.021), ('scales', 0.021), ('videos', 0.02), ('regions', 0.02), ('densities', 0.02), ('discard', 0.02), ('ebdo', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
2 0.43067363 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
3 0.27465966 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy
Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
4 0.22504666 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor.
5 0.17482588 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
Author: Carlos Arteta, Victor Lempitsky, J. Alison Noble, Andrew Zisserman
Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. The instances may be partially overlapping and clustered, and hence quite challenging for traditional detectors, which aim at localizing individual instances. Our approach is to propose a set of candidate regions, and then select regions based on optimizing a global classification score, subject to the constraint that the selected regions are non-overlapping. Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. For example, our detector can pick a region containing two or three object instances, while assigning such region an appropriate label. We show that this formulation can be learned within the structured output SVM framework, and that the inference in such model can be accomplished using dynamic programming on a tree structured region graph. Furthermore, the learning only requires weak annotations – a dot on each instance. The improvement resulting from the addition of the capability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians.
7 0.11646836 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
8 0.10649917 393 cvpr-2013-Separating Signal from Noise Using Patch Recurrence across Scales
9 0.10094873 166 cvpr-2013-Fast Image Super-Resolution Based on In-Place Example Regression
10 0.090446219 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
11 0.085698687 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
12 0.082305536 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition
13 0.081476092 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
14 0.06989979 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
15 0.069563575 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths
16 0.066905707 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes
17 0.066347182 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
18 0.0662902 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification
19 0.065293327 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
20 0.065006666 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
topicId topicWeight
[(0, 0.185), (1, -0.012), (2, 0.009), (3, -0.03), (4, 0.019), (5, 0.027), (6, -0.013), (7, 0.016), (8, 0.023), (9, 0.013), (10, -0.013), (11, -0.005), (12, 0.071), (13, -0.065), (14, 0.087), (15, 0.023), (16, -0.047), (17, 0.048), (18, 0.132), (19, -0.078), (20, 0.028), (21, 0.199), (22, -0.133), (23, -0.048), (24, -0.11), (25, -0.103), (26, -0.182), (27, -0.214), (28, 0.044), (29, -0.224), (30, -0.055), (31, -0.051), (32, -0.012), (33, 0.225), (34, 0.016), (35, -0.11), (36, 0.088), (37, -0.116), (38, 0.134), (39, -0.11), (40, -0.098), (41, 0.172), (42, -0.207), (43, 0.051), (44, -0.077), (45, -0.127), (46, 0.152), (47, 0.062), (48, -0.05), (49, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.94954747 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
2 0.90009636 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
3 0.86061746 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor.
4 0.63793522 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy
Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
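One way to picture a cumulative attribute space for a scalar label is a binary vector whose leading entries are switched on up to the label value. The sketch below assumes that encoding; the abstract does not spell it out, and the function names and the `max_value` parameter are hypothetical.

```python
def cumulative_attributes(value, max_value):
    """Encode a scalar label (e.g. a people count) as a cumulative 0/1 vector.

    Assumed encoding: the first `value` dimensions are 1 and the rest are 0,
    so neighbouring labels differ in exactly one dimension.
    """
    return [1 if i < value else 0 for i in range(max_value)]


def decode(attributes):
    """Recover the scalar label by counting the active dimensions."""
    return sum(attributes)


code_3 = cumulative_attributes(3, max_value=5)
code_4 = cumulative_attributes(4, max_value=5)
# Number of dimensions in which the two codes disagree.
hamming = sum(a != b for a, b in zip(code_3, code_4))
```

Under this assumed encoding, labels that are close in value share most of their attribute dimensions, which is one plausible reason such a representation could help when training samples for individual label values are sparse.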
Author: Alessandro Perina, Nebojsa Jojic
Abstract: Recently, the Counting Grid (CG) model [5] was developed to represent each input image as a point in a large grid of feature counts. This latent point is a corner of a window of grid points which are all uniformly combined to match the (normalized) feature counts in the image. Being a bag of word model with spatial layout in the latent space, the CG model has superior handling of field of view changes in comparison to other bag of word models, but with the price of being essentially a mixture, mapping each scene to a single window in the grid. In this paper we introduce a family of componential models, dubbed the Componential Counting Grid, whose members represent each input image by multiple latent locations, rather than just one. In this way, we make a substantially more flexible admixture model which captures layers or parts of images and maps them to separate windows in a Counting Grid. We tested the models on scene and place classification where their componential nature helped to extract objects, to capture parallax effects, thus better fitting the data and outperforming Counting Grids and Latent Dirichlet Allocation, especially on sequences taken with wearable cameras.
6 0.53632641 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
7 0.43394578 166 cvpr-2013-Fast Image Super-Resolution Based on In-Place Example Regression
8 0.34724244 393 cvpr-2013-Separating Signal from Noise Using Patch Recurrence across Scales
9 0.34680456 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
10 0.3451699 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
11 0.31434441 266 cvpr-2013-Learning without Human Scores for Blind Image Quality Assessment
12 0.31357652 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths
13 0.31109712 118 cvpr-2013-Detecting Pulse from Head Motions in Video
14 0.30369461 35 cvpr-2013-Adaptive Compressed Tomography Sensing
15 0.29805598 464 cvpr-2013-What Makes a Patch Distinct?
16 0.28912795 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?
17 0.28365341 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
18 0.28256351 81 cvpr-2013-City-Scale Change Detection in Cadastral 3D Models Using Images
19 0.2748926 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation
20 0.27177244 451 cvpr-2013-Unsupervised Salience Learning for Person Re-identification
topicId topicWeight
[(10, 0.131), (14, 0.147), (16, 0.025), (26, 0.051), (28, 0.023), (33, 0.284), (37, 0.012), (59, 0.012), (67, 0.065), (69, 0.046), (77, 0.016), (87, 0.105)]
simIndex simValue paperId paperTitle
1 0.93247253 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extends the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of a commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state of the art.
2 0.9290517 169 cvpr-2013-Fast Patch-Based Denoising Using Approximated Patch Geodesic Paths
Author: Xiaogang Chen, Sing Bing Kang, Jie Yang, Jingyi Yu
Abstract: Patch-based methods such as Non-Local Means (NLM) and BM3D have become the de facto gold standard for image denoising. The core of these approaches is to use similar patches within the image as cues for denoising. The operation usually requires expensive pair-wise patch comparisons. In this paper, we present a novel fast patch-based denoising technique based on Patch Geodesic Paths (PatchGP). PatchGPs treat image patches as nodes and patch differences as edge weights for computing the shortest (geodesic) paths. The path lengths can then be used as weights of the smoothing/denoising kernel. We first show that, for natural images, PatchGPs can be effectively approximated by minimum hop paths (MHPs) that generally correspond to Euclidean line paths connecting two patch nodes. To construct the denoising kernel, we further discretize the MHP search directions and use only patches along the search directions. Along each MHP, we apply a weight-propagation scheme to robustly and efficiently compute the path distance. To handle noise at multiple scales, we conduct wavelet image decomposition and apply the PatchGP scheme at each scale. Comprehensive experiments show that our approach achieves quality comparable to state-of-the-art methods such as NLM and BM3D but is a few orders of magnitude faster.
3 0.91897672 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
same-paper 4 0.91596162 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
5 0.90804994 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
6 0.90703571 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.90257782 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
8 0.90249842 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
9 0.90169144 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
10 0.90001673 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
11 0.89964932 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery
12 0.89945728 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
13 0.89897346 143 cvpr-2013-Efficient Large-Scale Structured Learning
14 0.89886707 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.89879078 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
16 0.89858848 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
17 0.89810985 414 cvpr-2013-Structure Preserving Object Tracking
18 0.89750582 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
19 0.89726323 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
20 0.89722002 298 cvpr-2013-Multi-scale Curve Detection on Surfaces