cvpr cvpr2013 cvpr2013-100 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrians crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
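The line sampling and sliding-window steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `temporal_slice` and `sliding_windows`, the nested-list frame representation, and the LOI given as a list of pixel coordinates are all assumptions made for the example.

```python
def temporal_slice(frames, loi_pixels):
    """Stack the LOI pixels of every frame into the columns of a slice image.

    frames: list of 2D lists (rows of grayscale values), one per time step.
    loi_pixels: list of (row, col) coordinates along the line of interest
        (a hypothetical representation of the LOI).
    Returns a 2D list with len(loi_pixels) rows and len(frames) columns,
    so each column is the LOI at a given time.
    """
    return [[frame[r][c] for frame in frames] for (r, c) in loi_pixels]


def sliding_windows(slice_image, width):
    """Yield (start, window) pairs, moving one column (one frame) at a time."""
    num_cols = len(slice_image[0])
    for start in range(num_cols - width + 1):
        yield start, [row[start:start + width] for row in slice_image]


# Toy video: 4 frames of 3x3 pixels, with values that encode (t, row, col).
frames = [[[t * 100 + r * 10 + c for c in range(3)] for r in range(3)]
          for t in range(4)]
loi = [(0, 1), (1, 1), (2, 1)]  # a vertical line through the middle column
slice_image = temporal_slice(frames, loi)
windows = list(sliding_windows(slice_image, width=2))
```

Each window here plays the role of one temporal ROI; in the paper, features extracted from such a window are passed to a regression function to estimate its count.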
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. [sent-5, score-0.921]
2 Through a line sampling process, the video is first converted into a temporal slice image. [sent-6, score-0.593]
3 Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. [sent-7, score-0.777]
4 Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. [sent-8, score-1.094]
5 Integrating over a specific time interval yields the cumulative count of pedestrians crossing the line. [sent-9, score-0.437]
6 Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets. [sent-10, score-0.481]
7 Introduction The goal of crowd counting is to estimate the number of people in a region of interest (ROI counting), or passing through a line of interest (LOI counting) in video. [sent-12, score-0.971]
8 Crowd counting has many potential real-world applications, including surveillance (e.g. [sent-13, score-0.321]
9 , detecting abnormally large crowds, and controlling the number of people in a region), resource management (counting the number of people entering and exiting), and urban planning (identifying the flow rate of people around an area). [sent-15, score-0.561]
10 Beyond people, these counting methods can also be applied to other objects, such as animals passing through a particular boundary, blood cells flowing through a blood vessel under a microscope, and the rate of car traffic. [sent-16, score-0.399]
11 Therefore crowd counting is a crucial topic in video surveillance and other related fields. [sent-17, score-0.716]
12 However, it is still a challenging task because of several factors: 1) in crowded scenes, occlusion between pedestrians is common, especially for large groups in confined areas; 2) the perspective of the scene causes people to appear larger and move faster when they are close to the camera. [sent-18, score-0.406]
13 Line counting example: a) crowd scene and line-of-interest; b) temporal slice of the scene; c) Flow-mosaicking [1] result where a large blob leads to a big jump in the cumulative count. [sent-51, score-1.303]
14 In contrast, our method can predict instantaneous counts better, yielding a better cumulative prediction over time. [sent-52, score-0.522]
15 Most previous approaches [2–6] focus on solving the ROI counting problem, and are based on the counting-by-regression framework, where features extracted from the ROI are directly regressed to the number of people. [sent-54, score-0.402]
16 By bypassing intermediate steps, such as people detection, which can be error-prone on large crowds with severe occlusion, these counting-by-regression methods achieve accurate counts even on sizable crowds. [sent-55, score-0.478]
17 The goal of LOI counting is to count the number of people crossing a line (or visual gate) in the video (see Fig. [sent-57, score-0.886]
18 , the total count since the start of the video, and the instantaneous count, i.e. [sent-61, score-0.466]
19 , the count at any particular time or short temporal window. [sent-63, score-0.405]
20 A naive approach to LOI counting is to apply ROI counting on the regions on each side of the LOI, and take the count difference. [sent-64, score-0.8]
21 However, this LOI count will be erroneous when people enter and exit the ROIs at the same time, since the number of people in the regions remains the same. [sent-65, score-0.538]
22 [1]) are based on extracting and counting crowd blobs from [sent-68, score-0.721]
23 Results of instantaneous count estimation on: a) UCSD and b) LHI datasets. [sent-101, score-0.466]
24 The red and blue segments correspond to crowds moving in different directions, and the instantaneous count estimates appear above and below the image. [sent-103, score-0.636]
25 However, there are several drawbacks of these “blob-centric” methods: 1) because the blob is not counted until it has completely crossed the line, large blobs (e.g. [sent-107, score-0.268]
26 , containing more than 10 people) yield big jumps in the cumulative count, which leads to poor instantaneous count estimates (see Fig. [sent-109, score-0.609]
27 Moreover, these methods typically require spatial-temporal normalization to handle the differences in pedestrian size due to the camera perspective and pedestrian velocity. [sent-111, score-0.292]
28 Current perspective normalization methods [2, 7] require marking a reference person in different positions in the video. [sent-112, score-0.281]
29 To address the above problems, we propose a novel line counting algorithm that estimates instantaneous people counts using local-level features and regression without perspective normalization (see Fig. [sent-116, score-1.316]
30 First, to overcome the drawbacks of “blob-centric” methods, we propose an integer programming approach to estimate the instantaneous counts on the LOI, from a set of ROI counts in the temporal slice image. [sent-119, score-1.175]
31 The cumulative counts of our method are smoother and more accurate than “blob-centric” methods. [sent-120, score-0.26]
32 Second, we introduce a novel local histogram-of-oriented-gradients (HOG) feature, which is robust to the effects of perspective and velocity and yields accurate counts even without spatial-temporal normalization. [sent-121, score-0.316]
33 Third, we demonstrate experimentally that our method can achieve state-of-the-art results for both cumulative and instantaneous LOI counts on two challenging datasets. [sent-122, score-0.522]
34 Related work Counting-by-regression methods focus on either counting people in a region-of-interest (ROI), or counting people passing through a line-of-interest (LOI). [sent-124, score-0.969]
35 For ROI counting, features are extracted from each crowd segment in an image, and a regression function maps between the feature space and the number of people in the segment. [sent-125, score-0.749]
36 Typically, low-level global features are extracted from the crowd segment, internal edges, and textures [1, 2, 4, 6]. [sent-126, score-0.457]
37 The segment area is a prototypical feature that can indicate the total number of pedestrians in the segment. [sent-127, score-0.187]
38 [2] shows that there is a near-linear relationship between the segment area and the number of pedestrians, as long as the feature extraction process properly weights each pixel according to the perspective of the scene. [sent-128, score-0.164]
39 [2] also provides a method to generate a perspective map, by manually labeling a reference person at opposite positions in the scene. [sent-129, score-0.178]
40 Low-level features can also be extracted from each crowd blob, i.e. [sent-130, score-0.412]
41 Regression methods include Gaussian process regression [8] or Bayesian Poisson regression [4], which are both kernel methods that can estimate non-linear functions. [sent-133, score-0.164]
42 The crowd density at each pixel is regressed from the feature vector, and the number of pedestrians in a ROI is obtained by integrating over the crowd density map. [sent-135, score-0.836]
43 Line-of-interest (LOI) counting essentially estimates the number of people in a temporal-slice image (e.g. [sent-136, score-0.484]
44 , the y-t slice of the video volume), the result of which represents the number of people passing through the line within that time window. [sent-138, score-0.576]
45 However, with the basic temporal slice, people moving at fast speeds will have fewer pixels than those moving slowly, thus confounding the regression function. [sent-139, score-0.6]
46 The flow-mosaicking framework [1] corrects for this by changing the thickness of the line, based on the average velocity of the pixels in the crowd blob, resulting in a “flow mosaic”. [sent-140, score-0.403]
47 [7] is used for perspective normalization, and the count in each blob is estimated from low-level features. [sent-141, score-0.423]
48 The blob count can only be estimated after the blob has passed the line, and hence large jumps in the cumulative count can occur, and instantaneous counts (indicating when each person passes the line) are not possible. [sent-142, score-1.332]
49 In contrast to [1], our proposed approach performs ROI counting on windows in the temporal slice image, and uses integer programming to recover the instantaneous count on the line. [sent-143, score-1.314]
50 Finally, counting can also be performed using people detection methods [11–13], which are based on “individual-centric” features, i.e. [sent-147, score-0.465]
51 While this results in a model that is better adapted to varying poses of a single person, it still has problems in detecting partially-occluded people in groups. [sent-151, score-0.167]
52 3a shows an example of a temporal-slice image with a sizable crowd walking in two directions. [sent-155, score-0.384]
53 In this paper, we propose a local HOG descriptor for crowd counting. [sent-159, score-0.405]
54 Examples of local HOG features: a) temporal-slice image; b) image patches and their local HOG features; c) one bin of the bag-of-words histogram versus crowd size. [sent-190, score-0.477]
55 3b presents examples of the local HOG features extracted from a crowd in the temporal slice image. [sent-195, score-0.87]
56 Finally, we considered applying a weight to the gradient magnitudes using a spatial Gaussian kernel (similar to the SIFT descriptor [10]), but this did not increase the counting accuracy. [sent-200, score-0.335]
57 Global descriptor of local HOG features The number of extracted local HOG features depends on the size of the crowd segments in each video frame, with potentially hundreds of local features extracted per frame due to dense sampling. [sent-203, score-0.689]
58 3c plots the value of one bin of the histogram versus the number of people in the crowd segment. [sent-207, score-0.568]
59 The bin value varies linearly with the number of people, which suggests that the bag-of-words of local HOG can be a suitable feature for crowd counting. [sent-208, score-0.427]
60 Normalization will obfuscate the absolute number of codewords in the segment, making histograms from large crowds similar to those from small crowds, which confounds the regression function. [sent-212, score-0.201]
61 Line counting framework In this section, we propose our line counting framework, which is illustrated in Fig. [sent-214, score-0.682]
62 Given an input video sequence, the video is first segmented into crowds of interest, e.g. [sent-216, score-0.206]
63 A temporal slice image and temporal slice segmentation are formed by sampling the LOI over time. [sent-259, score-0.928]
64 Next, a sliding window is placed over the temporal slice, forming a set of temporal ROIs. [sent-260, score-0.459]
65 Features are extracted from each temporal ROI, and the number of people in each ROI is estimated using a regression function. [sent-261, score-0.487]
66 Finally, an integer programming approach is used to recover the instantaneous count from the set of temporal ROI counts. [sent-262, score-0.786]
67 Crowd segmentation Motion segmentation is first applied to the video to focus the counting algorithm on different crowds of interest (e.g. [sent-265, score-0.512]
68 We use a mixture of dynamic textures motion model [14] to extract the regions with different crowd flows. [sent-268, score-0.367]
69 The video is divided into a set of spatiotemporal video cubes, from which a mixture of dynamic textures is learned using the EM algorithm [14]. [sent-269, score-0.16]
70 Static or very slow-moving pedestrians will not be included in the motion segmentation, which is desirable, since the counting algorithm should ignore people who have stopped on the line, in order to avoid double counting. [sent-271, score-0.636]
71 Line sampling and temporal ROI In contrast to flow-mosaicking [1], we use line sampling with a fixed line-width to obtain the temporal slice image. [sent-274, score-0.762]
72 4, the input video image and its corresponding segmentation are sampled at the same line per ? [sent-276, score-0.161]
73 a) temporal-slice image, and its b) temporal and c) spatial weighting maps. [sent-391, score-0.223]
74 The sampled image slices and segment slices are collected to form the temporal slice image and temporal slice segmentation, where each column in the slice image corresponds to the LOI at a given time. [sent-393, score-1.238]
75 To obtain the temporal ROIs, a sliding window is moved horizontally across the slice image, using a step size of one pixel. [sent-394, score-0.514]
76 Feature extraction Features are extracted from each crowd segment in each temporal ROI. [sent-397, score-0.645]
77 A set of local HOG features is extracted from densely sampled patches over the crowd segment in the temporal ROI. [sent-401, score-0.728]
78 The set of local HOGs is then summarized using the bag-of-words model, as described in Section 3, resulting in a single feature vector for each crowd segment for each ROI. [sent-402, score-0.456]
79 Spatial-temporal normalization Because the temporal slice image is generated using a fixed-width line, the width of a person will change with its velocity. [sent-405, score-0.648]
80 In particular, people moving slowly across the LOI will appear wider than those moving fast, as illustrated in Fig. [sent-406, score-0.301]
81 Hence, temporal normalization is required during Table 1. [sent-408, score-0.323]
82 A temporal weight map $w_v(x, y)$ is formed from the tangent velocity of each LOI pixel, estimated with optical flow [15] (see Fig. [sent-411, score-0.321]
83 Faster moving people have higher weights, since their features will be present for less time. [sent-413, score-0.254]
84 In addition to the temporal normalization, the features must also be normalized to adjust for perspective effects of the angled camera. [sent-414, score-0.352]
85 Both weighting maps are applied when extracting low-level features from the image, yielding a spatio-temporal normalization summarized in Table 1. [sent-417, score-0.178]
86 For example, when the edge is oriented horizontally (θ = 90°), only the temporal weight is applied, since there is no component of the edge in the spatial direction. [sent-422, score-0.263]
87 However, normalization of the local HOG features is not necessary; our experimental results show similar performance between local HOG with and without spatiotemporal normalization, which indicates the robustness of the feature to perspective and velocity variations. [sent-425, score-0.377]
88 Finally, note that flow-mosaicking [1] performs temporal normalization by sampling the LOI using a variable line-width, where the current width is based on the average speed of the crowd blob. [sent-426, score-0.741]
89 Because the same line-width must be applied to the whole blob, blobs containing both fast and slow people will not be normalized correctly. [sent-427, score-0.268]
90 In contrast to [1], we use a fixed line-width and per-pixel temporal normalization, which can better handle large crowd blobs with people moving at different speeds (e.g. [sent-428, score-0.869]
91 Count Regression For each temporal ROI, the count in each crowd segment of the ROI is predicted using a regression function that directly maps between the feature vector (input) and the number of people in the crowd segment (output). [sent-434, score-1.49]
92 Gaussian process regression (GPR) [8] has shown promising results for the people counting task [2]. [sent-435, score-0.547]
93 However, pedestrian counts are discrete non-negative integer values, and hence it is not suitable to use GP regression, which models continuous real-valued outputs. [sent-436, score-0.312]
94 Aiming to take full advantage of Bayesian inference, we use Bayesian Poisson regression [3], which directly learns a regression function with discrete integer outputs. [sent-437, score-0.238]
95 7a presents an example of the predicted counts for the temporal ROIs, along with the ground-truth. [sent-440, score-0.371]
96 Instantaneous count estimation In the final stage, the instantaneous counts on the LOI are recovered from the temporal ROI counts using an integer programming formulation. [sent-443, score-1.126]
97 The ith temporal ROI spans time i through i + L − 1, where L is the width of the ROI. [sent-444, score-0.235]
98 Let $n_i$ be the count in the ith temporal ROI, and $s_j$ be the instantaneous count at time j. [sent-445, score-0.259]
99 The temporal ROI count $n_i$ is the sum of the instantaneous counts $s_j$ within the temporal window of the ROI, $n_i = s_i + s_{i+1} + \cdots + s_{i+L-1} = \sum_{k=0}^{L-1} s_{i+k}$. [sent-447, score-1.097]
100 With $\mathbf{n} = [n_1, \ldots, n_N]^T$ and $\mathbf{s} = [s_1, \ldots, s_M]^T$, where N is the number of temporal ROIs and M is the number of video frames, we have $\mathbf{n} = A\mathbf{s}$, (2) where $A \in \{0, 1\}^{N \times M}$ is an association matrix with entries $a_{ij} = 1$ if $i \le j \le i + L - 1$ and $a_{ij} = 0$ otherwise. [sent-455, score-0.255]
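The relation n = As between temporal ROI counts and instantaneous counts can be checked on a toy example. The sketch below only builds the forward model and the cumulative count; it does not reproduce the paper's integer program for recovering s from n. The helper names are hypothetical, indices are 0-based, and the choice of N = M − L + 1 ROIs (every window lying fully inside the slice) is an assumption of this example rather than something stated in the extracted text.

```python
from itertools import accumulate


def association_matrix(num_frames, window):
    """Binary matrix A with a_ij = 1 when frame j lies inside temporal ROI i.

    Assumes num_frames - window + 1 ROIs, i.e. no padding at the ends
    (an assumption of this sketch).
    """
    num_rois = num_frames - window + 1
    return [[1 if i <= j <= i + window - 1 else 0 for j in range(num_frames)]
            for i in range(num_rois)]


def roi_counts(A, s):
    """Compute n = A s: each ROI count sums the instantaneous counts it spans."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]


def cumulative_counts(s):
    """Running total of the instantaneous counts up to each frame."""
    return list(accumulate(s))


# Made-up instantaneous counts for 6 frames, with ROI width L = 3.
s = [0, 1, 0, 2, 1, 0]
A = association_matrix(len(s), window=3)
n = roi_counts(A, s)          # sliding-window sums of s
total = cumulative_counts(s)  # cumulative count over time
```

In the paper's setting the direction is reversed: the ROI counts n come from regression, and the instantaneous counts s are the unknowns to be recovered under integer constraints.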
wordName wordTfidf (topN-words)
[('loi', 0.455), ('crowd', 0.341), ('roi', 0.314), ('counting', 0.298), ('instantaneous', 0.262), ('slice', 0.23), ('count', 0.204), ('temporal', 0.201), ('counts', 0.17), ('people', 0.167), ('hog', 0.165), ('blob', 0.143), ('normalization', 0.122), ('pedestrians', 0.099), ('crowds', 0.098), ('cumulative', 0.09), ('line', 0.086), ('regression', 0.082), ('blobs', 0.082), ('crossing', 0.077), ('perspective', 0.076), ('integer', 0.074), ('rois', 0.071), ('segment', 0.066), ('person', 0.061), ('video', 0.054), ('moving', 0.053), ('wpwv', 0.048), ('pedestrian', 0.047), ('programming', 0.045), ('velocity', 0.043), ('cityu', 0.043), ('sizable', 0.043), ('slices', 0.04), ('passing', 0.039), ('bin', 0.037), ('descriptor', 0.037), ('extracted', 0.037), ('sliding', 0.036), ('width', 0.034), ('jumps', 0.034), ('features', 0.034), ('regressed', 0.033), ('wv', 0.033), ('blood', 0.031), ('slowly', 0.028), ('local', 0.027), ('wp', 0.027), ('textures', 0.026), ('spatiotemporal', 0.026), ('horizontally', 0.026), ('speeds', 0.025), ('poisson', 0.023), ('drawbacks', 0.023), ('crowded', 0.023), ('surveillance', 0.023), ('legs', 0.023), ('histogram', 0.023), ('formed', 0.023), ('patches', 0.022), ('feature', 0.022), ('reference', 0.022), ('sampling', 0.022), ('weighting', 0.022), ('occlusion', 0.022), ('flow', 0.021), ('hogs', 0.021), ('hco', 0.021), ('angled', 0.021), ('confounds', 0.021), ('ehea', 0.021), ('gpr', 0.021), ('lhi', 0.021), ('window', 0.021), ('hence', 0.021), ('segmentation', 0.021), ('rbf', 0.021), ('interest', 0.02), ('torsos', 0.02), ('oinu', 0.02), ('abnormally', 0.02), ('crossed', 0.02), ('mosaic', 0.02), ('si', 0.02), ('adjust', 0.02), ('ni', 0.019), ('slow', 0.019), ('opposite', 0.019), ('estimates', 0.019), ('internal', 0.019), ('interval', 0.019), ('cise', 0.019), ('cubes', 0.019), ('microscope', 0.019), ('resource', 0.019), ('confined', 0.019), ('confounding', 0.019), ('corrects', 0.019), ('sj', 0.018), ('edge', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrians crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
2 0.43067363 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
3 0.25912619 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy
Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
4 0.15949681 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
Author: Carlos Arteta, Victor Lempitsky, J. Alison Noble, Andrew Zisserman
Abstract: The objective of this work is to detect all instances of a class (such as cells or people) in an image. The instances may be partially overlapping and clustered, and hence quite challenging for traditional detectors, which aim at localizing individual instances. Our approach is to propose a set of candidate regions, and then select regions based on optimizing a global classification score, subject to the constraint that the selected regions are non-overlapping. Our novel contribution is to extend standard object detection by introducing separate classes for tuples of objects into the detection process. For example, our detector can pick a region containing two or three object instances, while assigning such region an appropriate label. We show that this formulation can be learned within the structured output SVM framework, and that the inference in such model can be accomplished using dynamic programming on a tree structured region graph. Furthermore, the learning only requires weak annotations – a dot on each instance. The improvement resulting from the addition of the capability to detect tuples of objects is demonstrated on quite disparate data sets: fluorescence microscopy images and UCSD pedestrians.
5 0.1342231 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor.
6 0.12932752 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
8 0.1039891 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
9 0.10324201 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
10 0.1013686 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
11 0.092836276 172 cvpr-2013-Finding Group Interactions in Social Clutter
12 0.08844357 187 cvpr-2013-Geometric Context from Videos
13 0.086820818 440 cvpr-2013-Tracking People and Their Objects
14 0.084905006 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
15 0.082671396 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
16 0.082561485 383 cvpr-2013-Seeking the Strongest Rigid Detector
17 0.075179681 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
18 0.074801095 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
19 0.071149871 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
20 0.070322946 139 cvpr-2013-Efficient 3D Endfiring TRUS Prostate Segmentation with Globally Optimized Rotational Symmetry
topicId topicWeight
[(0, 0.146), (1, -0.013), (2, 0.008), (3, -0.08), (4, -0.016), (5, 0.011), (6, 0.008), (7, 0.005), (8, 0.018), (9, 0.039), (10, -0.005), (11, -0.023), (12, 0.103), (13, -0.108), (14, 0.12), (15, 0.062), (16, -0.012), (17, 0.074), (18, 0.059), (19, -0.088), (20, -0.008), (21, 0.179), (22, -0.176), (23, 0.007), (24, -0.105), (25, -0.092), (26, -0.172), (27, -0.16), (28, 0.052), (29, -0.159), (30, -0.026), (31, 0.021), (32, -0.034), (33, 0.223), (34, 0.018), (35, -0.169), (36, 0.147), (37, -0.062), (38, 0.12), (39, -0.074), (40, -0.122), (41, 0.134), (42, -0.224), (43, 0.038), (44, -0.067), (45, -0.183), (46, 0.065), (47, 0.093), (48, -0.049), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.95912158 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrians crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
2 0.8755008 282 cvpr-2013-Measuring Crowd Collectiveness
Author: Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Abstract: Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields. Collectiveness, which indicates the degree of individuals acting as a union in collective motion, is a fundamental and universal measurement for various crowd systems. By integrating path similarities among crowds on collective manifold, this paper proposes a descriptor of collectiveness and an efficient computation for the crowd and its constituent individuals. The algorithm of the Collective Merging is then proposed to detect collective motions from random motions. We validate the effectiveness and robustness of the proposed collectiveness descriptor on the system of self-driven particles. We then compare the collectiveness descriptor to human perception for collective motion and show high consistency. Our experiments regarding the detection of collective motions and the measurement of collectiveness in videos of pedestrian crowds and bacteria colony demonstrate a wide range of applications of the collectiveness descriptor.
3 0.85552245 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
Author: Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region. Secondly, we employ a global consistency constraint on counts using Markov Random Field. This caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with the head counts ranging from 94 to 4543. This is in stark contrast to datasets used for existing methods which contain not more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
4 0.58355433 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy
Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
Author: Alessandro Perina, Nebojsa Jojic
Abstract: Recently, the Counting Grid (CG) model [5] was developed to represent each input image as a point in a large grid of feature counts. This latent point is a corner of a window of grid points which are all uniformly combined to match the (normalized) feature counts in the image. Being a bag of word model with spatial layout in the latent space, the CG model has superior handling of field of view changes in comparison to other bag of word models, but with the price of being essentially a mixture, mapping each scene to a single window in the grid. In this paper we introduce a family of componential models, dubbed the Componential Counting Grid, whose members represent each input image by multiple latent locations, rather than just one. In this way, we make a substantially more flexible admixture model which captures layers or parts of images and maps them to separate windows in a Counting Grid. We tested the models on scene and place classification where their componential nature helped to extract objects, to capture parallax effects, thus better fitting the data and outperforming Counting Grids and Latent Dirichlet Allocation, especially on sequences taken with wearable cameras.
6 0.52593672 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
7 0.41163993 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
8 0.36174843 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
9 0.34686545 118 cvpr-2013-Detecting Pulse from Head Motions in Video
10 0.30119228 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
11 0.29037637 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos
12 0.28272185 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
13 0.27090108 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
14 0.26655269 172 cvpr-2013-Finding Group Interactions in Social Clutter
15 0.26096255 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
16 0.25440827 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
17 0.25356442 35 cvpr-2013-Adaptive Compressed Tomography Sensing
18 0.24785769 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
19 0.24743116 81 cvpr-2013-City-Scale Change Detection in Cadastral 3D Models Using Images
20 0.24646382 187 cvpr-2013-Geometric Context from Videos
topicId topicWeight
[(10, 0.11), (16, 0.015), (26, 0.086), (33, 0.305), (61, 0.183), (65, 0.012), (67, 0.065), (69, 0.041), (80, 0.018), (87, 0.069)]
simIndex simValue paperId paperTitle
1 0.95437282 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
Author: Julio Zaragoza, Tat-Jun Chin, Michael S. Brown, David Suter
Abstract: We investigate projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data. We focus on the task of image stitching which is customarily solved by estimating a projective warp, a model that is justified when the scene is planar or when the views differ purely by rotation. Such conditions are easily violated in practice, and this yields stitching results with ghosting artefacts that necessitate the usage of deghosting algorithms. To this end we propose as-projective-as-possible warps, i.e., warps that aim to be globally projective, yet allow local non-projective deviations to account for violations to the assumed imaging conditions. Based on a novel estimation technique called Moving Direct Linear Transformation (Moving DLT), our method seamlessly bridges image regions that are inconsistent with the projective model. The result is highly accurate image stitching, with significantly reduced ghosting effects, thus lowering the dependency on post hoc deghosting.
2 0.91958821 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
3 0.9059965 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures
Author: Xiaodi Hou, Alan Yuille, Christof Koch
Abstract: For an ill-posed problem like boundary detection, human labeled datasets play a critical role. Compared with the active research on finding a better boundary detector to refresh the performance record, there is surprisingly little discussion on the boundary detection benchmark itself. The goal of this paper is to identify the potential pitfalls of today’s most popular boundary benchmark, BSDS 300. In the paper, we first introduce a psychophysical experiment to show that many of the “weak” boundary labels are unreliable and may contaminate the benchmark. Then we analyze the computation of f-measure and point out that the current benchmarking protocol encourages an algorithm to bias towards those problematic “weak” boundary labels. With this evidence, we focus on a new problem of detecting strong boundaries as one alternative. Finally, we assess the performances of 9 major algorithms on different ways of utilizing the dataset, suggesting new directions for improvements.
same-paper 4 0.89988655 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
Author: Zheng Ma, Antoni B. Chan
Abstract: We propose an integer programming method for estimating the instantaneous count of pedestrians crossing a line of interest in a video sequence. Through a line sampling process, the video is first converted into a temporal slice image. Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count. Given that count in a sliding window is the sum of the instantaneous counts in the corresponding time interval, an integer programming method is proposed to recover the number of pedestrians crossing the line of interest in each frame. Integrating over a specific time interval yields the cumulative count of pedestrian crossing the line. Compared with current methods for line counting, our proposed approach achieves state-of-the-art performance on several challenging crowd video datasets.
5 0.89705992 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
Author: LiMin Wang, Yu Qiao, Xiaoou Tang
Abstract: This paper proposes motionlet, a mid-level and spatiotemporal part, for human motion recognition. Motionlet can be seen as a tight cluster in motion and appearance space, corresponding to the moving process of different body parts. We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability. Towards this goal, we develop a data-driven approach to learn motionlets from training videos. First, we extract 3D regions with high motion saliency. Then we cluster these regions and preserve the centers as candidate templates for motionlet. Finally, we examine the representative and discriminative power of the candidates, and introduce a greedy method to select effective candidates. With motionlets, we present a mid-level representation for video, called motionlet activation vector. We conduct experiments on three datasets, KTH, HMDB51, and UCF50. The results show that the proposed methods significantly outperform state-of-the-art methods.
6 0.89068991 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
8 0.87988484 311 cvpr-2013-Occlusion Patterns for Object Class Detection
9 0.87455148 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
10 0.87429613 152 cvpr-2013-Exemplar-Based Face Parsing
11 0.87325388 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
12 0.87211359 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
13 0.87188101 96 cvpr-2013-Correlation Filters for Object Alignment
14 0.87113744 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
15 0.87093925 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
16 0.87084234 88 cvpr-2013-Compressible Motion Fields
17 0.87053019 440 cvpr-2013-Tracking People and Their Objects
18 0.87007642 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
19 0.87005264 325 cvpr-2013-Part Discovery from Partial Correspondence
20 0.86965847 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation