Abstract: The human body is structurally symmetric. Tracking-by-detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed, can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion-aware algorithm for tracking human pose in an image sequence that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the "targets" are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter and a standard indoor dataset (HumanEva-I), and compare against a state-of-the-art pose estimation algorithm.
same-paper 1 0.86548084 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
Author: Varun Ramakrishna, Takeo Kanade, Yaser Sheikh
Abstract: The human body is structurally symmetric. Tracking-by-detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed, can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion-aware algorithm for tracking human pose in an image sequence that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the "targets" are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter and a standard indoor dataset (HumanEva-I), and compare against a state-of-the-art pose estimation algorithm.
2 0.84936899 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
Author: Joseph J. Lim, C. Lawrence Zitnick, Piotr Dollár
Abstract: We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand drawn contours in images. Patches of human generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200× faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
3 0.81215292 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
Author: Asad A. Butt, Robert T. Collins
Abstract: We propose a method for global multi-target tracking that can incorporate higher-order track smoothness constraints such as constant velocity. Our problem formulation readily lends itself to path estimation in a trellis graph, but unlike previous methods, each node in our network represents a candidate pair of matching observations between consecutive frames. Extra constraints on binary flow variables in the graph result in a problem that can no longer be solved by min-cost network flow. We therefore propose an iterative solution method that relaxes these extra constraints using Lagrangian relaxation, resulting in a series of problems that ARE solvable by min-cost flow, and that progressively improve towards a high-quality solution to our original optimization problem. We present experimental results showing that our method outperforms the standard network-flow formulation as well as other recent algorithms that attempt to incorporate higher-order smoothness constraints.
4 0.80524278 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
Author: Jianguo Li, Yimin Zhang
Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the well-known Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the state-of-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.
5 0.80474597 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
6 0.80357218 311 cvpr-2013-Occlusion Patterns for Object Class Detection
8 0.80188811 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
9 0.8010838 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
10 0.79963481 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
11 0.79959875 202 cvpr-2013-Hierarchical Saliency Detection
12 0.79957283 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
13 0.79946691 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
14 0.79933685 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
15 0.7992416 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
16 0.79895824 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
17 0.79892051 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
18 0.79874504 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
19 0.79863232 334 cvpr-2013-Pose from Flow and Flow from Pose
20 0.79853427 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation