
440 cvpr-2013-Tracking People and Their Objects


Source: pdf

Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe

Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore
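
As a rough illustration of how a ranking like the one below can be produced, here is a minimal tf-idf sentence-scoring sketch in Python using scikit-learn. The actual pipeline behind this page is not documented, so the corpus choice, normalization, and stop-word handling here are all assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_sentences(sentences, top_k=3):
    # Fit tf-idf over the sentences themselves; the real system may
    # instead fit over a larger document collection.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(sentences)              # rows = sentences
    # Score = mean tf-idf weight per sentence so that long sentences
    # are not trivially favored (one plausible normalization choice).
    sums = np.asarray(X.sum(axis=1)).ravel()
    nnz = np.maximum(X.getnnz(axis=1), 1)
    scores = sums / nnz
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i]), sentences[i]) for i in order]

sents = [
    "Current pedestrian tracking approaches ignore important aspects of human behavior.",
    "Typical everyday scenarios include people pushing child strollers or pulling luggage.",
    "We evaluate our approach on a novel dataset of person-object interactions.",
]
for i, score, text in rank_sentences(sents):
    print(f"{text} [sent-{i}, score-{score:.3f}]")
```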

1 Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. [sent-3, score-0.308]

2 In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. [sent-6, score-0.267]

3 Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. [sent-7, score-0.3]

4 These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. [sent-8, score-0.355]

5 The inferred interactions can then be used to support tracking by recovering lost object tracks. [sent-9, score-0.447]

6 We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios. [sent-10, score-0.293]

7 Still, most current approaches are so far limited to recognizing and tracking a small number of known object categories, such as pedestrians or cars. [sent-13, score-0.314]

8 Recently, tracking approaches have been extended by social walking models [15] and by modeling of group behavior [4, 11, 16, 20]. [sent-14, score-0.293]

9 However, another major factor that influences people's behavior and dynamics, namely their interactions with scene objects, has so far been underrepresented. [sent-15, score-0.205]

10 Such interactions are harder to incorporate, since their analysis requires recognizing the presence of objects whose shape and appearance may as yet be unknown. [sent-16, score-0.22]

11 Figure 1: Our proposed approach models pairwise interactions between persons and objects in a probabilistic graphical model, taking into account object shape, relative arrangement, and temporal consistency. [sent-20, score-1.203]

12 Thus, it can infer which objects belong to which persons and predict how the interactions will continue. [sent-21, score-0.474]

13 Recognized interactions are visualized by colored lines linking the foot points of interacting objects (Legend: pull side right, pull side left, pull right, pull left, push, group). [sent-22, score-0.702]

14 With this paper we present a mobile scene understanding approach for inner-city shopping areas, airports, or train stations. [sent-26, score-0.205]

15 In such scenarios, people often handle luggage items, child strollers, trolleys, etc. [sent-27, score-0.198]

16 Our approach can track persons and other scene objects from a mobile platform and jointly infer both the object class and the interaction type from observed appearances and dynamics. [sent-29, score-0.88]

17 This model can determine which persons and objects belong together and in what way they interact. [sent-31, score-0.266]

18 Based on the recognized interaction, it can then predict how the interaction will most likely continue and how one object’s trajectory will be affected by another object’s observed motion. [sent-32, score-0.316]

19 Realizing such an approach for a mobile platform cannot be done in a standard tracking-by-detection framework [sent-33, score-0.198]

20 based on pre-trained object detectors [1, 2, 3, 6, 12, 19], since object class inference will only become possible after an object configuration has already been tracked for several frames. [sent-35, score-0.315]

21 In a tracking-by-detection approach, this would cause a tracking failure. [sent-41, score-0.2]

22 Instead, our approach treats the child as an unknown moving object (visualized by a box) and it can still recognize that this object forms a group with the child’s mother (shown by the green connecting line), thus affecting the mother’s trajectory. [sent-42, score-0.331]

23 In detail, our paper makes the following contributions: (1) We propose a probabilistic graphical model for recognizing pairwise person-object interactions taking into account object shape, relative arrangement, and temporal consistency. [sent-43, score-0.3]

24 This model can jointly infer object classes and interaction patterns more robustly than could be done from individual observations. [sent-44, score-0.312]

25 (2) This scene interpretation allows our approach to make improved predictions for the continuation of each tracked object’s trajectory with increased robustness to occlusions and detection failures. [sent-46, score-0.299]

26 (3) In order to make this approach feasible on noisy stereo depth data, we propose several detailed contributions spanning the entire tracking pipeline. [sent-47, score-0.263]

27 (4) We introduce a novel benchmark dataset for person-object interaction consisting of 325 video sequences with a total of almost 15,000 frames and use it to quantitatively evaluate our approach’s performance. [sent-49, score-0.285]

28 … the proposed graphical model for object and interaction classification.

29 Sec. 5 integrates the model into a tracking pipeline for robust scene interpretation.

30 Incorporating social walking models into modeling the dynamics of individual pedestrians [15, 20] and groups [4, 11, 16] has been shown to yield significant improvement for tracking in crowded scenes. [sent-66, score-0.341]

31 Similarly, [4] have shown that tracking results can be improved by simultaneously tracking multiple people and estimating their collective activities. [sent-67, score-0.44]

32 However, those approaches consider only other pedestrians as possible scene objects and ignore the impact of a large variety of other objects such as bicycles, child strollers, shopping carts, or wheelchairs often present in street scenes. [sent-68, score-0.48]

33 For example, [17] propose to detect abandoned luggage items by analyzing the size and velocity of tracked foreground blobs. [sent-71, score-0.233]

34 [5] propose a more elaborate approach for carried item detection that compares the segmented object area to learned temporal templates of pedestrian shapes. [sent-72, score-0.245]

35 Recently, [13] has proposed a tracking-before-detection approach that can track both known and unknown object

36 categories from a mobile platform based on stereo data. [sent-91, score-0.212]

37 Their method relies on stereo region-of-interest (ROI) extraction to extract possible object candidates [2, 3, 9, 13] and to track them over time. [sent-92, score-0.202]

38 We take inspiration from this approach in order to develop our model, but significantly extend it with improved methods for candidate object segmentation, data association, and object interaction handling. [sent-93, score-0.296]

39 Modeling Person-Object Interactions: We model all person-object interactions in the scene in a pairwise manner.

40 We try to robustly explain what is happening in the scene under the basic assumption that persons’ actions will be the dominant cause of observable object motion, meaning that an object can only move because of a person’s impact. [sent-98, score-0.184]

41 Looking at a scene of various given objects, their past trajectories and current positions, we derive a number of individual and pairwise features to infer the type of interaction. [sent-100, score-0.248]

42 Firstly, we model the appearance of objects and persons and try to assign them to one of the classes: stroller, 2-wheel bag, 4-wheel bag, walking aid, person, autonomous (e.

43 For each person-object and person-person pair, we can determine their relative positions in the scene, as well as their relative velocities derived from their trajectories.

44 Together with the object appearances, we use those as features in order to infer the interaction type. [sent-105, score-0.312]
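
A minimal sketch of how such pairwise features could be assembled from tracked states. The names xrel and vrel follow the notation above; the person-centric coordinate convention is an assumption made for illustration, not the paper's specification.

```python
import numpy as np

def pairwise_features(pos_p, vel_p, pos_o, vel_o):
    # Express the object's state in a person-centric frame whose x-axis
    # is the person's heading, so that "in front of" (pushing) versus
    # "beside" (pulling at the side) becomes comparable across tracks.
    heading = np.arctan2(vel_p[1], vel_p[0]) if np.linalg.norm(vel_p) > 1e-6 else 0.0
    c, s = np.cos(-heading), np.sin(-heading)
    R = np.array([[c, -s], [s, c]])
    xrel = R @ (np.asarray(pos_o, float) - np.asarray(pos_p, float))
    vrel = R @ (np.asarray(vel_o, float) - np.asarray(vel_p, float))
    return np.concatenate([xrel, vrel])   # fed alongside appearances Yo, Yp

# e.g. a stroller about 0.8 m directly ahead, moving with the person:
print(pairwise_features([0, 0], [1.2, 0.0], [0.8, 0.05], [1.2, 0.0]))
```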

45 For a pairwise interaction, the action group is defined as true if and only if two persons belong to the same group of people.

46 An intuitive notion of group transitivity will then allow us to robustly identify all persons belonging to the same group. [sent-111, score-0.191]
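
Group transitivity (if A groups with B and B with C, then A, B, and C form one group) can be realized with a standard union-find over the pairwise "group" decisions, as in this sketch; the data structure choice is ours, not necessarily the paper's.

```python
def group_components(n_persons, group_pairs):
    # Union-find over pairwise "group" decisions; transitivity follows
    # automatically because union() merges whole components.
    parent = list(range(n_persons))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for a, b in group_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n_persons):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# pairs (0,1) and (1,2) transitively imply one group {0, 1, 2}
print(group_components(4, [(0, 1), (1, 2)]))   # -> [[0, 1, 2], [3]]
```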

47 Since we do not know the entity class a priori, we determine for each interaction a probability for the actor to be a person. [sent-115, score-0.278]

48 At runtime we will then observe the appearances of our actors Yo and Yp, as well as their relative positions and velocities, xrel and vrel, respectively (c. [sent-124, score-0.188]

49 To infer an interaction between these two, as well as the object type and person classification, we perform exact Belief Propagation using the junction tree algorithm [14]. [sent-128, score-0.401]
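
The paper runs exact belief propagation via the junction tree algorithm [14]. Since the model's actual potentials are not reproduced here, the sketch below instead performs brute-force exact marginalization over a tiny discrete stand-in model of the same shape (object class, person indicator, action); for variable sets this small the result is the same exact posterior. All potential tables are made-up placeholders.

```python
import itertools

# Tiny stand-in model: object class C_o, person indicator C_p, action A.
CLASSES = ["stroller", "2-wheel bag", "person"]
ACTIONS = ["push", "pull", "group", "none"]

# Placeholder potentials; in the real model these would come from the
# learned appearance (Yo, Yp) and relative-geometry (xrel, vrel) terms.
phi_co = {"stroller": 0.5, "2-wheel bag": 0.3, "person": 0.2}
phi_cp = {True: 0.9, False: 0.1}            # "the actor is a person"

def phi_act(co, cp, a):
    if not cp:                               # only persons cause actions
        return 1.0 if a == "none" else 1e-3
    if co == "person":
        return 2.0 if a == "group" else 0.5
    return 2.0 if a in ("push", "pull") else 0.5

# Brute-force exact posterior over the action, marginalizing C_o and C_p.
post = {a: 0.0 for a in ACTIONS}
for co, cp, a in itertools.product(CLASSES, [True, False], ACTIONS):
    post[a] += phi_co[co] * phi_cp[cp] * phi_act(co, cp, a)

Z = sum(post.values())
print({a: round(p / Z, 3) for a, p in post.items()})
```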

50 The object-type classifier assumes correct tracking and the input of a 3D point cloud that only contains points belonging to the person to be classified.

51 For example, one would always expect a stroller that is pushed by a person to be located in front of her (c. [sent-141, score-0.282]

52 In a scene with n entities (persons/dynamic or static objects) there are n · (n − 1) pairwise interactions.

53 This means that an object o might for example be detected to interact with two persons p1/2 in a scene, being interpreted as a stroller in the first case and as a suitcase in the second.

54 We incorporate evidence from other interactions in the same scene by marginalizing object types over all pairwise assignments and thus interconnecting all Co and Cp that belong to the same entity. [sent-176, score-0.348]

55 Each entity e interacts with every other of the n entities in two ways, once as object and once as person.

56 The rationale is that an object that has been detected as a person in one frame is likely (but not certain, due to tracking uncertainties) to be a person again in the next one. [sent-182, score-0.435]
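
One simple way to encode this soft persistence is a "sticky" class-transition table linking the class variable across consecutive frames; the 0.9 self-transition probability below is an illustrative value, not a learned one.

```python
import numpy as np

CLASSES = ["stroller", "2-wheel bag", "person"]
STICKY = 0.9   # illustrative: likely, but not certain, to keep its class

n = len(CLASSES)
T = np.full((n, n), (1.0 - STICKY) / (n - 1))   # small mass off-diagonal
np.fill_diagonal(T, STICKY)

def propagate(prev_posterior, frame_likelihood):
    # Fuse last frame's class posterior with this frame's evidence.
    prior = T.T @ prev_posterior        # temporal prediction
    post = prior * frame_likelihood     # combine with per-frame evidence
    return post / post.sum()

prev = np.array([0.10, 0.10, 0.80])     # was probably a person
lik = np.array([0.40, 0.30, 0.30])      # ambiguous appearance this frame
print(propagate(prev, lik).round(3))    # stays biased toward "person"
```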

57 For example, tracking can be facilitated in a setting where objects are occluded or lost. [sent-189, score-0.285]

58 Knowing that a person pushed a stroller s in the past frames raises the suspicion he will do so again in the current frame. [sent-190, score-0.345]

59 Furthermore, we can infer the relative position of s to all other entities j ∈ J, for the set of all entities J that it interacted with in the past frame.

60 Fig. 5 shows an overview of our tracking system that we use for generating observations (the positions, velocities and 3D object shapes) for the proposed graphical model.

61 In each frame, the newly extracted objects are linked to trajectory hypotheses on the ground plane by starting new trajectories backwards in time (e) and extending already existing tracks with new observations (c). [sent-214, score-0.361]

62 As shown in Fig. 5, the model consists of a center axis (which is initially placed at the center position of each segmented object) and several height layers from which rays are cast in a fixed number of directions up to the height of the object.

63 With the process so far we obtain an over-complete set of trajectory hypotheses which we prune to a final set mostly consistent with the scene by applying model selection in every frame as proposed by [12]. [sent-221, score-0.244]

64 The initial step of tracking is to generate ROIs for potential objects, given the depth information. [sent-224, score-0.2]

65 However, such a simple approach ignores the fact that the target objects we are interested in for tracking need to be connected to the ground plane. [sent-233, score-0.285]

66 In Fig. 6(1), only the torso of the woman pushing the stroller is visible, which means that only these points will contribute to the histogram bins, resulting in a very low bin value, as shown in Fig.

67 The GCTs are generated for each tracker hypothesis by placing the center of the GCT on the initial inlier (segmented region with the 3D points) of the hypothesis and casting radial rays over a number of discrete height levels. [sent-247, score-0.349]
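
A simplified sketch of accumulating such a ray-based shape representation; the number of layers, rays, and the height range are assumed parameters, and the real GCT construction may differ in detail.

```python
import numpy as np

class GCTSketch:
    """Simplified ray-based shape model: per height layer, rays cast in
    fixed directions from the center axis accumulate the radial
    distances of observed 3D points. Parameters are illustrative."""

    def __init__(self, center_xy, n_layers=10, n_rays=16, max_height=2.0):
        self.center = np.asarray(center_xy, dtype=float)
        self.n_layers, self.n_rays = n_layers, n_rays
        self.layer_h = max_height / n_layers
        # dists[layer][ray] -> list of observed radial distances
        self.dists = [[[] for _ in range(n_rays)] for _ in range(n_layers)]

    def add_points(self, pts):   # pts: (N, 3) array with z = height
        for x, y, z in pts:
            layer = int(z / self.layer_h)
            if not 0 <= layer < self.n_layers:
                continue
            dx, dy = x - self.center[0], y - self.center[1]
            ang = np.arctan2(dy, dx) % (2 * np.pi)
            ray = int(ang / (2 * np.pi) * self.n_rays) % self.n_rays
            self.dists[layer][ray].append(np.hypot(dx, dy))

gct = GCTSketch(center_xy=(0.0, 0.0))
gct.add_points(np.array([[0.3, 0.0, 0.5], [0.0, 0.25, 0.5], [-0.3, 0.0, 1.1]]))
print(len(gct.dists[2][0]))   # one point fell into layer 2, ray 0
```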

68 From the GCTs we generate for each trajectory hypothesis a volumetric feature (Fig. [sent-252, score-0.242]

69 Thus, for each valid trajectory we compute a volumetric histogram over height bins as follows: |Vi| = Σ_{rj : support(rj) > θ} d̃(rj).

70 Figure caption fragment: (left) using manual point cloud segmentation annotations, (middle) using tracked point cloud data.

71 Here d̃(rj) is the median distance of the ray rj, and support(rj) > θ means that we consider only rays that have accumulated at least θ distances already, where θ is interlinked to the lifetime of the GCT.
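
Under the same simplifications as the GCTSketch above, the per-height-level value described in entries 69 and 71 (a support-filtered sum of median ray distances) could be computed as follows; here θ is treated as a plain count threshold rather than being tied to the GCT's lifetime.

```python
import numpy as np

def volume_histogram(dists, theta=1):
    # dists[layer][ray] is a list of radial distances, as accumulated by
    # the GCTSketch above; |V_i| sums the median distance of every ray
    # in layer i whose support exceeds theta.
    hist = np.zeros(len(dists))
    for i, layer in enumerate(dists):
        for ray in layer:
            if len(ray) > theta:               # support(r_j) > theta
                hist[i] += np.median(ray)      # median distance of r_j
    return hist

# tiny standalone example: 2 layers x 2 rays
dists = [[[0.30, 0.31, 0.29], []],
         [[0.50], [0.40, 0.42]]]
print(volume_histogram(dists, theta=1))   # -> [0.30, 0.41]
```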

72 In addition, we exploit GCTs in the model selection procedure, where we model the interaction between trajectories by considering the intersection between the footprints of individual tracks. [sent-275, score-0.335]

73 Modeling object footprints by a fixed rectangular (or circular) shape leads to high interaction costs for close-by objects due to high overlap, as shown in Fig. [sent-279, score-0.373]

74 The projected ray points are weighted by the number of distances of the corresponding ray and thus represent the significance of a ray and the ground projection bin. [sent-284, score-0.279]

75 In Fig. 8(4), the bin intersection between the objects is significantly smaller than in the fixed-footprint case, and using the weighting results in a low intersection value.
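
A possible rasterized version of this weighted footprint overlap; the grid resolution, extent, and histogram-intersection measure are illustrative choices rather than the paper's exact procedure.

```python
import numpy as np

def footprint_intersection(rays_a, rays_b, cell=0.1, extent=4.0):
    # Rasterize each track's weighted ray endpoints onto a shared ground
    # grid; rays_*: list of (x, y, weight), where weight is the ray's
    # accumulated distance count (cf. entry 74).
    n = int(extent / cell)

    def grid(rays):
        g = np.zeros((n, n))
        for x, y, w in rays:
            i = int((x + extent / 2) / cell)
            j = int((y + extent / 2) / cell)
            if 0 <= i < n and 0 <= j < n:
                g[i, j] += w
        return g / max(g.sum(), 1e-9)          # normalize per object

    ga, gb = grid(rays_a), grid(rays_b)
    return float(np.minimum(ga, gb).sum())     # histogram intersection

person = [(0.00, 0.00, 5), (0.10, 0.10, 4)]
stroller = [(0.70, 0.00, 6), (0.80, 0.10, 3)]
print(footprint_intersection(person, stroller))   # ~0: no shared cells
```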

76 This extension makes tracking more robust in our scenarios, since our objects of interest are usually situated close to a person. [sent-288, score-0.285]

77 As our tracking core, we employ an extended version of the robust multi-hypothesis tracking framework presented in [12]. [sent-290, score-0.4]

78 From the 3D points of the segmented regions, we generate the footpoint positions of the objects by simply tracking the center of mass of the point cloud and projecting it onto the ground plane. [sent-292, score-0.583]

79 Furthermore, the 3D points are back-projected to the image in order to obtain a color histogram for each object, which is required for the trajectory hypothesis generation process in order to associate the detections. [sent-293, score-0.187]

80 The footpoint positions of the objects are linked to trajectories using a Kalman Filter with a constant-velocity motion model. [sent-294, score-0.263]
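
A standard constant-velocity Kalman filter on the ground plane, matching the description in entry 80; the time step and noise magnitudes are placeholders.

```python
import numpy as np

dt = 0.1                                  # frame interval (assumed)
F = np.eye(4)                             # state: [x, y, vx, vy]
F[0, 2] = dt
F[1, 3] = dt
H = np.eye(2, 4)                          # we observe the footpoint (x, y)
Q = np.eye(4) * 1e-2                      # process noise (placeholder)
R = np.eye(2) * 5e-2                      # measurement noise (placeholder)

def kf_step(x, P, z=None):
    # One predict(+update) cycle; pass z=None when the object is lost,
    # so the track coasts on the constant-velocity model alone.
    x, P = F @ x, F @ P @ F.T + Q                       # predict
    if z is not None:                                   # update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0, 1.0, 0.0]), np.eye(4)
x, P = kf_step(x, P, z=[0.12, 0.01])   # frame with a footpoint detection
x, P = kf_step(x, P)                   # occluded frame: prediction only
print(x.round(3))
```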

81 In each frame, we run two trajectory generation processes: one looking backwards in time in order to generate new trajectories and one looking forward and extending the existing tracks. [sent-295, score-0.236]

82 Figure 11: Pedestrian tracking performance on (left) BAHNHOF and (right) SUNNY DAY (x-axis: #false positives/image).

83 For each tracked object, we annotated an action and a reference object it is interacting with. [sent-311, score-0.361]

84 For the test dataset, we acquired the images in crowded and challenging shopping streets from a moving platform with different object appearances and dynamics. [sent-313, score-0.314]

85 In total, we have annotated 153 sequences (7885 frames) as training and 172 sequences (7130 frames) as test set in order to assess the performance of our model.

86 The person-object interaction classification strongly depends on the output of the tracker, since it requires positions, velocities and GCTs of the individual objects. [sent-319, score-0.266]

87 For that reason, we first verify that our tracking approach is sufficiently robust for tracking in complex mobile scenarios.

88 The sequences were acquired from a similar capturing platform in busy pedestrian scenes. [sent-322, score-0.221]

89 Since our approach tracks all objects in the scene, but in this dataset only the pedestrians are annotated, we classify each segmented ROI using the pedestrian classifier from [7] before passing it to the tracker. [sent-324, score-0.371]

90 Secondly, if both of these objects are persons then we just detected a group, else the baseline is applied:

91 a detector based on a classifier that only takes into account the height of a tracked object as described in Sec. [sent-346, score-0.309]

92 Next, we perform the same experiment on our training data, but this time with actual results from our tracking pipeline instead of tracking results based on annotated object segmentations. [sent-372, score-0.509]

93 Owing to the competitive performance of our tracking system, we lose little relative to the results of the preceding experiments.

94 624 with the full combination of our tracker, object classifier based on GCTs, interaction model and frame inference (c. [sent-381, score-0.329]

95 Figure 13: Error bars of position prediction (x-axis: predicted time steps).

96 When we lose track of an object, the Kalman filter will predict future positions based on its underlying motion model. [sent-411, score-0.189]

97 Our inference-based prediction observes the positions of all other entities in the scene and uses the interaction distribution it learned so far to infer the most likely position of the lost object. [sent-412, score-0.593]
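
The idea in entries 96 and 97 can be sketched as follows: while an object is visible, keep a running mean of its offset to each interaction partner, and once it is lost, average the positions implied by the still-visible partners. The running-mean update is an assumed simplification of the learned interaction distribution.

```python
import numpy as np

class RelativePredictor:
    def __init__(self):
        self.offsets = {}   # partner_id -> (mean_offset, sample_count)

    def observe(self, obj_pos, partners):
        # While the object is visible, accumulate its mean offset to
        # each interaction partner (a crude stand-in for the learned
        # interaction distribution).
        for pid, ppos in partners.items():
            off = np.asarray(obj_pos, float) - np.asarray(ppos, float)
            mean, n = self.offsets.get(pid, (np.zeros(2), 0))
            self.offsets[pid] = ((mean * n + off) / (n + 1), n + 1)

    def predict(self, partners):
        # Once the object is lost, average the positions implied by all
        # still-visible partners it has interacted with.
        preds = [np.asarray(p, float) + self.offsets[pid][0]
                 for pid, p in partners.items() if pid in self.offsets]
        return np.mean(preds, axis=0) if preds else None

rp = RelativePredictor()
rp.observe([1.0, 0.0], {"person7": [0.2, 0.0]})   # stroller 0.8 m ahead
print(rp.predict({"person7": [0.5, 0.1]}))        # -> [1.3, 0.1]
```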

98 Conclusion: We have presented a framework that can track both known and unknown objects and simultaneously infer which objects belong together.

99 Furthermore, the proposed model can be used to infer object types and the interaction patterns occurring between associated objects. [sent-424, score-0.312]

100 For the future, we plan to extend the model to more object and interaction types. [sent-428, score-0.239]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gct', 0.386), ('gcts', 0.276), ('tracking', 0.2), ('stroller', 0.193), ('interaction', 0.182), ('pull', 0.146), ('persons', 0.141), ('interactions', 0.135), ('trajectory', 0.134), ('height', 0.116), ('action', 0.113), ('kalman', 0.11), ('pedestrian', 0.108), ('entities', 0.101), ('roi', 0.099), ('tracked', 0.095), ('ray', 0.093), ('person', 0.089), ('cp', 0.086), ('objects', 0.085), ('velocities', 0.084), ('luggage', 0.083), ('mitzel', 0.083), ('predcited', 0.083), ('strollers', 0.083), ('tmie', 0.083), ('vrel', 0.083), ('xrel', 0.083), ('track', 0.082), ('rois', 0.081), ('segmented', 0.08), ('rj', 0.079), ('mobile', 0.076), ('child', 0.075), ('bahnhof', 0.073), ('sunny', 0.073), ('setps', 0.073), ('platform', 0.073), ('infer', 0.073), ('scene', 0.07), ('tracker', 0.068), ('positions', 0.064), ('stereo', 0.063), ('frames', 0.063), ('graphical', 0.062), ('trajectories', 0.059), ('rays', 0.059), ('shopping', 0.059), ('object', 0.057), ('pedestrians', 0.057), ('cloud', 0.056), ('volumetric', 0.055), ('abandoned', 0.055), ('asses', 0.055), ('cpe', 0.055), ('fasle', 0.055), ('footpoint', 0.055), ('personobject', 0.055), ('tobias', 0.055), ('arrangement', 0.055), ('lost', 0.055), ('bins', 0.054), ('bin', 0.054), ('hypothesis', 0.053), ('annotated', 0.052), ('jc', 0.052), ('group', 0.05), ('mother', 0.049), ('wheelchairs', 0.049), ('footprints', 0.049), ('inference', 0.049), ('prediction', 0.048), ('co', 0.048), ('entity', 0.048), ('actor', 0.048), ('pairwise', 0.046), ('yo', 0.045), ('seq', 0.045), ('intersection', 0.045), ('interacting', 0.044), ('walking', 0.043), ('lose', 0.043), ('push', 0.043), ('onto', 0.043), ('moving', 0.043), ('cutout', 0.043), ('backwards', 0.043), ('accumulating', 0.043), ('pellegrini', 0.043), ('yp', 0.042), ('appearances', 0.041), ('classifier', 0.041), ('interact', 0.041), ('coe', 0.041), ('crowded', 0.041), ('sequences', 0.04), ('belong', 0.04), ('ess', 0.04), ('hypotheses', 0.04), ('people', 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 440 cvpr-2013-Tracking People and Their Objects

Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe

Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.

2 0.17283918 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

Author: Shoou-I Yu, Yi Yang, Alexander Hauptmann

Abstract: A device just like Harry Potter’s Marauder’s Map, which pinpoints the location ofeachperson-of-interest at all times, provides invaluable information for analysis of surveillance videos. To make this device real, a system would be required to perform robust person localization and tracking in real world surveillance scenarios, especially for complex indoor environments with many walls causing occlusion and long corridors with sparse surveillance camera coverage. We propose a tracking-by-detection approach with nonnegative discretization to tackle this problem. Given a set of person detection outputs, our framework takes advantage of all important cues such as color, person detection, face recognition and non-background information to perform tracking. Local learning approaches are used to uncover the manifold structure in the appearance space with spatio-temporal constraints. Nonnegative discretization is used to enforce the mutual exclusion constraint, which guarantees a person detection output to only belong to exactly one individual. Experiments show that our algorithm performs robust lo- calization and tracking of persons-of-interest not only in outdoor scenes, but also in a complex indoor real-world nursing home environment.

3 0.17272736 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

Author: unkown-author

Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.

4 0.16682382 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking

Author: Anton Milan, Konrad Schindler, Stefan Roth

Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional randomfield (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.

5 0.16198784 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

Author: Nikolaos Kyriazis, Antonis Argyros

Abstract: In several hand-object(s) interaction scenarios, the change in the objects ’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.

6 0.16026846 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking

7 0.15836537 172 cvpr-2013-Finding Group Interactions in Social Clutter

8 0.15552871 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

9 0.1483667 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms

10 0.14582103 414 cvpr-2013-Structure Preserving Object Tracking

11 0.14208491 314 cvpr-2013-Online Object Tracking: A Benchmark

12 0.13457182 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree

13 0.12830038 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

14 0.126618 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

15 0.12473469 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

16 0.12460931 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models

17 0.12423992 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning

18 0.1227531 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models

19 0.12257988 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

20 0.11748455 287 cvpr-2013-Modeling Actions through State Changes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.276), (1, 0.024), (2, 0.014), (3, -0.162), (4, -0.062), (5, -0.059), (6, 0.11), (7, -0.051), (8, 0.073), (9, 0.15), (10, -0.041), (11, -0.062), (12, 0.025), (13, 0.022), (14, -0.04), (15, 0.031), (16, -0.033), (17, 0.113), (18, 0.013), (19, 0.017), (20, 0.062), (21, 0.021), (22, -0.025), (23, 0.131), (24, 0.003), (25, -0.022), (26, 0.036), (27, -0.036), (28, -0.027), (29, -0.074), (30, -0.041), (31, -0.016), (32, -0.004), (33, 0.02), (34, 0.033), (35, -0.028), (36, -0.033), (37, 0.042), (38, 0.029), (39, -0.028), (40, -0.024), (41, 0.034), (42, 0.048), (43, 0.026), (44, 0.124), (45, 0.033), (46, -0.049), (47, -0.064), (48, 0.009), (49, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93254638 440 cvpr-2013-Tracking People and Their Objects

Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe

Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of personobject interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.

2 0.80962116 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

Author: Nikolaos Kyriazis, Antonis Argyros

Abstract: In several hand-object(s) interaction scenarios, the change in the objects ’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.

3 0.80427533 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof

Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects ’ center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-theart methods, while achieving real-time performance. – –

4 0.77486551 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

Author: Shoou-I Yu, Yi Yang, Alexander Hauptmann

Abstract: A device just like Harry Potter’s Marauder’s Map, which pinpoints the location ofeachperson-of-interest at all times, provides invaluable information for analysis of surveillance videos. To make this device real, a system would be required to perform robust person localization and tracking in real world surveillance scenarios, especially for complex indoor environments with many walls causing occlusion and long corridors with sparse surveillance camera coverage. We propose a tracking-by-detection approach with nonnegative discretization to tackle this problem. Given a set of person detection outputs, our framework takes advantage of all important cues such as color, person detection, face recognition and non-background information to perform tracking. Local learning approaches are used to uncover the manifold structure in the appearance space with spatio-temporal constraints. Nonnegative discretization is used to enforce the mutual exclusion constraint, which guarantees a person detection output to only belong to exactly one individual. Experiments show that our algorithm performs robust lo- calization and tracking of persons-of-interest not only in outdoor scenes, but also in a complex indoor real-world nursing home environment.

5 0.76999795 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking

Author: Martin Hofmann, Daniel Wolf, Gerhard Rigoll

Abstract: We generalize the network flow formulation for multiobject tracking to multi-camera setups. In the past, reconstruction of multi-camera data was done as a separate extension. In this work, we present a combined maximum a posteriori (MAP) formulation, which jointly models multicamera reconstruction as well as global temporal data association. A flow graph is constructed, which tracks objects in 3D world space. The multi-camera reconstruction can be efficiently incorporated as additional constraints on the flow graph without making the graph unnecessarily large. The final graph is efficiently solved using binary linear programming. On the PETS 2009 dataset we achieve results that significantly exceed the current state of the art.

6 0.76221383 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking

7 0.72783285 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery

8 0.67644203 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models

9 0.65648586 301 cvpr-2013-Multi-target Tracking by Rank-1 Tensor Approximation

10 0.62422585 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

11 0.60956109 314 cvpr-2013-Online Object Tracking: A Benchmark

12 0.59202003 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)

13 0.59131438 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms

14 0.59129041 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

15 0.58753991 103 cvpr-2013-Decoding Children's Social Behavior

16 0.58456975 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking

17 0.58384454 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models

18 0.57646024 414 cvpr-2013-Structure Preserving Object Tracking

19 0.56831652 172 cvpr-2013-Finding Group Interactions in Social Clutter

20 0.55508214 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.121), (16, 0.017), (26, 0.399), (33, 0.221), (67, 0.063), (69, 0.055), (87, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86653352 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration

Author: Adrien Bartoli, Toby Collins

Abstract: It has been shown that a surface deforming isometrically can be reconstructed from a single image and a template 3D shape. Methods from the literature solve this problem efficiently. However, they all assume that the camera model is calibrated, which drastically limits their applicability. We propose (i) a general variational framework that applies to (calibrated and uncalibrated) general camera models and (ii) self-calibrating 3D reconstruction algorithms for the weak-perspective and full-perspective camera models. In the former case, our algorithm returns the normal field and camera ’s scale factor. In the latter case, our algorithm returns the normal field, depth and camera ’s focal length. Our algorithms are the first to achieve deformable 3D reconstruction including camera self-calibration. They apply to much more general setups than existing methods. Experimental results on simulated and real data show that our algorithms give results with the same level of accuracy as existing methods (which use the true focal length) on perspective images, and correctly find the normal field on affine images for which the existing methods fail.

2 0.85982907 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization

Author: Liang Li, Wei Feng, Liang Wan, Jiawan Zhang

Abstract: This paper addresses a challenging problem of regularizing arbitrary superpixels into an optimal grid structure, which may significantly extend current low-level vision algorithms by allowing them to use superpixels (SPs) conveniently as using pixels. For this purpose, we aim at constructing maximum cohesive SP-grid, which is composed of real nodes, i.e. SPs, and dummy nodes that are meaningless in the image with only position-taking function in the grid. For a given formation of image SPs and proper number of dummy nodes, we first dynamically align them into a grid based on the centroid localities of SPs. We then define the SP-grid coherence as the sum of edge weights, with SP locality and appearance encoded, along all direct paths connecting any two nearest neighboring real nodes in the grid. We finally maximize the SP-grid coherence via cascade dynamic programming. Our approach can take the regional objectness as an optional constraint to produce more semantically reliable SP-grids. Experiments on object localization show that our approach outperforms state-of-the-art methods in terms of both detection accuracy and speed. We also find that with the same searching strategy and features, object localization at SP-level is about 100-500 times faster than pixel-level, with usually better detection accuracy.

3 0.8379364 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation

Author: Jordi Pont-Tuset, Ferran Marques

Abstract: This paper tackles the supervised evaluation of image segmentation algorithms. First, it surveys and structures the measures used to compare the segmentation results with a ground truth database; and proposes a new measure: the precision-recall for objects and parts. To compare the goodness of these measures, it defines three quantitative meta-measures involving six state of the art segmentation methods. The meta-measures consist in assuming some plausible hypotheses about the results and assessing how well each measure reflects these hypotheses. As a conclusion, this paper proposes the precision-recall curves for boundaries and for objects-and-parts as the tool of choice for the supervised evaluation of image segmentation. We make the datasets and code of all the measures publicly available.

same-paper 4 0.83257097 440 cvpr-2013-Tracking People and Their Objects

Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe

Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.

5 0.80577463 152 cvpr-2013-Exemplar-Based Face Parsing

Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang

Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database, Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.

6 0.78561592 311 cvpr-2013-Occlusion Patterns for Object Class Detection

7 0.72708708 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill

8 0.72668177 88 cvpr-2013-Compressible Motion Fields

9 0.68929607 21 cvpr-2013-A New Perspective on Uncalibrated Photometric Stereo

10 0.67291802 465 cvpr-2013-What Object Motion Reveals about Shape with Unknown BRDF and Lighting

11 0.67155963 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

12 0.66625738 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

13 0.6575312 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure

14 0.65547258 96 cvpr-2013-Correlation Filters for Object Alignment

15 0.65364164 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

16 0.65246803 208 cvpr-2013-Hyperbolic Harmonic Mapping for Constrained Brain Surface Registration

17 0.65161961 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds

18 0.65118086 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm

19 0.65093511 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

20 0.65056622 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition