cvpr cvpr2013 cvpr2013-331 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
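The central idea of the abstract, tracking the whole scene by searching only over the hand motion, can be illustrated with a toy one-dimensional sketch. Everything in it is an assumption made for illustration: the 1D world, the names `simulate`, `discrepancy` and `track_frame`, the rigid-push contact rule, and the exhaustive search over a grid of candidates (the paper itself relies on a full rigid-body simulator and Particle Swarm Optimization). The sketch only shows how an occluded object's position can follow from a hypothesized hand displacement, so that the object is tracked without being searched for directly.

```python
def simulate(hand_x, obj_x, hand_dx):
    """Toy forward model (hypothetical stand-in for a physics simulator).

    The hand tip moves by hand_dx along a line; the object's near face sits
    at obj_x and is pushed along whenever the hand tip would pass it.
    """
    new_hand = hand_x + hand_dx
    new_obj = max(obj_x, new_hand)  # the hand cannot pass through the object
    return new_hand, new_obj


def discrepancy(pred, obs):
    """Sum of squared differences over observed (non-occluded) entities only."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs) if o is not None)


def track_frame(hand_x, obj_x, obs, candidates):
    """Pick the hand displacement whose simulated outcome best matches obs."""
    best = min(candidates,
               key=lambda dx: discrepancy(simulate(hand_x, obj_x, dx), obs))
    return best, simulate(hand_x, obj_x, best)


# Hypothesized hand displacements: -2.0, -1.9, ..., 2.0
candidates = [i / 10 for i in range(-20, 21)]

# Only the hand is observed; the object is occluded (None).
obs = (1.5, None)
dx, (hand, obj) = track_frame(0.0, 1.0, obs, candidates)
print(dx, hand, obj)  # 1.5 1.5 1.5
```

Although the observation says nothing about the object, the winning hypothesis carries an object position with it, because the forward model ties the object's state to the hand's motion.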
Reference: text
sentIndex sentText sentNum sentScore
1 We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. [sent-5, score-0.326]
2 Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. [sent-6, score-1.125]
3 We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. [sent-7, score-0.747]
4 This work focuses on a scenario, where the hand of a human actor interacts with a number of objects placed on a table. [sent-12, score-0.507]
5 A key observation is that in such a scenario, the human hand is the single actor and scene state changes can be attributed to the actions of the human hand and their induced consequences. [sent-16, score-0.908]
6 Thus, we model the dynamics of a hand interacting with a physical world. [sent-18, score-0.771]
7 Figure 1: The exploitation of the single actor hypothesis through physics modelling allows physically plausible, heuristic-free 3D tracking of hand-object interactions. [sent-19, score-0.796]
8 (b), (c) By searching for hand motion only, we are able to track the 3D state of the entire scene. [sent-21, score-0.523]
9 The state can be overt (partially visible hand and objects (b)) or even covert (totally occluded objects like the ball-inside-the-cup (c)). [sent-22, score-0.438]
10 We make a distinction between active and passive entities to come up with effective, physically plausible interpretations of scenes exhibiting complex hand-object(s) interaction. [sent-23, score-0.327]
11 We monitor the 3D state of the scene by means of tracking (Fig. [sent-24, score-0.391]
12 The objective function is a quantification of the discrepancy between a given hypothesis over the scene state and observations, and is parameterized over a hand motion between two successive instants in time. [sent-26, score-0.792]
13 A hypothesized hand motion is simulated in a physics-based simulation environment that reflects the latest state of the scene, as it has been tracked up to that point in time. [sent-27, score-0.709]
14 The simulated hand-object(s) interaction yields an expectation over the appearance of such a hypothesis, that regards both the hand and the object(s). [sent-28, score-0.492]
15 A comparison of this expectation to actual observations quantifies the compatibility of the hand motion hypothesis to the data. [sent-29, score-0.648]
16 A highly preferable hypothesis is one that explains (a) where the hand is in the new tracking frame and (b) the consequences of its interaction with the scene, as those are reflected in the observations. [sent-30, score-0.677]
17 The expectation and comparison mechanisms are implemented as a forward model that accounts for the dynamics and the appearance of a scene. [sent-31, score-0.403]
18 This model is turned into an inference mechanism over the physical state of the scene by means of black-box optimization. [sent-32, score-0.368]
19 Relevant work Our work aims at deriving physically plausible interpretations of the interaction of a human with the environment. [sent-37, score-0.371]
20 [9] used the notion of physical stability to hypothesize physically plausible 3D scene interpretations. [sent-48, score-0.512]
21 Several other approaches consider dynamics explicitly, but restrict understanding to either the actor or a single object, only. [sent-49, score-0.492]
22 [23] modeled the dynamics of the golf swing motion to track golf swings in 3D from a single camera. [sent-52, score-0.539]
23 [24] fused motion planning and contact dynamics to track humans from multiple cameras and a ground assumption. [sent-55, score-0.481]
24 Duff and Wyatt [8] used physical simulation and search heuristics to track a fast moving ball, despite occlusions and for the 2D case. [sent-61, score-0.415]
25 In previous work [13], we performed 3D motion estimation for a bouncing ball, from a single camera and despite severe occlusions by exploiting dynamics modelling. [sent-62, score-0.421]
26 Ye and Liu [26] synthesized physically plausible hand movements, from poor or absent hand observations, that explained the manipulation of objects with known trajectories from a hand whose rough location was also known. [sent-63, score-1.041]
27 There are also approaches that go beyond abstractions of dynamics while considering ensembles of entities rather than entities in isolation. [sent-64, score-0.524]
28 Salzmann and Urtasun [22] approached the problem of 3D tracking by attributing motion of parts to net forces that act upon them at each tracking frame. [sent-67, score-0.56]
29 In previous work we tracked the constellation of a hand and an object from multiple cameras [17], and the full articulation of two interacting hands from an RGBD sensor [18], all in 3D, by employing synthetic 3D models. [sent-72, score-0.426]
30 While a hand transports an object, we do not directly perceive the hand touching it. [sent-84, score-0.5]
31 Our recently proposed method in [17] demonstrates multicamera-based joint hand-object tracking that is performed based on two criteria: (a) the appearance of the hand-object ensemble matches the observations and (b) the hand does not share the same space as the object. [sent-88, score-0.537]
32 The laws of physics guarantee that a ball that is trapped between a cup and the table, has to travel inside the cup being moved by a hand. [sent-103, score-0.523]
33 Instead, within our framework and as long as the single actor hypothesis holds, tracking scenes of different cardinalities does not alter the dimensionality of the problem. [sent-107, score-0.562]
34 Methodology Dynamics, as a rich and powerful modelling tool, constitutes an excellent framework where the single actor hypothesis is naturally expressed. [sent-109, score-0.462]
35 Additionally, it is powerful because the predictive power of dynamics is the most elaborate reflection of how entities interact in a truly physical world. [sent-111, score-0.554]
36 A hand motion is sought that best explains the evolution of a scene between two consecutive time instants t and t + 1. [sent-113, score-0.536]
37 Hand motion is parameterized as the transition from a reference hand pose ht (e. [sent-114, score-0.63]
38 As new observations arrive, a new tracking frame is defined, for which the tracking solution is established in a hypothesize-and-test fashion, driven by Particle Swarm Optimization (PSO) [11]. [sent-117, score-0.486]
39 Hypotheses of hand motion are tested in a physics-based simulation environment and the outlook of the induced scene state is rendered into maps that are comparable to the observations. [sent-118, score-0.782]
40 The sought solution is a physically plausible scene interpretation that is most compatible with the observations. [sent-120, score-0.336]
41 Forward model We use a forward model that regards the physical state of a scene and its appearance, as observed by a camera. [sent-123, score-0.492]
42 Given a hand motion, forward modelling produces two different outputs. [sent-127, score-0.422]
43 First, through dynamics simulation, it updates the poses and velocities of objects, as these have been altered due to the hypothesized hand motion. [sent-128, score-0.569]
44 Second, the resulting scene state is rendered so that a direct comparison between hypotheses and actual observations is possible. [sent-129, score-0.429]
45 All entities are represented in a dynamics simulator (Bullet [6]). [sent-133, score-0.455]
46 Entities are essentially represented as 3D shapes with inertia tensors, masses, friction and restitution coefficients. [sent-134, score-0.332]
47 Still, the selected simulator can generate realistic dynamic behaviour, which is the key in extracting physically plausible scene interpretations. [sent-139, score-0.413]
48 The collision spheres (green) inside the hand model give it physical substance. [sent-152, score-0.772]
49 The hand is able to change the state of the scene by means of forces that are the result of its accelerated surface contacting the surface of the objects. [sent-156, score-0.495]
50 We approximate the effective surface of the hand by a compound of spheres that are strategically inscribed at various locations inside the 3D volume of the hand’s structure (Fig. [sent-157, score-0.394]
51 If $s_k$ is the k-th sphere of the collision model and its 3D position is given through the application of the kinematics function $K_k(h)$ for a hand pose $h$, then for a hand motion from $h_t$ to $h_{t+1}$, $s_k$ is given a velocity $v_k = (K_k(h_{t+1}) - K_k(h_t)) / \Delta t$. [sent-159, score-1.273]
52 The spheres of the hand’s collision model are not allowed to rotate, so that all tangential collision energy is transferred to the colliding object and is not spent on the rotation of the spheres, too. [sent-161, score-0.548]
53 The collisions among the spheres are ignored so as to better approximate the flexibility of the hand’s surface, by accounting for the whole hand collision model as a union rather than as a collection of independent entities. [sent-163, score-0.596]
54 By modulating the mass and friction coefficient of the spheres the hand becomes less/more capable of manipulating heavy or slippery objects. [sent-164, score-0.631]
55 No gravitational force is assigned to the spheres as it is assumed to always be eliminated by the torques of the hand joints. [sent-165, score-0.482]
56 For a hand motion, a given state of the rest of the scene and a time step, simulation of dynamics is responsible for evolving the scene into a new, physically plausible state. [sent-166, score-1.199]
57 The hand spheres bear kinetic energy and transfer that energy, through collision, to other objects. [sent-167, score-0.459]
58 Dynamics simulation is responsible for applying collision checking, force direction estimation and preservation of energy and momentum in order to transform the old scene state to the new one. [sent-168, score-0.539]
59 (2), si is the collision shape, mi is the mass, Ii is the inertia tensor, Fi is the friction coefficient, Ri is the restitution coefficient, p? [sent-175, score-0.534]
60 2 Appearance model Every hand motion hypothesis yields a new expectation over the physical state of the scene. [sent-182, score-0.842]
61 A hypothesis scores well when its simulated expectations over the scene evolution match the new observations well. [sent-184, score-0.414]
62 For a hand motion hypothesis h (c), a synthetic depth map Idr is rendered (e). [sent-201, score-0.636]
63 Inference In order to infer total state change from new observations we formulate an optimization problem, which we solve for the hand motion alone. [sent-207, score-0.553]
64 All scene changes are attributed to hand intervention, which, in optimization terms, amounts to 27 parameters. [sent-208, score-0.336]
65 Thus, at any tracking iteration at time t, given (a) the hand position ht and (b) the state of the scene St, we seek for a new hand pose ht+1 defined as ht+1 ? [sent-209, score-1.122]
66 (5) ht+1 must be such that the motion of the hand from ht to ht+1 best explains the observed evolution of the scene. [sent-211, score-0.584]
67 Function E defines a penalty to be minimized over hand motion hypotheses. [sent-214, score-0.359]
68 However, we need to penalize for hand motions that contain inter-penetrations of distinct hand subparts (e. [sent-224, score-0.5]
69 CM (h) where function CM provides the collision check pairs for the sub-parts, sk is the k-th collision element and function PD computes pair-wise penetration depth, which is computed by the simulator. [sent-229, score-0.49]
70 The data term D combines equations (1), (3) and (4) to quantify the difference between the observation of a scene and the expected outcome of a hand motion hypothesis: D (h? [sent-230, score-0.445]
71 Every invocation involves 3D rendering and dynamics simulation, both being computationally demanding tasks. [sent-262, score-0.341]
72 GPU architectures are used in order to accelerate rendering and multicore CPU architectures are exploited for the acceleration of dynamics simulation for each PSO generation. [sent-263, score-0.486]
73 A hypothesis of a new hand pose ht amounts to a relative motion with respect to ht−1. [sent-274, score-0.737]
74 The hypothesis that optimally explains the evolution of a scene in terms of appearance is dubbed as the tracking solution for the current frame. [sent-283, score-0.469]
75 The scene state that accompanies the winning hypothesis replaces St−1 for the next tracking frame. [sent-284, score-0.577]
76 We used GPU threads for 3D rendering and objective evaluation and CPU threads for dynamics simulation. [sent-287, score-0.341]
77 Acquisition was performed at a 30fps rate and therefore the dynamics simulation time interval was set to Δt = 1/30s. [sent-291, score-0.421]
78 The mass of the hand model collision spheres was set to 1 and the friction factor to 10. [sent-298, score-0.833]
79 Quantitative evaluation For the problem of scene tracking and when a hand is involved, it is very difficult to acquire ground truth information. [sent-303, score-0.535]
80 A human hand grasped a cup firmly, lifted it and moved it around in various angles. [sent-433, score-0.448]
81 In a sequence of 500 frames, the hand grasped the cup firmly in the last 370 frames (see the 1st column of Fig. [sent-434, score-0.492]
82 By construction, the pose of the hand was correlated with the pose of the cup. [sent-436, score-0.336]
83 We tracked this scene and thus gained access to the inferred poses of the hand and the cup. [sent-437, score-0.336]
84 In the synthetic experiments even low budgets suffice for adequately accurate tracking of both the hand and the object. [sent-459, score-0.514]
85 The (correct) motion of all objects was inferred as a consequence of the hand’s motion that best explained the observations in total. [sent-471, score-0.347]
86 The first experiment considered a hand and a cup (2nd column of Fig. [sent-476, score-0.409]
87 In this footage the hand picked up the cup and put it back on the table in an upside-down orientation. [sent-478, score-0.409]
88 In the second experiment a hand lifted and manipulated a plastic bowling ball that was barely graspable due to its size (3rd column of Fig. [sent-481, score-0.351]
89 Given enough friction, our hand modelling was able to explain the lifting and manipulation of the object. [sent-483, score-0.349]
90 Even when almost the entire hand was occluded by the ball we came up with plausible hypotheses. [sent-484, score-0.471]
91 One cup trapped the ball and was moved around, moving the ball inside it and pushing other cups when in its way. [sent-489, score-0.429]
92 The hand pushed an empty cup, which in turn pushed the cup containing the ball. [sent-491, score-0.409]
93 As the hand shuffled the cups intense occlusions occurred that did not prevent our framework from maintaining plausible hypotheses about the 3D position and orientation of the fully/partially occluded hand, cups and of the truly invisible ball. [sent-492, score-0.642]
94 This scenario challenged both the dynamics modelling and the optimization module, because stacking of generic geometry is indeed a difficult problem for dynamics simulators to handle stably, which in turn yields an erratic behaviour in the objective function. [sent-496, score-0.651]
95 Summary In this work we enabled the efficient 3D tracking of complex scenes by exploiting the single actor hypothesis. [sent-499, score-0.415]
96 To achieve this, we proposed the use of a dynamics model (physics simulator) and an appearance model (3D rendering) as a powerful, combined forward model, that is turned into an inference mechanism by means of black-box optimization. [sent-500, score-0.349]
97 A natural extension of this work would be to consider a wider observation horizon in order to tackle cases where the hand is not constantly observed to manipulate objects but only initiates motion by passing kinetic energy. [sent-504, score-0.465]
98 Notably, the single actor hypothesis does not require the actor to be single, but only that every source of state change is directly and efficiently modelled: it can also be extended to two active hands, an active body or even active objects, etc. [sent-508, score-0.685]
99 Efficient model-based 3d tracking of hand articulations using kinect. [sent-613, score-0.449]
100 Full dof tracking of a hand interacting with an object by modeling occlusions and physical constraints. [sent-620, score-0.73]
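Sentence 51 above assigns each collision sphere a velocity obtained as the finite difference of its kinematic position over one time step, v_k = (K_k(h_{t+1}) − K_k(h_t)) / Δt. A minimal sketch of that computation follows. The kinematics function `toy_kinematics`, which places two spheres on a single planar finger rotating about the origin, is a hypothetical stand-in for the articulated hand model; only the finite-difference step and Δt = 1/30 s are taken from the text.

```python
import math

DT = 1.0 / 30.0  # simulation step matching the 30 fps acquisition rate


def toy_kinematics(h, offsets=(1.0, 2.0)):
    """Hypothetical K(h): sphere centres on a planar finger rotated by h radians."""
    return [(r * math.cos(h), r * math.sin(h), 0.0) for r in offsets]


def sphere_velocities(h_t, h_next, kinematics=toy_kinematics, dt=DT):
    """Return v_k = (K_k(h_next) - K_k(h_t)) / dt for every collision sphere k."""
    before = kinematics(h_t)
    after = kinematics(h_next)
    return [tuple((a - b) / dt for a, b in zip(pa, pb))
            for pa, pb in zip(after, before)]


# A quarter turn within a single frame: unrealistically fast, but easy to check.
velocities = sphere_velocities(0.0, math.pi / 2)
for k, v in enumerate(velocities):
    print(k, v)
```

In this toy case the sphere at radius 1 moves from (1, 0, 0) to roughly (0, 1, 0), so its velocity is close to (−30, 30, 0) units per second, and the sphere at radius 2 moves twice as fast. In a real simulator these velocities would be set on the sphere bodies before stepping the dynamics.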
wordName wordTfidf (topN-words)
[('dynamics', 0.276), ('hand', 0.25), ('actor', 0.216), ('collision', 0.202), ('tracking', 0.199), ('ht', 0.188), ('physical', 0.176), ('friction', 0.161), ('cup', 0.159), ('pso', 0.149), ('hypothesis', 0.147), ('simulation', 0.145), ('spheres', 0.144), ('physically', 0.13), ('plausible', 0.12), ('motion', 0.109), ('state', 0.106), ('physics', 0.104), ('entities', 0.102), ('ball', 0.101), ('modelling', 0.099), ('kyriazis', 0.097), ('observations', 0.088), ('bullet', 0.087), ('permanence', 0.087), ('restitution', 0.087), ('sk', 0.086), ('scene', 0.086), ('rendered', 0.086), ('inertia', 0.084), ('interaction', 0.081), ('observability', 0.081), ('idr', 0.077), ('ido', 0.077), ('oikonomidis', 0.077), ('simulator', 0.077), ('mass', 0.076), ('forward', 0.073), ('rgbd', 0.071), ('interacting', 0.069), ('cups', 0.068), ('budgets', 0.065), ('ilo', 0.065), ('kinetic', 0.065), ('rendering', 0.065), ('hypotheses', 0.063), ('articulation', 0.062), ('velocity', 0.059), ('track', 0.058), ('swarm', 0.058), ('budget', 0.058), ('simulated', 0.056), ('expectation', 0.054), ('instants', 0.054), ('forces', 0.053), ('regards', 0.051), ('brubaker', 0.051), ('golf', 0.048), ('cviu', 0.048), ('popovi', 0.046), ('hands', 0.045), ('yielded', 0.045), ('depth', 0.044), ('abstractions', 0.044), ('blackbox', 0.044), ('delamarre', 0.044), ('duff', 0.044), ('firmly', 0.044), ('gbs', 0.044), ('gravitational', 0.044), ('ilr', 0.044), ('kjellstrom', 0.044), ('papadourakis', 0.044), ('torques', 0.044), ('vondrak', 0.044), ('wyatt', 0.044), ('tensors', 0.043), ('pose', 0.043), ('hypothesized', 0.043), ('objects', 0.041), ('parameterized', 0.04), ('interpretations', 0.04), ('urtasun', 0.04), ('particle', 0.039), ('st', 0.039), ('accompanies', 0.039), ('bmva', 0.039), ('effortlessly', 0.039), ('nikolaos', 0.039), ('grasped', 0.039), ('contact', 0.038), ('entity', 0.038), ('orientation', 0.037), ('evolution', 0.037), ('fleet', 0.037), ('triangular', 0.037), ('exhibiting', 0.037), ('dd', 0.036), 
('occlusions', 0.036), ('game', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999887 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
2 0.17448376 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
Author: Vineet Gandhi, Remi Ronfard
Abstract: We introduce a generative model for learning person and costume specific detectors from labeled examples. We demonstrate the model on the task of localizing and naming actors in long video sequences. More specifically, the actor’s head and shoulders are each represented as a constellation of optional color regions. Detection can proceed despite changes in view-point and partial occlusions. We explain how to learn the models from a small number of labeled keyframes or video tracks, and how to detect novel appearances of the actors in a maximum likelihood framework. We present results on a challenging movie example, with 81% recall in actor detection (coverage) and 89% precision in actor identification (naming).
3 0.16213487 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning physical stability of objects from point cloud. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud; and 2) physical reasoning: grouping the unstable primitives to physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) better parsing result for scene understanding in comparison to state-of-the-art methods in both public dataset and our own new dataset.
4 0.16198784 440 cvpr-2013-Tracking People and Their Objects
Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe
Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.
5 0.15815909 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
Author: Cheng Li, Kris M. Kitani
Abstract: We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding hand-object manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illuminations, significant camera motion and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, which contains hand images taken under various illumination conditions. Using both our dataset and a publicly available ego-centric indoors dataset, we give extensive analysis of detection performance using a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.
6 0.13617802 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
7 0.12587032 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
8 0.12380564 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
9 0.11654726 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
10 0.11255006 314 cvpr-2013-Online Object Tracking: A Benchmark
11 0.10999618 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
12 0.10276655 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
13 0.10269926 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
14 0.10220528 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
15 0.099508569 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization
16 0.097533628 334 cvpr-2013-Pose from Flow and Flow from Pose
17 0.096899576 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
18 0.094326675 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
19 0.093959987 226 cvpr-2013-Intrinsic Characterization of Dynamic Surfaces
20 0.091531925 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
topicId topicWeight
[(0, 0.208), (1, 0.09), (2, 0.013), (3, -0.091), (4, -0.045), (5, -0.049), (6, 0.083), (7, -0.044), (8, 0.065), (9, 0.104), (10, -0.027), (11, 0.003), (12, -0.058), (13, 0.069), (14, 0.033), (15, -0.006), (16, 0.027), (17, 0.099), (18, -0.011), (19, 0.028), (20, 0.06), (21, 0.001), (22, -0.008), (23, 0.04), (24, -0.015), (25, 0.024), (26, 0.029), (27, -0.05), (28, -0.063), (29, -0.032), (30, -0.054), (31, -0.007), (32, -0.001), (33, 0.001), (34, -0.034), (35, -0.014), (36, 0.034), (37, 0.051), (38, 0.027), (39, -0.014), (40, -0.027), (41, 0.038), (42, 0.048), (43, 0.049), (44, 0.057), (45, 0.059), (46, 0.037), (47, -0.056), (48, 0.022), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.96045756 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
2 0.82414865 440 cvpr-2013-Tracking People and Their Objects
Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe
Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.
3 0.79697978 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects’ center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
4 0.74742961 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
Author: Martin Hofmann, Daniel Wolf, Gerhard Rigoll
Abstract: We generalize the network flow formulation for multi-object tracking to multi-camera setups. In the past, reconstruction of multi-camera data was done as a separate extension. In this work, we present a combined maximum a posteriori (MAP) formulation, which jointly models multi-camera reconstruction as well as global temporal data association. A flow graph is constructed, which tracks objects in 3D world space. The multi-camera reconstruction can be efficiently incorporated as additional constraints on the flow graph without making the graph unnecessarily large. The final graph is efficiently solved using binary linear programming. On the PETS 2009 dataset we achieve results that significantly exceed the current state of the art.
5 0.68495041 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
7 0.63736695 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
8 0.63662934 301 cvpr-2013-Multi-target Tracking by Rank-1 Tensor Approximation
9 0.63075429 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
10 0.62227911 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
11 0.61427486 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
12 0.61237985 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
13 0.60995144 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
14 0.60459936 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
15 0.59518987 314 cvpr-2013-Online Object Tracking: A Benchmark
16 0.58182871 267 cvpr-2013-Least Soft-Threshold Squares Tracking
17 0.58164775 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
18 0.56130707 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
19 0.55353385 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
20 0.55268645 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
topicId topicWeight
[(10, 0.116), (16, 0.044), (26, 0.083), (33, 0.209), (47, 0.241), (67, 0.065), (69, 0.074), (87, 0.072)]
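The sparse (topicId, topicWeight) pairs above can be compared across papers. The following is a minimal sketch, assuming cosine similarity as the measure; the source does not state how simValue is actually computed (the listing reports 0.837 for the same paper, so the page's measure evidently differs from plain cosine similarity on these weights). The function name `cosine_similarity` and the second topic vector are hypothetical, introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse topic vectors given as (topicId, weight) pairs."""
    wa, wb = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * wb[t] for t, w in wa.items() if t in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Topic weights copied from the listing for this paper.
paper_topics = [(10, 0.116), (16, 0.044), (26, 0.083), (33, 0.209),
                (47, 0.241), (67, 0.065), (69, 0.074), (87, 0.072)]
# Hypothetical topic vector for another paper (not from the source).
other_topics = [(33, 0.30), (47, 0.10), (90, 0.25)]

self_sim = cosine_similarity(paper_topics, paper_topics)
other_sim = cosine_similarity(paper_topics, other_topics)
```

Ranking candidate papers by such a score in descending order would yield a list shaped like the simIndex/simValue table that follows.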
simIndex simValue paperId paperTitle
same-paper 1 0.83730191 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects' state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
Author: Jörg Hendrik Kappes, Markus Speth, Gerhard Reinelt, Christoph Schnörr
Abstract: Discrete graphical models (also known as discrete Markov random fields) are a major conceptual tool to model the structure of optimization problems in computer vision. While in the last decade research has focused on fast approximative methods, algorithms that provide globally optimal solutions have come more into the research focus in the last years. However, large scale computer vision problems seemed to be out of reach for such methods. In this paper we introduce a promising way to bridge this gap based on partial optimality and structural properties of the underlying problem factorization. Combining these preprocessing steps, we are able to solve grids of size 2048 × 2048 in less than 90 seconds. On the hitherto unsolvable Chinese character dataset of Nowozin et al. we obtain provably optimal results in 56% of the instances and achieve competitive runtimes on other recent benchmark problems. While in the present work only generalized Potts models are considered, an extension to general graphical models seems to be feasible.
3 0.78759879 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
Author: Gim Hee Lee, Friedrich Fraundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a close-to-market multi-camera system. By modeling the multi-camera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to a maximum of 6 solutions exist for both cases. We identify the existence of degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences where the scale becomes unobservable and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset collected by a car equipped with 4 cameras with minimal overlapping field-of-views against the GPS/INS ground truth.
4 0.76662046 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich
Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.
5 0.73188597 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
6 0.72883654 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.72540027 440 cvpr-2013-Tracking People and Their Objects
8 0.7219975 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
9 0.72135299 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
10 0.7210201 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
11 0.72060865 414 cvpr-2013-Structure Preserving Object Tracking
12 0.72003776 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
13 0.71888494 325 cvpr-2013-Part Discovery from Partial Correspondence
14 0.7171154 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.71668136 152 cvpr-2013-Exemplar-Based Face Parsing
16 0.716582 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
17 0.71656507 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
18 0.71556967 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
19 0.71489561 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
20 0.71412891 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling