cvpr cvpr2013 cvpr2013-441 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jingchen Liu, Peter Carr, Robert T. Collins, Yanxi Liu
Abstract: We employ hierarchical data association to track players in team sports. Player movements are often complex and highly correlated with both nearby and distant players. A single model would require many degrees of freedom to represent the full motion diversity and could be difficult to use in practice. Instead, we introduce a set of Game Context Features extracted from noisy detections to describe the current state of the match, such as how the players are spatially distributed. Our assumption is that players react to the current situation in only a finite number of ways. As a result, we are able to select an appropriate simplified affinity model for each player and time instant using a random decision forest based on current track and game context features. Our context-conditioned motion models implicitly incorporate complex inter-object correlations while remaining tractable. We demonstrate significant performance improvements over existing multi-target tracking algorithms on basketball and field hockey sequences several minutes in duration and containing 10 and 20 players respectively.
Reference: text
sentIndex sentText sentNum sentScore
1 We employ hierarchical data association to track players in team sports. [sent-3, score-0.949]
2 Instead, we introduce a set of Game Context Features extracted from noisy detections to describe the current state of the match, such as how the players are spatially distributed. [sent-6, score-0.522]
3 Our assumption is that players react to the current situation in only a finite number of ways. [sent-7, score-0.48]
4 As a result, we are able to select an appropriate simplified affinity model for each player and time instant using a random decision forest based on current track and game context features. [sent-8, score-0.777]
5 We demonstrate significant performance improvements over existing multi-target tracking algorithms on basketball and field hockey sequences several minutes in duration and containing 10 and 20 players respectively. [sent-10, score-1.006]
6 Surveillance is perhaps the most common scenario for multi-target tracking, but team sports is another popular domain that has a wide range of applications in strategy analysis, automated broadcasting, and content-based retrieval. [sent-13, score-0.467]
7 Tracking players in team sports has three significant differences compared to pedestrians in surveillance. [sent-16, score-0.918]
8 The global distribution of players often indicates which team is attacking, and local distributions denote when opposing players are closely following each other. [sent-23, score-1.141]
9 We use contextual information such as this to create a more accurate motion affinity model for tracking players. [sent-24, score-0.349]
10 The overhead views of basketball and field hockey show the input detections and corresponding ground truth annotations. [sent-25, score-0.418]
11 because players on the same team will be visually similar. [sent-27, score-0.7]
12 For example, opposing players may exhibit strong local correlations when ‘marking’ each other (such as one-on-one defensive assignments). [sent-30, score-0.498]
13 Similarly, players who are far away from each other move in globally correlated ways because they are reacting to the same ball. [sent-31, score-0.467]
14 [4] modeled inter-target correlations between pedestrians using context which consisted of additional terms in the data association affinity measure based on the spatiotemporal properties of tracklet pairs. [sent-35, score-0.543]
15 Following this convention, we will describe correlations between player movements in terms of game context. [sent-36, score-0.699]
16 Much like the differences between the individual target motions in surveillance and team sports, game context is more complex and dynamic compared to context in surveillance. [sent-37, score-0.76]
17 For example, teams will frequently gain and lose possession of the ball, and the motions of all players will change drastically at each turnover. [sent-38, score-0.529]
18 Because a player’s movement is influenced by multiple factors, the traditional multi-target tracking formulation using a set of independent autoregressive motion models is a poor representation of how sports players move. [sent-39, score-0.912]
19 However, motion affinity models conditioned on multiple targets (and that do not decompose into a product of pairwise terms) make the data association problem NP-hard [7]. [sent-40, score-0.417]
20 In this work, we show how data association is an effective solution for sports player tracking by devising an accurate model of player movements that remains tractable by conditioning on features describing the current state of the game, such as which team has possession of the ball. [sent-41, score-1.65]
21 One of our key contributions is a new set of broad game context features (GCF) for team sports and their estimation from noisy player detections. [sent-42, score-1.19]
22 As a result, we can better assess the affinity between trajectory segments by implicitly modeling complex interactions through a random decision forest based on track and game context features. [sent-43, score-0.657]
23 We demonstrate the ability to track 20 players in over 30 minutes of international field hockey matches, and 10 players in 5 minutes of college basketball. [sent-44, score-1.182]
24 Related Work Recent success in pedestrian tracking has posed multi-target tracking as data association: long object trajectories are found by linking together a series of detections or short tracklets. [sent-46, score-0.734]
25 Data association is often formulated as a linear assignment problem where the cost of linking one tracklet to another is some function of extracted features (typically motion and appearance). [sent-48, score-0.453]
26 Often, the affinity for linking two tracklets together depends on how well the hypothesized motion agrees with one of the global motions. [sent-52, score-0.372]
27 [17] generated a Bayes network of splitting and merging tracklets for a long ten minute soccer sequence, and found the most probable assignment of player identities using max-margin message passing. [sent-58, score-0.601]
28 In both pedestrian and player tracking, object motions are often assumed to be independent and modeled as zero displacement (for erratic motion) and/or constant velocity (for smooth motion governed by inertia). [sent-59, score-0.695]
29 In reality, the locations and motions of players are strongly correlated. [sent-60, score-0.527]
30 Recently, multi-object motion models have been used in pedestrian tracking to anticipate how people will change their trajectories to avoid collisions [18], or for estimating whether a pair of trajectories have correlated motions [4]. [sent-62, score-0.87]
31 [12] estimated motion fields using the velocities of tracklets to anticipate how the play would evolve, but did not use the motion fields to track players over long sequences. [sent-64, score-0.837]
32 , TN} of object trajectories where each trajectory is a temporal sequence of detections Tn = {Oa, Ob, . [sent-72, score-0.378]
33 (3) where Tnt indicates the trajectory of the nth player at time interval t. [sent-86, score-0.509]
34 In team sports, the prior is a highly complex function and is not well approximated by a series of independent trajectory assessments. [sent-87, score-0.421]
35 We maintain the formulation of conditional independence between trajectories, but condition each individual trajectory prior on a set of game context features θ which describe the current state of the match P(T ) d=ef ? [sent-88, score-0.457]
36 (4) Conditioning the individual motion models on game context implicitly encodes higher-order inter-trajectory relationships and long-term intra-trajectory information without sacrificing tractability. [sent-91, score-0.438]
37 Hierarchical Association Because the solution space of data association grows exponentially with the number of frames, we adopt hierarchical association to handle sequences that are several minutes long (see Fig. [sent-94, score-0.535]
38 Low-Level Trajectories A set Υ of low-level tracklets is extracted from the detections by fitting constant velocity models to clusters of detections in 0. [sent-96, score-0.376]
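A minimal sketch of the constant-velocity fit mentioned above, assuming each cluster is a list of (t, x, y) detections and using an ordinary least-squares line per coordinate. The function names and the data layout are hypothetical; the source does not specify how the fit is performed.

```python
def fit_constant_velocity(times, values):
    """Least-squares fit of value(t) = offset + velocity * t for one coordinate."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        # All detections share one timestamp: velocity is undetermined, use 0.
        return mean_v, 0.0
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    velocity = cov / var_t
    offset = mean_v - velocity * mean_t
    return offset, velocity


def fit_tracklet(detections):
    """detections: list of (t, x, y). Returns ((x0, vx), (y0, vy))."""
    ts = [d[0] for d in detections]
    fit_x = fit_constant_velocity(ts, [d[1] for d in detections])
    fit_y = fit_constant_velocity(ts, [d[2] for d in detections])
    return fit_x, fit_y
```

For example, fit_tracklet([(0, 1, 2), (1, 3, 2), (2, 5, 2)]) describes a point starting at (1, 2) and moving two units per time step along x.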
39 Mid-Level Trajectories Similar to [9], the Hungarian algorithm is used to combine subsequent low-level trajectories into a set Γ of mid-level trajectories up to 60s in duration. [sent-100, score-0.394]
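To illustrate the linear assignment step, the sketch below finds the minimum-cost one-to-one linking between tracklet ends and tracklet starts by brute force over permutations. This is a stand-in, not the Hungarian algorithm itself: it reaches the same optimum on small inputs but scales factorially. The cost matrix is hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimising sum(cost[i][assignment[i]]).

    cost is a square matrix: cost[i][j] is the cost of linking tracklet i
    to tracklet j. Brute force, so only suitable for small matrices.
    """
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best[0]:
            best = (total, perm)
    return best
```

In practice a polynomial-time solver would replace the permutation loop; the interface (cost matrix in, assignment out) stays the same.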
40 Generally, mid-level trajectories terminate when abrupt motions occur or when a player is not detected for more than two seconds. [sent-102, score-0.661]
41 High-Level Trajectories MAP association is equivalent to minimum cost flow in a cost flow network [27] where a vertex i is defined for each mid-level trajectory Γi and edge weights reflect the likelihood and prior in (4). [sent-103, score-0.462]
42 The complete trajectory Tn of each player corresponds to (a)(b)(c) Figure 2. [sent-106, score-0.509]
43 (a) low-level tracklets Υ from noisy detections; (b) mid-level trajectories Γ obtained via the Hungarian algorithm [9]; (c) N high-level player trajectories T via a cost flow network [27]. [sent-108, score-0.936]
44 The probability that Γi and Γj belong to the same team is Pa(Γi → Γj) = ai · aj + (1 − ai) · (1 − aj) (7) where ai and 1 − ai are the confidence scores of the mid-level trajectory belonging to team A and B respectively. [sent-118, score-0.822]
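Equation (7) can be checked numerically with a few lines of Python. The function name is hypothetical; the formula is the one stated above.

```python
def same_team_probability(a_i, a_j):
    """Eq. (7): probability that two trajectories belong to the same team.

    a_i and a_j are the confidences in [0, 1] that each trajectory belongs
    to team A, so (1 - a) is the confidence for team B.
    """
    return a_i * a_j + (1.0 - a_i) * (1.0 - a_j)
```

Two confident same-team scores (0.9 and 0.8) give 0.74, while an uninformative score of 0.5 on either side always yields 0.5, so the appearance term then carries no information for that pair.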
45 Before describing the form of Pm (Γi → Γj |θ) in more detail, we first discuss how to extract a set of game context features θ from noisy detections O. [sent-124, score-0.402]
46 Game Context Features In team sports, players assess the current situation and react accordingly. [sent-126, score-0.767]
47 As a result, a significant amount of contextual information is implicitly encoded in player locations. [sent-127, score-0.401]
48 In practice, the set of detected player positions in each frame contains errors, including both missed detections and false detections. [sent-128, score-0.519]
49 We introduce four features (two global and two local) for describing the current game situation with respect to a pair of trajectories that can be extracted from a varying number of noisy detected player locations O . [sent-129, score-0.92]
50 Absolute Occupancy Map We describe the distribution of players during a time interval using an occupancy map, which is a spatial quantization of the number of detected players, so that we get a description vector of constant length regardless of missed detections and false alarms. [sent-132, score-0.702]
51 The underlying assumption is that players may exhibit different motion patterns under different spatial distributions. [sent-134, score-0.502]
52 For example, a concentrated distribution may indicate a higher likelihood of abrupt motion changes, and smooth motions are more likely to happen during player transitions with a spread-out distribution. [sent-135, score-0.594]
53 We compute a time-averaged player count for each quantized area. [sent-136, score-0.375]
54 When evaluating the affinity for Γi → Γj, we average the occupancy vector over the time window (ti1, tj0), and the nearest cluster ID is taken as the context feature of absolute occupancy = k ∈ {1, . [sent-141, score-0.494]
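The absolute occupancy feature can be sketched as follows, assuming a rectangular playing area quantized into an nx-by-ny grid and cluster centres that have already been learned. The grid resolution, the function names and the squared-Euclidean nearest-centre rule are assumptions for illustration, not details given in the source.

```python
def occupancy_vector(frames, width, height, nx, ny):
    """Time-averaged player count per grid cell.

    frames: list of per-frame lists of (x, y) detections.
    Returns a flat list of length nx * ny, independent of how many
    detections each frame contains.
    """
    counts = [0.0] * (nx * ny)
    for detections in frames:
        for x, y in detections:
            if not (0 <= x <= width and 0 <= y <= height):
                continue  # ignore detections outside the playing area
            cx = min(int(x / width * nx), nx - 1)
            cy = min(int(y / height * ny), ny - 1)
            counts[cy * nx + cx] += 1.0
    n_frames = max(len(frames), 1)
    return [c / n_frames for c in counts]


def nearest_cluster(vector, centres):
    """Index of the centre closest to vector (squared Euclidean distance)."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(vector, centre))
    return min(range(len(centres)), key=lambda k: sq_dist(centres[k]))
```

The returned cluster index plays the role of the discrete context feature k; missed or false detections only perturb the averaged counts rather than changing the vector length.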
55 Relative Occupancy Map The relative distribution of players is often indicative of identity [17]. [sent-151, score-0.413]
56 For example, a forward on the right side typically remains in front and to the right of teammates regardless of whether the team is defending in the back-court or attacking in the front-court. [sent-152, score-0.433]
57 Additionally, the motion of a player is often influenced by nearby players. [sent-153, score-0.464]
58 Like absolute occupancy maps, we cluster the 16 bin relative occupancy counts (first 8 bins describing same-team distribution, last 8 bins describing opponent distribution) using K-means. [sent-157, score-0.426]
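A rough sketch of the 16-bin relative occupancy vector: the first 8 bins count teammates and the last 8 count opponents, as described above. The source does not say how the 8 bins are laid out spatially, so the angular sectors around the player used here are an assumption, as are the function and parameter names.

```python
import math


def relative_occupancy(player, teammates, opponents, n_sectors=8):
    """Return teammate counts followed by opponent counts per angular sector.

    player: (x, y) of the reference player.
    teammates, opponents: lists of (x, y) positions of other players.
    Sector 0 starts at the positive x direction and sectors proceed
    counter-clockwise (an assumed layout).
    """
    def sector_counts(others):
        counts = [0] * n_sectors
        for x, y in others:
            angle = math.atan2(y - player[1], x - player[0]) % (2 * math.pi)
            idx = min(int(angle / (2 * math.pi) * n_sectors), n_sectors - 1)
            counts[idx] += 1
        return counts

    return sector_counts(teammates) + sector_counts(opponents)
```

The resulting vectors would then be clustered with K-means, and the cluster ID used as the feature.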
59 For each pair of (Γi, Γj), we extract the occupancy vector vi and vj, with cluster ID ki, kj, from the end tracklet of Γi and the beginning tracklet of Γj . [sent-158, score-0.382]
60 Focus Area In team sports such as soccer or basketball, there is often a local region with relatively high player density that moves smoothly in time and may indicate the current or future location of the ball [12,20]. [sent-167, score-0.868]
61 We assume the movement of individual players should correlate with the focus area over long time periods, thus this feature is useful for associations Γi → Γj with large temporal gaps (when the motion prediction is also less reliable). [sent-169, score-0.758]
62 We estimate the location and movement of the focus area by applying meanshift mode-seeking to track the local center of mass of the noisy player detections. [sent-174, score-0.511]
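The mode-seeking step can be sketched with a basic mean-shift iteration. The Gaussian kernel, the bandwidth parameter and the stopping rule are assumptions made for this illustration; the source only states that mean-shift is applied to the detections.

```python
import math


def mean_shift_mode(points, start, bandwidth, n_iter=50, tol=1e-6):
    """Shift an estimate towards the local density mode of 2D points.

    points: list of (x, y) detections in one frame.
    start: initial (x, y) estimate, e.g. the focus area of the previous frame.
    """
    cx, cy = start
    for _ in range(n_iter):
        wsum = sx = sy = 0.0
        for x, y in points:
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
            wsum += w
            sx += w * x
            sy += w * y
        if wsum == 0.0:
            break  # no detection has measurable weight; keep current estimate
        new_x, new_y = sx / wsum, sy / wsum
        shift = math.hypot(new_x - cx, new_y - cy)
        cx, cy = new_x, new_y
        if shift < tol:
            break
    return cx, cy
```

Seeding each frame with the previous frame's result gives a focus area that moves smoothly in time and is largely unaffected by isolated false detections far from the dense region.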
63 Given a pair of mid-level trajectories (Γi, Γj), we interpolate the trajectory within the temporal window (ti1, tj0) and calculate the variance of its relative distance to the trajectory of the focus area σij . [sent-175, score-0.537]
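A small sketch of the focus-area feature, assuming linear interpolation between the end of Γi and the start of Γj and a population variance over the per-frame distances. The interpolation scheme and all names are assumptions; the source only says the trajectory is interpolated and the variance of its relative distance is computed.

```python
import math


def focus_distance_variance(p_end, p_start, focus_path):
    """Variance of the distance between an interpolated gap and the focus area.

    p_end: (x, y) where trajectory i ends.
    p_start: (x, y) where trajectory j begins.
    focus_path: focus-area positions (x, y) for every frame of the gap,
                first and last frame included.
    """
    n = len(focus_path)
    if n < 2:
        return 0.0
    dists = []
    for k, (fx, fy) in enumerate(focus_path):
        a = k / (n - 1)  # interpolation weight running from 0 to 1
        x = p_end[0] + a * (p_start[0] - p_end[0])
        y = p_end[1] + a * (p_start[1] - p_end[1])
        dists.append(math.hypot(x - fx, y - fy))
    mean = sum(dists) / n
    return sum((d - mean) ** 2 for d in dists) / n
```

A small variance means the hypothesized link keeps a steady offset from the focus area, which is what one would expect if both segments belong to the same player.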
64 Because player motions are globally correlated, the affinity of two mid-level trajectories over large windows should agree with the overall movement trend of the focus area. [sent-181, score-0.799]
65 Chasing Detection Individual players are often instructed to follow or mark a particular opposition player. [sent-184, score-0.413]
66 Basketball, for example, commonly uses a one-on-one defense system where a defending player is assigned to follow a corresponding attacking player. [sent-185, score-0.472]
67 We introduce chasing (close-interaction) links to detect when one player is marking another. [sent-186, score-0.544]
68 If trajectories Γi and Γj both appear to be following a nearby reference trajectory Γk, there is a strong possibility that Γj is the continuation of Γi (assuming the mid-level trajectory of the reference player is continuous during the gap between Γi and Γj, see Fig. [sent-187, score-0.93]
69 The chasing continuity feature that measures whether trajectories Γi and Γj are marking the same player is given by θi(Cj) =k=m1,. [sent-194, score-0.743]
70 1) and game context features using a Random Decision Forest, which is robust against the overfitting that might occur when using limited training data via bootstrapping, especially when the data is not easily separable due to association ambiguity in the real world. [sent-203, score-0.528]
71 6), the occupancy-feature is more effective at handling short-term association (when feature tg is small) and the chasing-feature is more important in connecting trajectories with long temporal gaps (tg is big). [sent-206, score-0.554]
72 Kinematic features. We generate training data by extracting kinematic features fij(K) and game context features θij for all pairs of mid-level trajectories (Γi, Γj). [sent-209, score-0.559]
73 Using ground truth tracking data, we assign binary labels yij ∈ {1, 0} indicating whether the association Γi → Γj is correct or not. [sent-210, score-0.396]
74 , long gap association and short gap association may be split at different levels and handled with different feature sets. [sent-214, score-0.544]
75 Experiments We validate our framework on two sports: field hockey with 20 players and basketball with 10 players. [sent-218, score-0.747]
76 We use simple RGB-based color histogram classifiers to estimate the confidence score ai ∈ [0, 1] of tracklet i belonging to team 0 or 1. [sent-220, score-0.43]
77 All models apply hierarchical association and start with the same set of mid-level trajectories Γ. [sent-225, score-0.402]
78 The only difference between the models is the motion affinity used during the final association stage. [sent-226, score-0.39]
79 We have also examined other features for describing aspects of game context, such as variance of tracklet velocity or team separability. [sent-230, score-0.76]
80 Three errors are commonly evaluated in the multi-target tracking literature: (1) the number of incorrect associations Nerr, (2) the number of missed detections Nmiss, and (3) the number of false detections Nfa. [sent-233, score-0.464]
81 In addition to overall tracking performance, we also evaluate in isolation the high-level association stage Γ → T, which is the key part of our framework. [sent-237, score-0.369]
82 We define recall = 1 − Tmiss/Tgap, where Tgap is the accumulation of temporal gaps tgap between high-level associations, and Tmiss is the total length of mid-level trajectories Γi being missed. [sent-239, score-0.394]
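The recall definition above amounts to a one-line computation. In this sketch the inputs are lists of durations in seconds, and returning 1.0 when there are no gaps is an assumed convention, not something the source states.

```python
def association_recall(gap_lengths, missed_lengths):
    """recall = 1 - Tmiss / Tgap.

    gap_lengths: temporal gaps t_gap between high-level associations.
    missed_lengths: durations of mid-level trajectories that were missed.
    """
    t_gap = sum(gap_lengths)
    if t_gap == 0:
        return 1.0  # assumed convention: nothing to associate, nothing missed
    t_miss = sum(missed_lengths)
    return 1.0 - t_miss / t_gap
```

For instance, gaps totalling 10 seconds with 1 second of missed trajectory give a recall of 0.9.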
83 temporal gap tgap being correctly associated during the high-level association, which reflects the algorithm’s ability to associate trajectories with long-term misses. [sent-245, score-0.359]
84 The average player detection miss and false-alarm rates are 14. [sent-249, score-0.4]
85 On the other hand, the absolute and relative player distributions feature has the smallest temporal gap, indicating it is more useful for short-term misses. [sent-258, score-0.478]
86 Thus the major difference in their performances is reflected in the terms for incorrect association Ne∗rr and miss association Perr. [sent-261, score-0.435]
87 Trade-off curve between Pmiss and Ne∗rr for (a) field hockey sequences and (b) basketball sequences. [sent-263, score-0.369]
88 On the other hand, same-game learning outperforms cross-game learning in terms of generalization, which matches our intuition that the game context features are more similar within the same game with the same players, e. [sent-282, score-0.585]
89 The dataset is more challenging due to a higher player density and less training data. [sent-288, score-0.375]
90 Comparison of same/cross game learning (Hockey). more important for basketball sequences, indicating that one-on-one defensive situations occur more frequently in basketball than field hockey. [sent-385, score-0.627]
91 Summary In this work, we use hierarchical association to track multiple players in team sports over long periods of time. [sent-387, score-1.159]
92 Although the motions of players are complex and highly correlated with teammates and opponents, the short-term movement of each player is often reactive to the current situation. [sent-388, score-1.022]
93 Using this insight, we define a set of game context features and decompose the motion likelihood of all players into independent per-player models contingent on game state. [sent-389, score-1.128]
94 Higher-order inter-player dependencies are implicitly encoded into a random decision forest based on track and game context features. [sent-390, score-0.427]
95 In both sports, motion models conditioned on game context features consistently improve tracking results by more than 10%. [sent-393, score-0.603]
96 Robust object tracking by hierarchical association of detection responses. [sent-444, score-0.369]
97 Identifying players in broadcast sports videos using conditional random fields. [sent-488, score-0.593]
98 Tracking multiple sports players through occlusion, congestion and scale. [sent-495, score-0.63]
99 Multiple player tracking in sports video: A dual-mode two-way Bayesian inference approach with progressive observation modeling. [sent-550, score-0.719]
100 Global data association for multi-object tracking using network flows. [sent-569, score-0.393]
wordName wordTfidf (topN-words)
[('players', 0.413), ('player', 0.375), ('team', 0.287), ('game', 0.262), ('association', 0.205), ('trajectories', 0.197), ('sports', 0.18), ('hockey', 0.166), ('tracking', 0.164), ('occupancy', 0.154), ('tracklets', 0.142), ('basketball', 0.142), ('trajectory', 0.134), ('tracklet', 0.114), ('chasing', 0.11), ('pmiss', 0.11), ('tgap', 0.11), ('affinity', 0.096), ('motions', 0.089), ('motion', 0.089), ('detections', 0.084), ('wm', 0.078), ('pfa', 0.075), ('gcfs', 0.073), ('associations', 0.072), ('tnt', 0.071), ('velocity', 0.066), ('context', 0.061), ('rr', 0.061), ('nfa', 0.06), ('attacking', 0.06), ('minutes', 0.06), ('nerr', 0.055), ('correlated', 0.054), ('gap', 0.052), ('pedestrian', 0.05), ('teammates', 0.049), ('ne', 0.048), ('temporal', 0.047), ('ij', 0.045), ('linking', 0.045), ('crowded', 0.045), ('hungarian', 0.044), ('track', 0.044), ('movement', 0.042), ('likelihood', 0.041), ('gaps', 0.04), ('kinematic', 0.039), ('pedestrians', 0.038), ('continuation', 0.038), ('tn', 0.037), ('congestion', 0.037), ('defending', 0.037), ('nmiss', 0.037), ('ntp', 0.037), ('react', 0.037), ('tg', 0.035), ('sequences', 0.035), ('marking', 0.034), ('forest', 0.034), ('missed', 0.034), ('movements', 0.033), ('nillius', 0.033), ('gcf', 0.033), ('dij', 0.031), ('describing', 0.031), ('situation', 0.03), ('minute', 0.03), ('anticipate', 0.03), ('long', 0.03), ('ai', 0.029), ('flow', 0.029), ('absolute', 0.029), ('correlations', 0.029), ('opposing', 0.028), ('defensive', 0.028), ('conditioned', 0.027), ('continuity', 0.027), ('opponent', 0.027), ('teams', 0.027), ('andriyenko', 0.027), ('aj', 0.027), ('jk', 0.027), ('indicating', 0.027), ('erratic', 0.026), ('adaptivity', 0.026), ('implicitly', 0.026), ('false', 0.026), ('ball', 0.026), ('field', 0.026), ('log', 0.025), ('area', 0.025), ('links', 0.025), ('miss', 0.025), ('successor', 0.025), ('kj', 0.025), ('noisy', 0.025), ('strongly', 0.025), ('network', 0.024), ('autoregressive', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
2 0.59967273 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles
Author: Patrick Lucey, Alina Bialkowski, Peter Carr, Stuart Morgan, Iain Matthews, Yaser Sheikh
Abstract: In this paper, we describe a method to represent and discover adversarial group behavior in a continuous domain. In comparison to other types of behavior, adversarial behavior is heavily structured as the location of a player (or agent) is dependent both on their teammates and adversaries, in addition to the tactics or strategies of the team. We present a method which can exploit this relationship through the use of a spatiotemporal basis model. As players constantly change roles during a match, we show that employing a “role-based” representation instead of one based on player “identity” can best exploit the playing structure. As vision-based systems currently do not provide perfect detection/tracking (e.g. missed or false detections), we show that our compact representation can effectively “denoise” erroneous detections as well as enabling temporal analysis, which was previously prohibitive due to the dimensionality of the signal. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labelled data.
3 0.23342182 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
4 0.20864645 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
5 0.19838989 301 cvpr-2013-Multi-target Tracking by Rank-1 Tensor Approximation
Author: Xinchu Shi, Haibin Ling, Junling Xing, Weiming Hu
Abstract: In this paper we formulate multi-target tracking (MTT) as a rank-1 tensor approximation problem and propose an ℓ1 norm tensor power iteration solution. In particular, a high order tensor is constructed based on trajectories in the time window, with each tensor element as the affinity of the corresponding trajectory candidate. The local assignment variables are the ℓ1 normalized vectors, which are used to approximate the rank-1 tensor. Our approach provides a flexible and effective formulation where both pairwise and high-order association energies can be used expediently. We also show the close relation between our formulation and the multi-dimensional assignment (MDA) model. To solve the optimization in the rank-1 tensor approximation, we propose an algorithm that iteratively powers the intermediate solution followed by an ℓ1 normalization. Aside from effectively capturing high-order motion information, the proposed solver runs efficiently with proved convergence. The experimental validations are conducted on two challenging datasets and our method demonstrates promising performances on both.
6 0.19232532 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition
7 0.16972324 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree
8 0.16449039 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
9 0.16307902 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
10 0.16253906 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
12 0.12684467 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
13 0.12460931 440 cvpr-2013-Tracking People and Their Objects
14 0.11652821 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
15 0.104302 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
16 0.10359015 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
17 0.097798474 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
18 0.097463213 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
19 0.096357435 314 cvpr-2013-Online Object Tracking: A Benchmark
20 0.096115604 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
topicId topicWeight
[(0, 0.181), (1, 0.025), (2, -0.007), (3, -0.123), (4, -0.044), (5, -0.035), (6, 0.119), (7, -0.161), (8, 0.024), (9, 0.213), (10, 0.034), (11, -0.022), (12, -0.023), (13, -0.027), (14, 0.1), (15, 0.093), (16, -0.03), (17, 0.084), (18, 0.035), (19, -0.098), (20, 0.098), (21, 0.051), (22, -0.099), (23, 0.203), (24, 0.088), (25, 0.124), (26, 0.152), (27, -0.193), (28, -0.103), (29, -0.199), (30, -0.07), (31, -0.027), (32, -0.067), (33, -0.15), (34, 0.195), (35, -0.101), (36, -0.172), (37, -0.147), (38, 0.031), (39, 0.095), (40, 0.072), (41, -0.184), (42, 0.013), (43, -0.007), (44, -0.063), (45, 0.054), (46, 0.118), (47, -0.016), (48, -0.054), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.93313366 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
2 0.88822937 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles
Author: Patrick Lucey, Alina Bialkowski, Peter Carr, Stuart Morgan, Iain Matthews, Yaser Sheikh
Abstract: In this paper, we describe a method to represent and discover adversarial group behavior in a continuous domain. In comparison to other types of behavior, adversarial behavior is heavily structured as the location of a player (or agent) is dependent both on their teammates and adversaries, in addition to the tactics or strategies of the team. We present a method which can exploit this relationship through the use of a spatiotemporal basis model. As players constantly change roles during a match, we show that employing a “role-based” representation instead of one based on player “identity” can best exploit the playing structure. As vision-based systems currently do not provide perfect detection/tracking (e.g. missed or false detections), we show that our compact representation can effectively “denoise” erroneous detections as well as enabling temporal analysis, which was previously prohibitive due to the dimensionality of the signal. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labelled data.
3 0.71639895 301 cvpr-2013-Multi-target Tracking by Rank-1 Tensor Approximation
Author: Xinchu Shi, Haibin Ling, Junling Xing, Weiming Hu
Abstract: In this paper we formulate multi-target tracking (MTT) as a rank-1 tensor approximation problem and propose an ℓ1 norm tensor power iteration solution. In particular, a high order tensor is constructed based on trajectories in the time window, with each tensor element as the affinity of the corresponding trajectory candidate. The local assignment variables are the ℓ1 normalized vectors, which are used to approximate the rank-1 tensor. Our approach provides a flexible and effective formulation where both pairwise and high-order association energies can be used expediently. We also show the close relation between our formulation and the multi-dimensional assignment (MDA) model. To solve the optimization in the rank-1 tensor approximation, we propose an algorithm that iteratively powers the intermediate solution followed by an ℓ1 normalization. Aside from effectively capturing high-order motion information, the proposed solver runs efficiently with proved convergence. The experimental validations are conducted on two challenging datasets and our method demonstrates promising performances on both.
4 0.63070089 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
5 0.59123749 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition
Author: Jia Deng, Jonathan Krause, Li Fei-Fei
Abstract: Fine-grained recognition concerns categorization at subordinate levels, where the distinction between object classes is highly local. Compared to basic level recognition, fine-grained categorization can be more challenging as there are in general less data and fewer discriminative features. This necessitates the use of a stronger prior for feature selection. In this work, we include humans in the loop to help computers select discriminative features. We introduce a novel online game called “Bubbles” that reveals discriminative features humans use. The player’s goal is to identify the category of a heavily blurred image. During the game, the player can choose to reveal full details of circular regions (“bubbles”), with a certain penalty. With proper setup the game generates discriminative bubbles with assured quality. We next propose the “BubbleBank” algorithm that uses the human-selected bubbles to improve machine recognition performance. Experiments demonstrate that our approach yields large improvements over the previous state of the art on challenging benchmarks.
6 0.52332401 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
7 0.48869315 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
8 0.4750585 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree
9 0.46353415 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
10 0.41511104 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
12 0.40423554 440 cvpr-2013-Tracking People and Their Objects
13 0.38196418 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
14 0.35163581 118 cvpr-2013-Detecting Pulse from Head Motions in Video
15 0.34126398 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
16 0.33945054 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
17 0.30563998 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
18 0.29319739 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
19 0.28739318 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
20 0.27867779 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
topicId topicWeight
[(10, 0.111), (16, 0.022), (26, 0.053), (28, 0.01), (33, 0.262), (67, 0.058), (69, 0.034), (80, 0.011), (86, 0.261), (87, 0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.83278161 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
Author: Jingchen Liu, Peter Carr, Robert T. Collins, Yanxi Liu
Abstract: We employ hierarchical data association to track players in team sports. Player movements are often complex and highly correlated with both nearby and distant players. A single model would require many degrees of freedom to represent the full motion diversity and could be difficult to use in practice. Instead, we introduce a set of Game Context Features extracted from noisy detections to describe the current state of the match, such as how the players are spatially distributed. Our assumption is that players react to the current situation in only a finite number of ways. As a result, we are able to select an appropriate simplified affinity model for each player and time instant using a random decision forest based on current track and game context features. Our context-conditioned motion models implicitly incorporate complex inter-object correlations while remaining tractable. We demonstrate significant performance improvements over existing multi-target tracking algorithms on basketball and field hockey sequences several minutes in duration and containing 10 and 20 players respectively.
2 0.82200772 155 cvpr-2013-Exploiting the Power of Stereo Confidences
Author: David Pfeiffer, Stefan Gehrig, Nicolai Schneider
Abstract: Applications based on stereo vision are becoming increasingly common, ranging from gaming over robotics to driver assistance. While stereo algorithms have been investigated heavily both on the pixel and the application level, far less attention has been dedicated to the use of stereo confidence cues. Mostly, a threshold is applied to the confidence values for further processing, which essentially yields a sparsified disparity map. This is straightforward but it does not take full advantage of the available information. In this paper, we make full use of the stereo confidence cues by propagating all confidence values along with the measured disparities in a Bayesian manner. Before using this information, a mapping from confidence values to disparity outlier probability rate is performed based on gathered disparity statistics from labeled video data. We present an extension of the so-called Stixel World, a generic 3D intermediate representation that can serve as input for many of the applications mentioned above. This scheme is modified to directly exploit stereo confidence cues in the underlying sensor model during a maximum a posteriori estimation process. The effectiveness of this step is verified in an in-depth evaluation on a large real-world traffic database, parts of which are made publicly available. We show that using stereo confidence cues allows reducing the number of false object detections by a factor of six while keeping the detection rate at a near constant level.
3 0.80562401 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling
Author: Andrew Kae, Kihyuk Sohn, Honglak Lee, Erik Learned-Miller
Abstract: Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate its labeling performance for the parts of complex face images from the Labeled Faces in the Wild data set. This hybrid model produces results that are both quantitatively and qualitatively better than the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.
4 0.79632789 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: We propose a new approach for estimation of the positions of facial keypoints with three-level carefully designed convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation. Thanks to the deep structures of convolutional networks, global high-level features are extracted over the whole face region at the initialization stage, which help to locate high accuracy keypoints. There are two folds of advantage for this. First, the texture context information over the entire face is utilized to locate each keypoint. Second, since the networks are trained to predict all the keypoints simultaneously, the geometric constraints among keypoints are implicitly encoded. The method therefore can avoid local minimum caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lightings. The networks at the following two levels are trained to locally refine initial predictions and their inputs are limited to small regions around the initial predictions. Several network structures critical for accurate and robust facial point detection are investigated. Extensive experiments show that our approach outperforms state-of-the-art methods in both detection accuracy and reliability.
5 0.77883035 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
Author: Guang Chen, Yuanyuan Ding, Jing Xiao, Tony X. Han
Abstract: Context has been playing an increasingly important role to improve the object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model the high level context using solely detection responses from a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are further calculated to construct a high order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector to a stronger context aware detector. With the updated detector, we can continue the evolution till the contextual improvements saturate. Using the successful deformable part model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and Caltech pedestrian dataset [7]: the proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver.5) [13] by 3.3% in mean average precision on the PASCAL 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
6 0.76180267 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
7 0.75835633 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.75766814 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
9 0.75650156 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery
10 0.75640345 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
11 0.75560403 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.75478685 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
13 0.75383449 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
14 0.75365728 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
15 0.75361985 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
16 0.75356787 143 cvpr-2013-Efficient Large-Scale Structured Learning
17 0.7533834 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
18 0.75328481 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
19 0.75308192 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
20 0.7529043 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures