iccv iccv2013 iccv2013-393 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Abstract: We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed as face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from the solutions of each other. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations . We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements on the state-of-the-art results in face tracking and clustering performances on several video datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed as face tracklets) in videos. [sent-10, score-0.381]
2 The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from the solutions of each other. [sent-11, score-1.324]
3 Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations . [sent-12, score-1.22]
4 We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. [sent-13, score-1.044]
5 We demonstrate significant improvements on the state-of-the-art results in face tracking and clustering performances on several video datasets. [sent-14, score-0.377]
6 Introduction Reliable tracking and clustering of faces in unconstrained videos is a challenging problem, complicated by drastic changes in backgrounds, illumination, viewpoints, camera movements and occlusions that frequently occur in actual videos. [sent-16, score-0.402]
7 Most previous works treat these two problems separately, as either a face tracking problem where sequences of face images are associated [14, 18] or a face clustering problem where face images are partitioned into different clusters [3, 22, 23]. [sent-17, score-0.69]
8 We address in this paper the problem of simultaneously clustering and linking short sequences of detected faces (termed as face tracklets) in videos. [sent-18, score-0.902]
9 Figure 1 exemplifies the benefits of simultaneous face clustering and tracklet linking: in the first case (top Figure1. [sent-20, score-0.957]
10 Detected face tracklets are indicated by bounding boxes connected with solid lines (we only highlight a few detected tracklets for the sake of presentation). [sent-22, score-0.909]
11 Linkings and cluster labels of tracklets are indicated by the dashed curves and numbers over the bounding boxes, respectively. [sent-23, score-0.523]
12 (Top) Without considering clustering labels, tracklets of different clusters are linked incorrectly. [sent-24, score-0.573]
13 (Bottom) Without considering tracklet linking, tracklets in the same track are incorrectly partitioned into different clusters. [sent-25, score-1.009]
14 row), linking tracklets without considering their cluster labels leads to incorrectly associating tracklets from clusters 1 and 2 together. [sent-26, score-1.316]
15 In the second case (bottom row), incorrect clustering of faces (in this case, separation of faces of the same person into two clusters 1 and 2) can be avoided with the knowledge that there is a high likelihood that they are in the same long track. [sent-27, score-0.502]
16 The basis of our method is a novel hidden Markov random field (HMRF) 1 model [8] that jointly models the face cluster labels and face tracklet associations. [sent-28, score-1.022]
17 We formulate the problem of simultaneously clustering and linking of tracklets as a Bayesian inference problem based on this model, and provide an efficient coordinate-descent solution. [sent-29, score-0.952]
18 Specifically, from detected face tracklets with similar [footnote 1: As the constraints used in our problem are undirected correlations, HMRF is a better choice than directed models such as HMM or DBN.] [sent-30, score-0.552]
19 Overall workflow of our method for simultaneous face clustering and tracklet linking. [sent-32, score-0.957]
20 With an input video, we first detect all faces in each frame, and then form face tracklets from adjacent frames. [sent-33, score-0.631]
21 The face tracklets are iteratively clustered and linked into longer tracks in a bootstrapping manner, with the final output of the algorithm being the complete long tracks of detected faces with cluster labels. [sent-34, score-0.965]
22 • Our face tracking and clustering performances improve on the state-of-the-art results for three datasets. [sent-41, score-0.351]
23 These methods commonly treat faces from different video frames as a set of still images, and apply conventional clustering algorithms based on similarities in appearance and poses. [sent-49, score-0.396]
24 A few recent works also study the use of pairwise constraints that are distinct for detected faces in videos [3, 22, 23]: faces in the same tracklet should be must-linked, while faces from a pair of overlapped tracklets should be cannot-linked. [sent-50, score-1.6]
25 As shown in the bottom row of Figure 1, because of the pose change and low resolution, two tracklets of the same person are easily grouped into different clusters. [sent-52, score-0.347]
26 If a pair of non-overlapping tracklets is known to belong to the same track, many more pairwise constraints can be obtained, which are expected to further enhance the performance of constrained face clustering. [sent-53, score-0.557]
27 Compared to traditional monolithic tracking solutions, the tracklet based methods are more robust and suitable for tracking multiple objects in heavily occluded scenes. [sent-58, score-0.743]
28 Though bootstrapping clustering and tracklet linking has been discussed in simplified settings with a fixed camera (e. [sent-60, score-1.233]
29 Problem Formulation We assume that a long video has been processed to obtain a set of face tracklets U = (u1, u2 , · · · , un) (details given in Section 5). [sent-66, score-0.504]
30 We use t(i), X(i) and L(i) to represent the ensembles of tj(i), xj(i) and lj(i) of tracklet ui, respectively. [sent-68, score-0.645]
31 We also compute similarities between every pair of tracklets (details to be described in Section 4. [sent-69, score-0.365]
32 link the tracklets into longer tracks. The hidden Markov random field (HMRF) model. [sent-72, score-0.663]
33 The top layer represents the cluster label variables, while the bottom layer denotes the linking variables. [sent-74, score-0.49]
34 Note that the linking variables in the same row/column are fully connected, and the links between two non-adjacent cluster label variables are omitted for clarity. [sent-76, score-0.49]
35 tracks and partition the face images into distinct clusters, based on cues from face appearances and motion trajectories. [sent-77, score-0.35]
36 We denote the cluster labels of the tracklets as a vector y = (y1, y2, · · · , yn) with each yi ∈ {1, 2, · · · , K}. [sent-79, score-0.569]
37 The linking relations of tracklets are represented
38 with a matrix O ∈ {0, 1}n×n, where Oij = 1 if and only if tracklets ui and uj are adjacent in a track with ui preceding uj, and Oii = 1 if and only if tracklet ui is the last tracklet in a long track. [sent-84, score-1.722]
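As a small illustration (not the paper's code), the structural constraints that this 0/1 encoding places on O can be checked mechanically: each tracklet has at most one successor entry in its row (with the diagonal entry Oii = 1 marking the last tracklet of a track) and at most one off-diagonal predecessor entry in its column. The function name and list-of-lists representation are assumptions for the sketch.

```python
def valid_linking(O):
    """Check the structural constraints of the 0/1 linking matrix O:
    each row holds at most one 1 (a single successor, or the diagonal
    self-mark for the last tracklet of a track), and each column holds
    at most one off-diagonal 1 (a single predecessor)."""
    n = len(O)
    for i in range(n):
        if sum(O[i]) > 1:  # at most one successor / end-mark per tracklet
            return False
    for j in range(n):
        # at most one predecessor per tracklet (diagonal end-marks excluded)
        if sum(O[i][j] for i in range(n) if i != j) > 1:
            return False
    return True
```

For example, with three tracklets where u0 precedes u1, and u1 and u2 each end their tracks, O = [[0,1,0],[0,1,0],[0,0,1]] satisfies the constraints, while giving u1 a second predecessor would not.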
39 Specifically, we model the likelihood of the appearances of the face images in the tracklets given their cluster labels with a simple Gaussian model, P(U|y;θ) = ? [sent-89, score-0.619]
40 The tuning of λ2 will be discussed later. Algorithm 1: Overall algorithm for simultaneous face clustering and tracklet linking. [sent-153, score-0.957]
41 Input: tracklets U, their similarity M, number of clusters K. Output: cluster labels y and tracklet linking relation O. Initialize O based on M using the Hungarian algorithm; while not converged: optimize y and θ with fixed O (Section 3. [sent-154, score-1.556]
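The alternating structure of Algorithm 1 can be sketched as a runnable toy (all names here are illustrative assumptions: the paper initializes O with the Hungarian algorithm and solves an HMRF-constrained clustering problem, whereas this sketch uses a greedy successor choice and union-find label propagation as stand-ins):

```python
def init_linking(M, tau=0.5):
    """Greedy stand-in for the Hungarian initialization of O: tracklet i
    links to its most similar other tracklet, or to itself (track end)
    when no similarity exceeds tau."""
    n = len(M)
    succ = []
    for i in range(n):
        j = max((k for k in range(n) if k != i),
                key=lambda k: M[i][k], default=i)
        succ.append(j if j != i and M[i][j] > tau else i)
    return succ

def cluster_step(succ):
    """Stand-in for the constrained clustering step: tracklets in the same
    linked chain (a must-link derived from O) receive the same label."""
    n = len(succ)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in enumerate(succ):
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

def link_step(M, y, tau=0.5, penalty=0.3):
    """Re-link with similarities softly reduced when labels disagree."""
    n = len(M)
    succ = []
    for i in range(n):
        best, best_s = i, tau
        for j in range(n):
            if j != i:
                s = M[i][j] - (0.0 if y[i] == y[j] else penalty)
                if s > best_s:
                    best, best_s = j, s
        succ.append(best)
    return succ

def simultaneous_cluster_and_link(M, max_iters=10):
    """Alternate the two steps until neither the labels nor the links change."""
    succ = init_linking(M)
    y = cluster_step(succ)
    for _ in range(max_iters):
        succ_new = link_step(M, y)
        y_new = cluster_step(succ_new)
        if succ_new == succ and y_new == y:
            break
        succ, y = succ_new, y_new
    return y, succ
```

With three tracklets where the first two are highly similar and the third is dissimilar to both, the loop links the first pair and assigns them one label while leaving the third in its own cluster.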
42 This constrained clustering problem can be directly solved by the simulated field algorithm [2, 23], in which the inference of y and the learning of the parameters θ = (μ, β) are performed alternately. [sent-202, score-0.281]
43 Note that in linking the tracklets, we also softly incorporate constraints from the clustering results. [sent-239, score-0.403]
44 These enter through the term βλ2(I(yi = yj) − 1) in Equation (10): if yi ≠ yj, then the similarity between ui and uj will be reduced. [sent-241, score-0.441]
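This soft penalty — reducing the similarity between ui and uj by βλ2 whenever their current cluster labels disagree — can be written as a one-function sketch (function name and default values are illustrative, not the paper's settings):

```python
def penalized_similarity(M, y, beta=1.0, lam2=0.5):
    """Add the soft clustering term beta*lam2*(I(yi = yj) - 1) to each pair:
    the term is 0 when the current labels agree and -beta*lam2 when they
    disagree, so disagreeing pairs become less likely to be linked."""
    n = len(M)
    return [[M[i][j] + beta * lam2 * ((1 if y[i] == y[j] else 0) - 1)
             for j in range(n)]
            for i in range(n)]
```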
45 We augment the HMRF-based constrained clustering proposed in [23] by incorporating additional constraints obtained from tracklet linking results. [sent-249, score-1.31]
46 The pairwise constraints used in the original HMRF-based clustering include two types: (1) faces in one tracklet should belong to the same cluster; (2) faces from two overlapped tracklets (some of their faces co-exist in the same frame) should belong to different clusters. [sent-250, score-1.682]
47 The additional constraints are derived from tracklet linking: if two tracklets ui and uj are linked after tracklet linking, faces from them should be grouped into the same cluster. [sent-258, score-2.053]
48 For example, if ui and uj are linked and uj overlaps with another tracklet uk, then faces from ui and uk should also be cannot-linked. [sent-260, score-1.204]
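This propagation — must-links from linked tracklet pairs combining with the existing overlap cannot-links — can be sketched as a small closure computation (a hypothetical helper, not the paper's implementation):

```python
def propagate_constraints(must_links, cannot_links):
    """Close the cannot-link set under must-links: if (a, b) must share a
    cluster and (a, c) must not, then (b, c) must not either. Pairs are
    given as 2-tuples of tracklet indices."""
    must = {frozenset(p) for p in must_links}
    cannot = {frozenset(p) for p in cannot_links}
    changed = True
    while changed:
        changed = False
        for m in list(must):
            a, b = tuple(m)
            for c in list(cannot):
                # try both orientations of the must-link pair
                for x, y in ((a, b), (b, a)):
                    if x in c:
                        other = next(iter(c - {x}))
                        pair = frozenset((y, other))
                        if len(pair) == 2 and pair not in cannot:
                            cannot.add(pair)
                            changed = True
    return cannot
```

For the example above: with the must-link (ui, uj) = (0, 1) from linking and the overlap cannot-link (uj, uk) = (1, 2), the closure adds the cannot-link (0, 2) between ui and uk.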
49 To reduce the computational cost of clustering, we adopt the approximation framework in [23], which is based on the observation that faces in adjacent frames of the same tracklet are very similar. [sent-274, score-0.837]
50 Its main process is summarized as follows: (1) randomly sampling a fixed number of faces from each tracklet to obtain a subset of ns faces; (2) running constrained clustering on this subset; (3) determining the labels of all faces based on the labels of faces in the subset. [sent-275, score-1.381]
51 As a simple example, if 5 faces are sampled from one tracklet, and their labels are (3, 2, 3, 3, 1) after clustering, then the label of this tracklet is determined as the mode value 3. [sent-276, score-0.852]
52 All faces in this tracklet are also relabeled as 3. [sent-277, score-0.79]
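The mode-based relabeling in the example above can be written directly (a trivial helper using Python's Counter; the function name is an assumption):

```python
from collections import Counter

def tracklet_label_from_samples(sampled_labels):
    """Determine a tracklet's label as the mode of its sampled faces'
    cluster labels; all faces in the tracklet are then relabeled with it."""
    return Counter(sampled_labels).most_common(1)[0][0]
```

Applied to the example above, the sampled labels (3, 2, 3, 3, 1) give the tracklet label 3.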
53 The cluster labels of tracklets are referred to as tracklet-level clustering, while the labels of all faces are referred to as face-level clustering. [sent-278, score-0.669]
54 Since the number of frames in each tracklet is not equal, the clustering accuracies of these two levels may be different. [sent-279, score-0.848]
55 Since the constraints in V are always correct, while the constraints in O derived from tracklet linking may contain errors, we set λ2 < 1. [sent-281, score-1.15]
56 Besides, the tracklet linking results are believed to become more accurate as the iteration proceeds, leading to more reliable constraints in O. [sent-282, score-1.099]
57 Tracklet Linking A key component for tracklet linking is the tracklet similarity represented in matrix M. [sent-286, score-1.693]
58 As reported in previous works [24, 17], the tracklet similarity takes into account three aspects: temporal adjacency, appearance affinity and motion smoothness. [sent-287, score-0.684]
59 t(i) is a column vector containing the frame indices of all faces in tracklet ui. [sent-296, score-0.811]
60 between ui and uj, and t0 is a pre-defined threshold to avoid linking two tracklets with a large frame gap. [sent-300, score-0.958]
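A hedged sketch of such a temporal gate, comparing the frame gap between two tracklets against the threshold t0 (the linear decay and the default t0 value are illustrative assumptions, not the paper's exact form):

```python
def temporal_affinity(frames_i, frames_j, t0=30):
    """Temporal-adjacency factor between tracklet ui (frame indices
    frames_i) and a candidate successor uj: zero when the tracklets
    overlap in time or the gap exceeds t0, otherwise decaying with
    the size of the gap."""
    gap = min(frames_j) - max(frames_i)
    if gap <= 0 or gap > t0:
        return 0.0  # overlapping or too distant: cannot be adjacent
    return 1.0 - gap / t0  # closer in time -> higher affinity
```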
61 A tracklet is further represented by the average vector of the included faces. [sent-303, score-0.645]
62 In particular, denote by a vector in R4×1 the location and scale of the jth bounding box in tracklet ui, represented by the horizontal and vertical coordinates of the central pixel and the width and height of the box. [sent-306, score-0.669]
63 Treating each face as a point, one tracklet can be seen as a sequence of discrete points in a 4-dimensional space. [sent-307, score-0.757]
64 The solid circles correspond to detected faces in one tracklet, while the dashed circles are those predicted by the fitted trajectory. [sent-318, score-0.265]
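A minimal stand-in for such a fitted trajectory is a straight-line least-squares fit per coordinate of the 4-d box sequence, used to predict boxes at nearby frames (the paper's exact trajectory model may differ; function names are assumptions):

```python
def fit_linear_trajectory(frames, boxes):
    """Fit a line, per coordinate, to a tracklet's 4-d box sequence
    (x, y, width, height) as a function of frame index. Returns a
    (slope, intercept) pair for each of the 4 dimensions."""
    n = len(frames)
    t_mean = sum(frames) / n
    denom = sum((t - t_mean) ** 2 for t in frames) or 1.0
    models = []
    for d in range(4):
        v = [b[d] for b in boxes]
        v_mean = sum(v) / n
        slope = sum((t - t_mean) * (x - v_mean)
                    for t, x in zip(frames, v)) / denom
        models.append((slope, v_mean - slope * t_mean))
    return models

def predict_box(models, frame):
    """Predict the box at a (possibly unseen) frame from the fitted lines,
    e.g. to compare against the detections of a candidate successor."""
    return [a * frame + b for a, b in models]
```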
65 The value c influences the final number of tracks and their lengths: if c is large, many short tracks will be obtained; otherwise fewer but longer tracks will be produced. [sent-322, score-0.279]
66 The ratio βλ2/λ1 is adjusted to control the relative weight between the constraints and tracklet similarities. [sent-325, score-0.696]
67 In Frontal, there are frequent occlusions and fast movements that make tracklet linking difficult. [sent-333, score-1.094]
68 The Turning video has frequent occlusions and many profile faces that make clustering challenging. [sent-334, score-0.36]
69 The detected faces in adjacent frames are then linked, based on similarities in their appearances, locations and scales of the bounding boxes. [sent-343, score-0.276]
70 Small tracklets containing fewer than 10 faces are deleted in Frontal and Turning, while tracklets with fewer than 20 faces are deleted in BBT01. [sent-348, score-1.034]
71 In particular, we evaluate clustering performances at two levels: the face-level clustering gives the cluster label for each detected face; the tracklet-level clustering outputs the cluster label for each tracklet, which is determined based on the face-level clustering, as mentioned in Section 4. [sent-352, score-0.751]
72 For tracklet linking, we adopt the following metrics used in [11]: the number of predicted tracks (PT, i. [sent-354, score-0.726]
73 Since we just focus on the performance of tracklet linking, rather than tracking, the metric of the mostly lost tracks (ML) is not used here. [sent-357, score-0.726]
74 Besides, the ground-truth tracks (GT) are predefined based on a threshold of the frame gap t0: first, for the same person, we link all tracklets based on temporal correlations; then, if the frame gap between two adjacent tracklets is larger than t0, they are cut into different tracks. [sent-358, score-0.844]
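The cutting rule above can be sketched as follows (a hypothetical representation: each of one person's tracklets is reduced to a (start_frame, end_frame) pair):

```python
def cut_into_tracks(tracklets, t0):
    """Split one person's time-ordered tracklets into ground-truth tracks
    wherever the frame gap to the next tracklet exceeds t0."""
    tracklets = sorted(tracklets)
    tracks, current = [], [tracklets[0]]
    for prev, cur in zip(tracklets, tracklets[1:]):
        if cur[0] - prev[1] > t0:  # gap too large: start a new GT track
            tracks.append(current)
            current = []
        current.append(cur)
    tracks.append(current)
    return tracks
```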
75 Comparisons For face clustering, we compare the proposed method with HMRF-pc [23], which corresponds to only using the clustering step in our framework without the constraints from tracklet linking. [sent-362, score-0.972]
76 For tracklet linking, we compare with a baseline tracklet linking method (denoted as BasicLinking), corresponding to only running the tracklet linking step in our framework without constraints from the clustering results. [sent-363, score-2.956]
77 Besides, we also compare with the state-of-the-art method in the literature of face tracklet linking 4 [18]5. [sent-364, score-1.16]
78 4To highlight the benefits of the additional constraints from tracklet linking, in our experiments the label-level local-smoothness used in [23] is not considered. [sent-365, score-0.715]
79 Illustration of the output face clustering and tracklet linking from our method on (Top) Turning and (Bottom) BBT01. [sent-372, score-1.324]
80 The dashed curves represent linking of tracklets. [sent-375, score-0.423]
81 Results The experiment results on face tracklet linking and clustering are summarized in Tables 3 and 4. [sent-378, score-1.324]
82 Since there is little room for improvement, the tracklet linking fails to further enhance the clustering result based on HMRF-pc. [sent-380, score-1.212]
83 On the other hand, including the tracklet linking improves the clustering accuracies significantly, by 22% and 25% at the tracklet-level and face-level respectively. [sent-382, score-1.231]
84 This may be due to the fact that the classifiers trained on local appearance models from each pair of overlapped tracklets in [18] become less effective, as the appearances of frontal faces are sufficiently distinct in this video. [sent-386, score-0.644]
85 However, constraints originating from the clustering results aid the linking step and reduce errors due to the fast movement. [sent-387, score-0.618]
86 The presented frames show different challenges for clustering and tracklet linking, including: changes in poses, shots, backgrounds, camera movements and occlusions. [sent-397, score-0.85]
87 The computational cost of the proposed method consists of two parts, including constrained clustering and tracklet linking. [sent-399, score-0.856]
88 For one iteration of tracklet linking, the main cost is the Hungarian algorithm. [sent-411, score-0.645]
89 As such, its cost is similar to that of other linking methods. [sent-412, score-0.403]
90 The proposed method oftentimes converges in less than 10 iterations between clustering and linking in our experiments. [sent-413, score-0.567]
91 Conclusions and Discussions We describe a novel method that simultaneously clusters and associates faces of distinct humans in long video sequences for identity maintenance. [sent-415, score-0.271]
92 Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking relations. [sent-416, score-1.22]
93 We provide an efficient algorithm, based on constrained clustering with the simulated field algorithm and optimal matching for the simultaneous inference of cluster labels and tracklet associations. [sent-417, score-1.064]
94 We show improvements on the state-of-the-art results in face tracking and clustering performances on several challenging video datasets. [sent-418, score-0.377]
95 Consequently, the two tracklets in red color fail to be linked. [sent-421, score-0.347]
96 It demonstrates that the performance of face clustering and tracklet linking can benefit from more sophisticated face detection methods that are robust to pose, orientation or illumination changes. [sent-422, score-1.436]
97 Furthermore, we will also investigate more efficient optimization procedures for the constrained clustering and matching problems, and incorporate simultaneous face clustering and linking into an overall system for video summarization. [sent-423, score-0.952]
98 Tracklet-level clustering means the cluster labels of each tracklet, while face-level clustering represents the cluster labels of each face. [sent-432, score-0.592]
99 A mutual information based face clustering algorithm for movie content analysis. [sent-583, score-0.276]
100 Constrained clustering and its application to face clustering in videos. [sent-593, score-0.44]
wordName wordTfidf (topN-words)
[('tracklet', 0.645), ('linking', 0.403), ('tracklets', 0.347), ('oij', 0.186), ('vij', 0.184), ('clustering', 0.164), ('faces', 0.145), ('face', 0.112), ('hmrf', 0.105), ('yj', 0.102), ('uj', 0.096), ('ui', 0.091), ('yi', 0.09), ('cluster', 0.087), ('tracks', 0.081), ('turning', 0.062), ('oji', 0.06), ('hungarian', 0.055), ('constraints', 0.051), ('tracking', 0.049), ('constrained', 0.047), ('albany', 0.045), ('labels', 0.045), ('frontal', 0.044), ('detected', 0.042), ('jn', 0.041), ('overlapped', 0.04), ('lj', 0.037), ('simultaneous', 0.036), ('rpi', 0.035), ('linked', 0.033), ('frag', 0.033), ('log', 0.032), ('basiclinking', 0.03), ('casia', 0.03), ('ection', 0.03), ('filed', 0.03), ('siwei', 0.03), ('clusters', 0.029), ('appearances', 0.028), ('bipartite', 0.027), ('adjacent', 0.027), ('baoyuan', 0.027), ('csc', 0.027), ('performances', 0.026), ('video', 0.026), ('roth', 0.026), ('frequent', 0.025), ('penalized', 0.025), ('deleted', 0.025), ('bounding', 0.024), ('hu', 0.024), ('listing', 0.023), ('appearance', 0.023), ('markov', 0.023), ('videos', 0.023), ('rensselaer', 0.022), ('softly', 0.022), ('lyu', 0.022), ('qiang', 0.021), ('scholarship', 0.021), ('frame', 0.021), ('movements', 0.021), ('ji', 0.021), ('hidden', 0.021), ('simplified', 0.021), ('frames', 0.02), ('simulated', 0.02), ('inference', 0.02), ('dashed', 0.02), ('circles', 0.02), ('long', 0.019), ('accuracies', 0.019), ('dt', 0.019), ('highlight', 0.019), ('china', 0.019), ('dependencies', 0.019), ('ids', 0.018), ('longer', 0.018), ('similarities', 0.018), ('solid', 0.018), ('dm', 0.018), ('observable', 0.018), ('simultaneously', 0.018), ('short', 0.018), ('propagated', 0.018), ('ni', 0.018), ('polytechnic', 0.017), ('fragments', 0.017), ('distinct', 0.017), ('determined', 0.017), ('besides', 0.017), ('track', 0.017), ('dropping', 0.017), ('tv', 0.017), ('xj', 0.017), ('associates', 0.017), ('tinheg', 0.016), ('affinity', 0.016), ('equation', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 393 iccv-2013-Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
Author: Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Abstract: We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed as face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from the solutions of each other. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations . We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements on the state-of-the-art results in face tracking and clustering performances on several video datasets.
2 0.28797081 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
Author: Caglayan Dicle, Octavia I. Camps, Mario Sznaier
Abstract: We introduce a computationally efficient algorithm for multi-object tracking by detection that addresses four main challenges: appearance similarity among targets, missing data due to targets being out of the field of view or occluded behind other objects, crossing trajectories, and camera motion. The proposed method uses motion dynamics as a cue to distinguish targets with similar appearance, minimize target mis-identification and recover missing data. Computational efficiency is achieved by using a Generalized Linear Assignment (GLA) coupled with efficient procedures to recover missing data and estimate the complexity of the underlying dynamics. The proposed approach works with tracklets of arbitrary length and does not assume a dynamical model a priori, yet it captures the overall motion dynamics of the targets. Experiments using challenging videos show that this framework can handle complex target motions, non-stationary cameras and long occlusions, on scenarios where appearance cues are not available or poor.
3 0.24872877 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
Author: Hongyi Zhang, Andreas Geiger, Raquel Urtasun
Abstract: In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes which is able to reason not only about the geometry and objects present in the scene, but also about the high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes and show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves the overall scene estimation as well as the vehicle-to-lane association when compared to state-of-the-art approaches [10].
4 0.15999457 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
Author: Vignesh Ramanathan, Percy Liang, Li Fei-Fei
Abstract: Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio temporal annotations, which requires extensive human effort. In this work, we propose a method to learn such models based on natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. There are two challenges with using this form of weak supervision: First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-ofthe-art method on the TRECVID-MED11 event kit, despite weaker supervision.
5 0.12538023 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
Author: K.C. Amit Kumar, Christophe De_Vleeschouwer
Abstract: Given a set of plausible detections, detected at each time instant independently, we investigate how to associate them across time. This is done by propagating labels on a set of graphs that capture how the spatio-temporal and the appearance cues promote the assignment of identical or distinct labels to a pair of nodes. The graph construction is driven by the locally linear embedding (LLE) of either the spatio-temporal or the appearance features associated to the detections. Interestingly, the neighborhood of a node in each appearance graph is defined to include all nodes for which the appearance feature is available (except the ones that coexist at the same time). This allows to connect the nodes that share the same appearance even if they are temporally distant, which gives our framework the uncommon ability to exploit the appearance features that are available only sporadically along the sequence of detections. Once the graphs have been defined, the multi-object tracking is formulated as the problem of finding a label assignment that is consistent with the constraints captured by each of the graphs. This results into a difference of convex program that can be efficiently solved. Experiments are performed on a basketball and several well-known pedestrian datasets in order to validate the effectiveness of the proposed solution.
6 0.11803623 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
7 0.10452343 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
8 0.09672498 157 iccv-2013-Fast Face Detector Training Using Tailored Views
9 0.081308089 180 iccv-2013-From Where and How to What We See
10 0.078157283 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
11 0.075240828 444 iccv-2013-Viewing Real-World Faces in 3D
12 0.07463599 232 iccv-2013-Latent Space Sparse Subspace Clustering
13 0.071274854 166 iccv-2013-Finding Actors and Actions in Movies
14 0.06819737 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
15 0.067717873 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
16 0.066877127 360 iccv-2013-Robust Subspace Clustering via Half-Quadratic Minimization
17 0.062856771 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context
18 0.06039181 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
19 0.058941089 134 iccv-2013-Efficient Higher-Order Clustering on the Grassmann Manifold
20 0.058755737 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
topicId topicWeight
[(0, 0.147), (1, 0.016), (2, -0.016), (3, 0.011), (4, -0.015), (5, -0.022), (6, 0.067), (7, 0.114), (8, 0.04), (9, 0.03), (10, -0.015), (11, -0.028), (12, -0.004), (13, 0.054), (14, -0.025), (15, 0.017), (16, -0.035), (17, -0.009), (18, -0.071), (19, -0.007), (20, -0.105), (21, -0.087), (22, 0.049), (23, -0.086), (24, 0.056), (25, 0.031), (26, 0.034), (27, -0.128), (28, -0.032), (29, 0.025), (30, 0.027), (31, -0.067), (32, 0.047), (33, 0.01), (34, 0.011), (35, 0.073), (36, 0.077), (37, -0.002), (38, -0.031), (39, -0.04), (40, 0.09), (41, -0.032), (42, -0.085), (43, -0.025), (44, -0.264), (45, 0.02), (46, 0.127), (47, -0.034), (48, -0.098), (49, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.90919167 393 iccv-2013-Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
Author: Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Abstract: We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed as face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from the solutions of each other. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations . We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements on the state-of-the-art results in face tracking and clustering performances on several video datasets.
2 0.64721501 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
Author: Caglayan Dicle, Octavia I. Camps, Mario Sznaier
Abstract: We introduce a computationally efficient algorithm for multi-object tracking by detection that addresses four main challenges: appearance similarity among targets, missing data due to targets being out of the field of view or occluded behind other objects, crossing trajectories, and camera motion. The proposed method uses motion dynamics as a cue to distinguish targets with similar appearance, minimize target mis-identification and recover missing data. Computational efficiency is achieved by using a Generalized Linear Assignment (GLA) coupled with efficient procedures to recover missing data and estimate the complexity of the underlying dynamics. The proposed approach works with tracklets of arbitrary length and does not assume a dynamical model a priori, yet it captures the overall motion dynamics of the targets. Experiments using challenging videos show that this framework can handle complex target motions, non-stationary cameras and long occlusions, on scenarios where appearance cues are not available or poor.
3 0.6239236 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
Author: Hongyi Zhang, Andreas Geiger, Raquel Urtasun
Abstract: In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes which is able to reason not only about the geometry and objects present in the scene, but also about the high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes and show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves the overall scene estimation as well as the vehicle-to-lane association when compared to state-of-the-art approaches [10].
4 0.55493486 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
Author: Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard
Abstract: Jinyan Guan† j guan1 @ emai l ari z ona . edu . Kyle Simek† ks imek@ emai l ari z ona . edu . Colin Reimer Dawson‡ cdaws on@ emai l ari z ona . edu . ‡School of Information University of Arizona Kobus Barnard‡ kobus @ s i sta . ari z ona . edu ∗School of Informatics University of Edinburgh for tracking an unknown and changing number of people in a scene using video taken from a single, fixed viewpoint. We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities: meters and seconds vs. pixels and frames. We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multitarget tracking must address the fact that the model’s dimension changes as tracks are added or removed, and thus, posterior densities of different hypotheses are not comparable. We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Results are comparable to recent work on 3D tracking and, unlike others, our method requires no pre-calibrated cameras.
5 0.54646629 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
Author: Aleksandr V. Segal, Ian Reid
Abstract: We propose a novel parametrization of the data association problem for multi-target tracking. In our formulation, the number of targets is implicitly inferred together with the data association, effectively solving data association and model selection as a single inference problem. The novel formulation allows us to interpret data association and tracking as a single Switching Linear Dynamical System (SLDS). We compute an approximate posterior solution to this problem using a dynamic programming/message passing technique. This inference-based approach allows us to incorporate richer probabilistic models into the tracking system. In particular, we incorporate inference over inliers/outliers and track termination times into the system. We evaluate our approach on publicly available datasets and demonstrate results competitive with, and in some cases exceeding the state of the art.
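The dynamic programming / message passing step can be illustrated with a standard max-product (Viterbi) recursion over a discrete switch sequence; the two toy states ("track active" / "track terminated") and all probabilities below are made up for illustration, not taken from the paper:

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """Max-product dynamic programming over a discrete switch sequence.
    log_trans: (S, S) log transition matrix; log_emit: (T, S) log emissions."""
    T, S = log_emit.shape
    score = log_emit[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans            # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy switch states: 0 = track active, 1 = track terminated.
# Termination is absorbing (tiny epsilon avoids log(0)).
log_trans = np.log(np.array([[0.9, 0.1], [1e-12, 1.0]]))
log_emit = np.log(np.array([[0.95, 0.05], [0.9, 0.1], [0.05, 0.95]]))
print(viterbi(log_trans, log_emit))
```

Richer versions of this recursion, with switch states for inlier/outlier status and termination time, are what make the SLDS view of data association tractable.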
6 0.48498419 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
7 0.48376173 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context
8 0.47358948 87 iccv-2013-Conservation Tracking
9 0.44795758 166 iccv-2013-Finding Actors and Actions in Movies
10 0.44516152 167 iccv-2013-Finding Causal Interactions in Video Sequences
11 0.44382063 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
12 0.44112214 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking
13 0.41930708 154 iccv-2013-Face Recognition via Archetype Hull Ranking
14 0.40259176 272 iccv-2013-Modifying the Memorability of Face Photographs
15 0.36584526 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
16 0.36582801 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
17 0.35897288 182 iccv-2013-GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity
18 0.35713029 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
19 0.35494256 14 iccv-2013-A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding
20 0.34243602 350 iccv-2013-Relative Attributes for Large-Scale Abandoned Object Detection
topicId topicWeight
[(2, 0.075), (7, 0.02), (26, 0.084), (27, 0.011), (31, 0.062), (34, 0.018), (42, 0.087), (48, 0.017), (64, 0.087), (73, 0.032), (78, 0.012), (88, 0.23), (89, 0.134), (95, 0.016), (98, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.78697622 393 iccv-2013-Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
Author: Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Abstract: We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from each other's solutions. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations. We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements over state-of-the-art face tracking and clustering performance on several video datasets.
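The optimal-matching component can be sketched as a minimum-cost one-to-one assignment between tracklet tails and heads. The cost matrix below is hypothetical, and brute-force enumeration stands in for whatever matching routine the authors use (scipy.optimize.linear_sum_assignment would scale better):

```python
from itertools import permutations

import numpy as np

# Hypothetical linking cost between the tail of tracklet i (rows) and the
# head of tracklet j (columns): appearance distance plus a temporal gap
# penalty, values made up for illustration.
cost = np.array([
    [0.2, 0.9, 0.8],
    [0.7, 0.1, 0.9],
    [0.8, 0.9, 0.3],
])

def optimal_matching(cost):
    """Minimum-cost one-to-one matching by brute-force enumeration
    (fine for a toy-sized problem)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return dict(enumerate(best))

links = optimal_matching(cost)
print(links)
```

In the paper's alternating scheme, cluster labels would feed into these costs (same-cluster links are cheap), and the resulting links in turn constrain the clustering.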
2 0.70250511 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
3 0.68877345 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
Author: Dahua Lin, Sanja Fidler, Raquel Urtasun
Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
4 0.67016035 180 iccv-2013-From Where and How to What We See
Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Eckstein, B.S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
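The spatial clustering step can be sketched with plain k-means over 2-D fixation coordinates; the paper's actual grouping method may differ, and the synthetic "face" and "text" fixation blobs below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, iters=20):
    """Plain k-means as a stand-in for the spatial grouping of fixations."""
    # Deterministic init: spread initial centers across the point list.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([points[labels == j].mean(axis=0)
                            for j in range(k)])
    return labels, centers

# Synthetic fixations around a hypothetical face region and text region.
face = rng.normal([100.0, 100.0], 5.0, size=(30, 2))
text = rng.normal([300.0, 200.0], 5.0, size=(30, 2))
labels, centers = kmeans(np.vstack([face, text]), k=2)
print(labels[0] != labels[30])
```

Each recovered cluster would then become a node in the fully connected MRF that scores how face-like or text-like the fixation group is.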
5 0.66638106 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
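The sparse template representation behind such trackers can be sketched with ISTA, a standard solver for the ℓ1-regularized least-squares fit min_x 0.5*||y - Dx||^2 + lam*||x||_1. The template matrix here is random and orthonormalized so the sparse solution is easy to verify; it is not a learned dictionary:

```python
import numpy as np

def ista(D, y, lam=0.1, iters=200):
    """Iterative soft-thresholding for min_x 0.5*||y - D x||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        g = D.T @ (D @ x - y)              # gradient of the smooth part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(1)
# Hypothetical template dictionary: 10 templates in 20 dimensions.
D = np.linalg.qr(rng.normal(size=(20, 10)))[0]
y = 2.0 * D[:, 3]                          # candidate generated by template 3
x = ista(D, y)
print(np.argmax(np.abs(x)))
```

An ℓ1-tracker scores each candidate window by how well a sparse combination of a few templates reconstructs it; online dictionary learning, as in this paper, is about how D itself is updated over time.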
6 0.66603202 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
7 0.66364992 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
8 0.66364139 86 iccv-2013-Concurrent Action Detection with Structural Prediction
10 0.66289485 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
11 0.66120303 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
12 0.66070396 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
13 0.66011143 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
14 0.65996414 414 iccv-2013-Temporally Consistent Superpixels
15 0.65986282 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
16 0.65946305 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
17 0.65940106 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
18 0.65828681 338 iccv-2013-Randomized Ensemble Tracking
19 0.65809369 150 iccv-2013-Exemplar Cut
20 0.65798926 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow