cvpr cvpr2013 cvpr2013-292 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Suha Kwak, Bohyung Han, Joon Hee Han
Abstract: We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems—quadratic programming for role assignment followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm in natural videos, which contain multiple target events and nonparticipating agents.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. [sent-3, score-1.654]
2 Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. [sent-4, score-1.163]
3 Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems—quadratic programming for role assignment followed by linear programming for event localization. [sent-5, score-0.972]
4 Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. [sent-6, score-0.876]
5 We test the performance of our algorithm in natural videos, which contain multiple target events and nonparticipating agents. [sent-7, score-0.193]
6 Introduction Event detection, the identification of a predefined event in videos, typically suffers from enormous combinatorial complexity. [sent-9, score-0.539]
7 It is aggravated by additional challenges such as the agents’ unknown roles and the extra agents who do not participate in the event. [sent-10, score-0.57]
8 We are interested in such a challenging problem, event localization and role assignment, which refers to identifying the target event and recognizing its participants in the presence of non-participants while automatically estimating the roles of the participants. [sent-11, score-1.842]
9 Although the computer vision community has broad interest in problems related to event detection, the joint estimation of event localization and role assignment has rarely been studied. [sent-12, score-1.531]
10 This is largely due to the computational complexity induced by various factors, for example, the participation of multiple agents in an event, temporal-logical variations of an event, or potential activities irrelevant to the target event. [sent-13, score-1.049]
11 In this paper, we introduce a novel algorithm for video event understanding, where the event localization problem [Footnote ∗: This work was supported by MEST Basic Science Research Program through the NRF of Korea (2012-0003697), the IT R&D program of MKE/KEIT (10040246), and Samsung Electronics Co.] [sent-14, score-1.165]
12 is solved jointly with the role assignment by formulating the problem in an optimization framework. [sent-16, score-0.391]
13 Our algorithm incorporates [10] to describe and detect a target event based on a scenario. [sent-17, score-0.637]
14 A naïve extension of [10] to event localization and role assignment, however, requires considering all possible agent-role combinations, which may be computationally intractable. [sent-18, score-0.992]
15 For this purpose, we first generate agent groups, each of which is composed of agents potentially having mutual interactions defined in a scenario. [sent-20, score-0.597]
16 The confidence and time interval of each role are estimated for each agent by role-specific event detection. [sent-21, score-1.302]
17 The event localization and role assignment are solved by assigning roles to agents optimally per group and selecting groups whose role assignments are feasible. [sent-22, score-1.883]
18 1) The event localization and role assignment problems are jointly managed in a single optimization framework. [sent-24, score-1.014]
19 2) Our framework handles some challenging situations in event detection in a principled way, e.g. [sent-25, score-0.584]
20 Sec. 2 reviews previous work closely related to multi-agent event detection. [sent-31, score-0.539]
21 After we discuss the event detection framework based on the constraint flow in Sec. [sent-32, score-0.791]
22 3, our multi-agent event localization and role assignment algorithm is presented in Sec. [sent-33, score-0.992]
23 Related work The simplest tasks related to event detection include atomic action detection (e. [sent-38, score-0.669]
24 An atomic action is a short-term event generated by a single agent, and abnormality detection is a binary decision with respect to video contents; both approaches are insufficient to deduce semantic interpretations from videos. [sent-43, score-0.716]
25 In this paper, we consider an event as a composition of atomic actions, which we call primitives, under a temporal-logical structure. [sent-44, score-0.579]
26 The structure of the event is described by a scenario. [sent-45, score-0.539]
27 Recently, scenarios have often been described by logics such as temporal interval algebra [1] and event logic [20] to represent various aspects of events more flexibly. [sent-48, score-0.882]
28 Logic-based scenarios have been adopted by various event detection algorithms, including hierarchical constraint satisfaction [17], multithread parsing [24], randomized stochastic local search [4], and probabilistic inference on constraint flow [10] and Markov logic networks [15, 22]. [sent-49, score-0.979]
29 However, most of these approaches have focused on events with a single agent only [2, 12, 14, 19], or events that can be detected without role identification [4, 5, 15]. [sent-50, score-0.802]
30 Otherwise, roles should be initialized by prior information or human intervention [8, 10, 17, 24]. [sent-51, score-0.269]
31 A few recent approaches discuss role assignment in the detection of complex multi-agent events. [sent-52, score-0.436]
32 In [11], an MRF captures relationships between roles and performs per-video event categorization; extending it to handle non-participants and to localize events in the spatio-temporal domain is not straightforward. [sent-53, score-0.96]
33 An MCMC technique is introduced to identify agents’ roles and detect an event at the same time in [18]. [sent-54, score-0.808]
34 However, it is a heuristic-based method, and its results are presented only on videos with a single event occurrence. [sent-55, score-0.573]
35 Our method is formulated in a more principled way and can handle multiple occurrences of an event naturally. [sent-56, score-0.565]
36 Scenario-based event detection Our framework incorporates [10] to describe and detect target events based on scenarios. [sent-58, score-0.777]
37 In this section, we briefly review the scenario description method and event detection procedure using constraint flow presented in [10]. [sent-59, score-0.882]
38 Scenario description A scenario describes a target event based on primitives and their temporal-logical relationships. [sent-62, score-0.869]
39 The relationships constrain arrangements of time intervals of the primitives. [sent-63, score-0.153]
40 Let ρi and ρj be primitives composing a target event. [sent-64, score-0.239]
41 Four temporal relationships define temporal orders between the starting and ending times of two primitives; the relationships listed here include ρi < ρj, ρi ∼ ρj, and ρi ∧ ρj, where ρi < ρj ⇔ end(ρi) is earlier than start(ρj). [sent-65, score-0.226]
42 Table 1: The quinary conditions to specify flow vertices. The last unary relationship describes an unknown number of recurrences of a specified primitive; k is an index for the number of recurrences. [sent-79, score-0.278]
43 In principle, logical relationships take precedence over temporal relationships, but we can use parentheses to override the precedence of the relationships. [sent-82, score-0.232]
44 γdel and γrec indicate the two roles in Delivery, the deliverer and the receiver, respectively. [sent-84, score-0.269]
45 A role associated with each primitive is specified in the attached bracket. [sent-85, score-0.376]
46 Constraint flow and event detection Constraint flow is a state transition machine characterizing the organization of a target event. [sent-88, score-0.918]
47 The state of a target event at a time step is represented by a combination of conditions of all primitives. [sent-89, score-0.663]
48 The condition of a primitive is active if the primitive occurs at the time step, and it is inactive otherwise; the inactive condition is subdivided into four hypotheses (Table 1) so that various temporal-logical relationships among primitives can be described. [sent-90, score-0.528]
49 The constraint flow is a directed graph, whose vertices are combinations of quinary conditions of the primitives (Fig. [sent-91, score-0.545]
50 When a scenario is given, the corresponding constraint flow is generated automatically by the scenario parsing technique described in Algorithm 1 of [10]. [sent-93, score-0.389]
51 The objective of event detection is to find the best feasible interpretation of a given video. [sent-94, score-0.676]
52 Because a scenario and its constraint flow are equivalent representations of an event, we can generate feasible … Figure 1 (primitives A, B, C, D with roles γ1 and γ2; constraint flow). [sent-97, score-0.339]
53 (a) There are two initial vertices (red boxes) and two final vertices (blue boxes) in the constraint flow. [sent-100, score-0.199]
54 (b) A flow traverse (T13) becomes a feasible interpretation (I13) by projecting quinary conditions to binary conditions; the projection is denoted by β. The time intervals of the primitives are marked in black. [sent-101, score-0.603]
55 The time interval of the target event is obtained by [sent-102, score-0.709]
56 searching for the first and last activations in the interpretation; in this example, the time interval of the target event is [2, 12]. [sent-103, score-0.739]
57 Let Tt be a flow traverse up to time t and It = β(Tt) be the feasible interpretation obtained by the binary projection of Tt. [sent-106, score-0.261]
58 Multi-agent event localization The event detector described in Sec. [sent-125, score-1.14]
59 3 assumes that roles of all agents are given manually. [sent-126, score-0.529]
60 We call a group of agents involved in a target event with assigned roles a true lineup, and there are typically many lineup candidates (i. [sent-127, score-1.495]
61 Let m be the number of agents existing in a video, and n be the number of roles defined in a scenario. [sent-130, score-0.529]
62 We assume that an agent takes part in at most one target event in a video and that, within a target event, the participating agents and the roles are in one-to-one correspondence. [sent-131, score-1.626]
63 It is impractical to apply the event detection algorithm to all the candidates. [sent-135, score-0.584]
64 We introduce a technique that efficiently localizes multi-agent events by identifying true lineups in an optimization framework. [sent-136, score-0.2]
65 Our method exploits role confidences and time intervals of agents, which are obtained by role-specific event detection per agent (Sec. [sent-137, score-1.365]
66 Despite a huge number of potential lineups, we maintain efficiency by separating the event localization procedure into two steps: role estimation per group and group selection (Sec. [sent-140, score-0.98]
67 Agent grouping Suppose that there are n roles in a target event. [sent-146, score-0.367]
68 We first form groups of n agents, which serve as candidates for our event detection algorithm. [sent-147, score-0.66]
69 An agent can be associated with multiple groups at this stage. [sent-148, score-0.372]
70 lineup candidates, where l denotes the number of groups (footnote 1), because n! [sent-150, score-0.236]
71 role configurations are available per group; it may be infeasible to identify the few true lineups among such a large number of candidates. [sent-151, score-0.405]
72 The groups are represented by an m × l binary matrix A, where [A]i,k = 1 if the i-th agent belongs to the k-th group, and 0 otherwise. [sent-152, score-0.372]
73 Agent-wise role analysis The role assignment is essential for multi-agent event detection, but it is typically available only after event detection has been completed. [sent-165, score-1.744]
74 To estimate the role of an agent, we evaluate how well the properties of the agent suit each role, and we also estimate the potential time interval of each role. [sent-166, score-0.959]
75 In other words, we perform role-specific event detection for each agent; it is done by role-specific constraint flows, each of which focuses only on a single role. [sent-167, score-0.673]
76 The role-specific constraint flow is constructed from the original constraint flow by simply excluding primitives not associated with the target role and merging duplicate vertices, as shown in Fig. [sent-168, score-1.011]
77 We additionally construct the null constraint flow, which is generated by excluding all primitives (Fig. [sent-170, score-0.281]
78 2(d)) to handle outsiders, agents that do not participate in any target event. [sent-171, score-0.399]
79 For each agent in a video, we apply the role-specific event detectors to all primitive detections. [sent-172, score-0.977]
80 groups in principle, but the search space can be reduced significantly in practice by integrating spatio-temporal or temporal-logical proximity constraints between agents, as described in Sec. [sent-175, score-0.436]
81 The scenario and its original constraint flow are the same as those of Fig. [sent-179, score-0.298]
82 (a–b) Procedure for generating γ1-specific constraint flow: We first exclude primitives irrelevant to the role we currently focus on (red); some vertices become identical after the exclusion. [sent-181, score-0.56]
83 The role-specific constraint flow is obtained by merging the duplicate vertices. [sent-182, score-0.265]
84 (c) γ2-specific constraint flow is obtained in the same manner. [sent-183, score-0.207]
85 (d) We additionally construct the null constraint flow to handle outsiders. [sent-184, score-0.233]
86 the i-th agent is for the j-th role based on the corresponding maximum log-posterior probability; the normalized confidence is given by [sent-195, score-0.612]
87 Also, the time interval in which the i-th agent plays the j-th role is obtained by searching for the first and last activations in the binary projection β of the corresponding role-specific flow traverse. [sent-201, score-0.714]
88 Also, the confidences for the outsider role are concatenated into c0 = [c1,0, … [sent-224, score-0.4]
89 In our framework, a lineup candidate is evaluated in terms of the role confidences and time intervals of the corresponding agent-role combination, which are precomputed by the role-specific event detection and can be reused to evaluate different lineup candidates. [sent-229, score-1.5]
90 Also, lineup candidates with different lengths of occurrence (i. [sent-231, score-0.311]
91 , different numbers of observations) can be evaluated fairly at the role level because role confidences are not affected by the length of occurrence, owing to the agent-wise normalization in Eq. [sent-233, score-0.657]
92 The agent-wise role analysis is significantly faster than the naïve extension of the original event detection because role-specific event detection evaluates each role of each agent independently. [sent-235, score-2.055]
93 However, interactions between roles are ignored in this step, which may result in errors. [sent-236, score-0.269]
94 Evaluating role assignment per group For each group, we evaluate the feasibility of the role assignment by considering how appropriate the agents in a group are for the assigned roles and how well their role-specific time intervals satisfy the scenario constraints. [sent-243, score-1.681]
95 , indicates the role assignment of the i-th member in a binary form, i. [sent-256, score-0.467]
96 , $x^k_{i,j} = 1$ if the j-th role is assigned to the i-th member of the k-th group, where $\mathbf{x}^k_i = [x^k_{i,1}, \ldots, x^k_{i,n}]^\top$. [sent-258, score-0.351]
97 $\mathbf{c}_k$ is a coefficient vector of role confidences corresponding to the members for the n roles. [sent-266, score-0.409]
98 $L_k$ is an $n^2 \times n^2$ coefficient matrix that measures the fidelity of the time intervals of the assigned roles to the scenario constraints. [sent-267, score-0.456]
99 Specifically, the element $[L^k_{i_1,i_2}]_{j_1,j_2}$ of the matrix is negatively proportional to the length of the partial time intervals that violate the scenario constraints between the $j_1$-th role of the $i_1$-th member and the $j_2$-th role of the $i_2$-th member, and is given by $L^k_{i_1,i_2} = -\alpha \cdot \ldots$ [sent-274, score-0.717]
100 $S_i$ and $E_i$ are $n \times n$ matrices whose columns are composed of $\mathbf{s}_{g_k(i)}$ and $\mathbf{e}_{g_k(i)}$, respectively, i. [sent-283, score-0.158]
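Sentences 54 to 57 and 87 describe the binary projection β of a flow traverse and recover a time interval by searching for the first and last activations. Below is a minimal Python sketch of that search. It assumes a traverse is stored as one tuple of condition labels per time step, with one label per primitive. The label "a" for the active condition and the other labels in the example are hypothetical placeholders for the quinary conditions of Table 1. This is an illustration, not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

ACTIVE = "a"  # hypothetical label for the active condition; any other label is an inactive hypothesis


def beta(traverse: Sequence[Sequence[str]]) -> List[List[int]]:
    """Binary projection: 1 where a primitive is active, 0 for every inactive hypothesis."""
    return [[1 if cond == ACTIVE else 0 for cond in state] for state in traverse]


def event_interval(interpretation: Sequence[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Return (first, last) time steps at which any primitive is active, or None."""
    active_steps = [t for t, state in enumerate(interpretation) if any(state)]
    if not active_steps:
        return None
    return active_steps[0], active_steps[-1]


# Two primitives over six time steps ("w" and "f" are placeholder inactive labels).
traverse = [("w", "w"), ("a", "w"), ("a", "w"), ("f", "a"), ("f", "a"), ("f", "f")]
print(event_interval(beta(traverse)))  # (1, 4)
```

Running the same search on a role-specific traverse, rather than the full one, would yield the per-agent role interval described in sentence 87.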
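Sentences 61 to 63 and 70 to 71 argue that evaluating every lineup exhaustively is impractical. The paper's own counting expression did not survive extraction, so the following is our own arithmetic. Under the stated assumption that participating agents map one-to-one onto the n roles, the number of lineups is the number of ordered selections of n agents out of m. That equals the number of unordered groups times the n! role configurations per group, that is, m!/(m − n)! = C(m, n) · n!.

```python
import math


def num_groups(m: int, n: int) -> int:
    """Unordered groups of n agents out of m, before any proximity pruning."""
    return math.comb(m, n)


def num_lineups(m: int, n: int) -> int:
    """Ordered assignments of n distinct agents to n distinct roles."""
    return math.perm(m, n)


m, n = 10, 3
print(num_groups(m, n), num_lineups(m, n))  # 120 720
assert num_lineups(m, n) == num_groups(m, n) * math.factorial(n)
```

Even ten agents and three roles already give 720 lineup candidates. Pruning groups with proximity constraints, as sentence 80 suggests, reduces l and therefore l · n!.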
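Sentence 72 encodes group membership as an m × l binary matrix A. The sketch below builds A from a list of groups, each given as a tuple of agent indices. The function name `group_matrix` and the example groups are illustrative only. Enumerating every group with `itertools.combinations` corresponds to the unpruned case mentioned in sentence 80.

```python
from itertools import combinations
from typing import List, Sequence


def group_matrix(m: int, groups: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build A (m rows, one column per group) with A[i][k] = 1 iff agent i is in group k."""
    A = [[0] * len(groups) for _ in range(m)]
    for k, members in enumerate(groups):
        for i in members:
            A[i][k] = 1
    return A


# Three hand-picked groups of n = 2 agents among m = 4 agents.
A = group_matrix(4, [(0, 1), (1, 2), (2, 3)])
print(A)  # [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]

# Without pruning, every combination of n agents is a group.
all_groups = list(combinations(range(4), 2))
print(len(all_groups))  # 6
```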
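Sentences 76 to 77 and the Figure 2 caption (sentences 82 to 85) describe how a role-specific constraint flow is built. Primitives not associated with the role are excluded, and vertices that become identical are merged. Excluding every primitive gives the null flow. The minimal sketch below assumes vertices are tuples with one condition label per primitive and edges are (source, target) pairs. Projecting onto the kept primitives merges duplicates automatically, because identical tuples collapse into one set element. Keeping edges between merged vertices as self-loops is our assumption; the source does not state how such edges are handled.

```python
from typing import Sequence, Set, Tuple

Vertex = Tuple[str, ...]        # one condition label per primitive
Edge = Tuple[Vertex, Vertex]    # directed transition (source, target)


def project_flow(vertices: Set[Vertex], edges: Set[Edge],
                 keep: Sequence[int]) -> Tuple[Set[Vertex], Set[Edge]]:
    """Drop primitives whose indices are not in `keep`; duplicate vertices merge in the sets."""
    def proj(v: Vertex) -> Vertex:
        return tuple(v[p] for p in keep)

    return {proj(v) for v in vertices}, {(proj(u), proj(w)) for u, w in edges}


# Hypothetical flow over primitives A (role gamma1, index 0) and B (role gamma2, index 1).
V = {("a", "w"), ("f", "w"), ("f", "a"), ("f", "f")}
E = {(("a", "w"), ("f", "w")), (("f", "w"), ("f", "a")), (("f", "a"), ("f", "f"))}

V1, E1 = project_flow(V, E, keep=[0])   # gamma1-specific flow
V0, E0 = project_flow(V, E, keep=[])    # null flow for outsiders
print(sorted(V1), len(E1))  # [('a',), ('f',)] 2
print(V0, E0)               # {()} {((), ())}
```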
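Sentences 86 and 88 normalize each agent's maximum log-posteriors from the role-specific detectors into role confidences, with a separate outsider role. The normalization formula itself was lost in extraction. The sketch below uses a softmax across the n + 1 roles purely as a stand-in, and the paper's normalization may well differ. The input `log_post[i][j]` is hypothetical and holds the maximum log-posterior of agent i under role j, with column 0 reserved for the outsider role.

```python
import math
from typing import List, Sequence


def normalize_confidences(log_post: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-agent softmax over roles 0..n (a stand-in for the paper's lost normalization)."""
    conf = []
    for row in log_post:
        top = max(row)                             # shift by the max for numerical stability
        weights = [math.exp(v - top) for v in row]
        total = sum(weights)
        conf.append([w / total for w in weights])
    return conf


# Two agents; columns: outsider (0), role 1, role 2.
log_post = [[-10.0, -2.0, -9.0],
            [-3.0, -8.0, -8.0]]
C = normalize_confidences(log_post)
c0 = [row[0] for row in C]   # outsider confidences, one per agent
best_roles = [max(range(len(row)), key=row.__getitem__) for row in C]
print(best_roles)  # [1, 0]
```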
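Sentence 41 defines ρi < ρj as end(ρi) being earlier than start(ρj). Sentence 99 penalizes the length of partial time intervals that violate such constraints. The sketch below is one possible reading for the < relation only, assuming inclusive integer intervals (start, end). Under that reading, the violating part of the first interval is whatever lies at or after the start of the second. The weight `alpha` mirrors the α in sentence 99, but its value and the exact penalty formula are assumptions.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end) time steps


def before(a: Interval, b: Interval) -> bool:
    """a < b  <=>  end(a) is earlier than start(b)."""
    return a[1] < b[0]


def before_violation(a: Interval, b: Interval) -> int:
    """Number of time steps of `a` at or after start(b); 0 exactly when a < b holds."""
    start_a, end_a = a
    return max(0, end_a - max(start_a, b[0]) + 1)


def pairwise_penalty(a: Interval, b: Interval, alpha: float = 1.0) -> float:
    """Negative score proportional to the violating length (alpha is a free parameter here)."""
    return -alpha * before_violation(a, b)


print(before((2, 5), (7, 12)), before_violation((2, 5), (7, 12)))   # True 0
print(before((2, 8), (7, 12)), before_violation((2, 8), (7, 12)))   # False 2
```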
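Sentences 94 to 98 score a role assignment for group k using a vector of role confidences c_k and an n² × n² interval-fidelity matrix L_k. The abstract states that the paper solves role assignment by quadratic programming. For very small n, the sketch below instead enumerates all n! one-to-one assignments and maximizes c_k^T x + x^T L_k x. We assume that combination is the objective, since the extracted text does not show the full formula. This exhaustive search is a stand-in for illustration, not the authors' QP solver. Here x is flattened so that index i·n + j means member i takes role j, and the element [L^k_{i1,i2}]_{j1,j2} of sentence 99 sits at L[i1·n + j1][i2·n + j2].

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_assignment(c: Sequence[float], L: Sequence[Sequence[float]],
                    n: int) -> Tuple[float, List[int]]:
    """Maximize c^T x + x^T L x over one-to-one assignments of n members to n roles.

    Returns the best score and roles, where roles[i] is the role given to member i.
    """
    best_score, best_roles = float("-inf"), []
    for roles in permutations(range(n)):
        ones = [i * n + roles[i] for i in range(n)]          # entries of x equal to 1
        score = sum(c[p] for p in ones) + sum(L[p][q] for p in ones for q in ones)
        if score > best_score:
            best_score, best_roles = score, list(roles)
    return best_score, best_roles


# n = 2: c lists confidences for (member 0, role 0), (0, 1), (1, 0), (1, 1).
c = [0.75, 0.25, 0.25, 0.5]
L = [[0.0] * 4 for _ in range(4)]
print(best_assignment(c, L, 2))   # (1.25, [0, 1])

L[0][3] = -2.0                    # intervals of (member 0, role 0) and (member 1, role 1) conflict
print(best_assignment(c, L, 2))   # (0.5, [1, 0])
```

In the second call, the interval term overrides the confidences. Once the intervals of the most confident assignment violate the scenario, the swapped assignment scores higher.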
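Sentences 17 and 66 select groups whose role assignments are feasible, and the abstract formulates this event-localization step as linear programming. The sketch below shows the combinatorial problem that step addresses rather than the paper's LP. It takes a hypothetical score per group (for instance, the value of its best role assignment) and the membership matrix A of sentence 72. It then exhaustively picks the set of non-overlapping groups with the largest total score, which enforces the assumption of sentence 62 that an agent takes part in at most one target event. Exhaustive search is only viable for a handful of groups.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def select_groups(A: Sequence[Sequence[int]],
                  scores: Sequence[float]) -> Tuple[float, List[int]]:
    """Pick the subset of groups with maximal total score such that no agent
    (row of A) belongs to more than one selected group."""
    m, l = len(A), len(scores)
    best: Tuple[float, List[int]] = (0.0, [])     # selecting nothing scores 0
    for r in range(1, l + 1):
        for subset in combinations(range(l), r):
            if any(sum(A[i][k] for k in subset) > 1 for i in range(m)):
                continue                            # some agent would join two events
            total = sum(scores[k] for k in subset)
            if total > best[0]:
                best = (total, list(subset))
    return best


# Groups {0,1}, {1,2}, {2,3} over four agents (rows of A are agents).
A = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
print(select_groups(A, [0.5, 1.0, 0.75]))  # (1.25, [0, 2])
```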
wordName wordTfidf (topN-words)
[('event', 0.539), ('agent', 0.337), ('role', 0.275), ('roles', 0.269), ('agents', 0.26), ('lineup', 0.236), ('primitives', 0.141), ('flow', 0.118), ('assignment', 0.116), ('lineups', 0.105), ('primitive', 0.101), ('target', 0.098), ('intervals', 0.096), ('events', 0.095), ('scenario', 0.091), ('constraint', 0.089), ('rec', 0.087), ('egk', 0.079), ('holdobj', 0.079), ('quinary', 0.079), ('rolespecific', 0.079), ('sgk', 0.079), ('member', 0.076), ('confidences', 0.073), ('del', 0.072), ('interval', 0.072), ('delivery', 0.065), ('localization', 0.062), ('logic', 0.058), ('relationships', 0.057), ('vertices', 0.055), ('badc', 0.052), ('comeclose', 0.052), ('getaway', 0.052), ('getinto', 0.052), ('getoff', 0.052), ('inactive', 0.052), ('irfw', 0.052), ('outsider', 0.052), ('outsiders', 0.052), ('postech', 0.052), ('precedence', 0.052), ('temporallogical', 0.052), ('group', 0.052), ('interpretation', 0.051), ('traverse', 0.051), ('fa', 0.049), ('detection', 0.045), ('feasible', 0.041), ('candidates', 0.041), ('participate', 0.041), ('earlier', 0.041), ('atomic', 0.04), ('ra', 0.04), ('temporal', 0.039), ('members', 0.039), ('combinations', 0.037), ('duplicate', 0.037), ('groups', 0.035), ('abnormality', 0.035), ('occurrence', 0.034), ('gk', 0.034), ('korea', 0.033), ('start', 0.032), ('interpretations', 0.032), ('logical', 0.032), ('participants', 0.03), ('activations', 0.03), ('ending', 0.029), ('occurrences', 0.026), ('conditions', 0.026), ('null', 0.026), ('rs', 0.026), ('excluding', 0.025), ('video', 0.025), ('ei', 0.025), ('infeasible', 0.025), ('active', 0.024), ('tt', 0.024), ('obtai', 0.023), ('mest', 0.023), ('acd', 0.023), ('finished', 0.023), ('neous', 0.023), ('participation', 0.023), ('kwak', 0.023), ('andt', 0.023), ('wfr', 0.023), ('end', 0.023), ('ck', 0.022), ('managed', 0.022), ('nrf', 0.022), ('samsung', 0.022), ('orders', 0.021), ('merging', 0.021), ('programming', 0.021), ('scenarios', 0.021), ('waiting', 0.02), ('elementwise', 0.02), 
('satisfaction', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
2 0.35223502 402 cvpr-2013-Social Role Discovery in Human Events
Author: Vignesh Ramanathan, Bangpeng Yao, Li Fei-Fei
Abstract: We deal with the problem of recognizing social roles played by people in an event. Social roles are governed by human interactions, and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interaction between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person specific social descriptors. We develop tractable variational inference to simultaneously infer model weights, as well as role assignment to all people in the videos. We also present a novel YouTube social roles dataset with ground truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.
3 0.29548645 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann
Abstract: Complex events essentially include human, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.
4 0.218779 172 cvpr-2013-Finding Group Interactions in Social Clutter
Author: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
5 0.19714576 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
Author: Jérôme Revaud, Matthijs Douze, Cordelia Schmid, Hervé Jégou
Abstract: This paper presents an approach for large-scale event retrieval. Given a video clip of a specific event, e.g., the wedding of Prince William and Kate Middleton, the goal is to retrieve other videos representing the same event from a dataset of over 100k videos. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to compare the videos in the frequency domain. This offers a significant gain in complexity and accurately localizes the matching parts of videos. Furthermore, we extend product quantization to complex vectors in order to compress our descriptors, and to compare them in the compressed domain. Our method outperforms the state of the art both in search quality and query time on two large-scale video benchmarks for copy detection, TRECVID and CCWEB. Finally, we introduce a challenging dataset for event retrieval, EVVE, and report the performance on this dataset.
7 0.1309067 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
8 0.11916425 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles
9 0.088953212 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
10 0.087271117 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
11 0.074854195 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)
12 0.071043588 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
14 0.064677954 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
15 0.063224815 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
16 0.060113475 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
17 0.056143094 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
18 0.055611126 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
19 0.055567488 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
20 0.053105682 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
topicId topicWeight
[(0, 0.127), (1, -0.035), (2, -0.019), (3, -0.081), (4, -0.05), (5, 0.002), (6, -0.071), (7, -0.047), (8, -0.02), (9, 0.101), (10, 0.079), (11, -0.032), (12, 0.051), (13, -0.028), (14, -0.002), (15, 0.004), (16, 0.041), (17, 0.064), (18, 0.025), (19, -0.164), (20, -0.152), (21, -0.018), (22, -0.015), (23, -0.03), (24, 0.004), (25, 0.012), (26, 0.053), (27, -0.166), (28, -0.004), (29, -0.047), (30, 0.087), (31, -0.04), (32, 0.023), (33, -0.136), (34, -0.093), (35, 0.053), (36, 0.121), (37, 0.097), (38, 0.124), (39, 0.118), (40, 0.168), (41, -0.095), (42, -0.131), (43, -0.069), (44, 0.226), (45, 0.024), (46, -0.139), (47, -0.018), (48, 0.041), (49, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.98183286 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
2 0.84978658 402 cvpr-2013-Social Role Discovery in Human Events
3 0.62209165 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
4 0.60754299 172 cvpr-2013-Finding Group Interactions in Social Clutter
5 0.59327656 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
7 0.40552655 103 cvpr-2013-Decoding Children's Social Behavior
8 0.38946223 356 cvpr-2013-Representing and Discovering Adversarial Team Behaviors Using Player Roles
9 0.3704558 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
10 0.35974127 413 cvpr-2013-Story-Driven Summarization for Egocentric Video
11 0.34819859 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
12 0.33874482 243 cvpr-2013-Large-Scale Video Summarization Using Web-Image Priors
13 0.335706 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos
15 0.29655907 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
16 0.28982449 41 cvpr-2013-An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision
17 0.28354278 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
18 0.27842215 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
20 0.26569036 6 cvpr-2013-A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems
topicId topicWeight
[(10, 0.088), (13, 0.172), (16, 0.033), (26, 0.046), (33, 0.21), (39, 0.02), (67, 0.042), (69, 0.17), (77, 0.052), (87, 0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.86428893 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
Author: Suha Kwak, Bohyung Han, Joon Hee Han
Abstract: We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems—quadratic programming for role assignment followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm in natural videos, which contain multiple target events and nonparticipating agents.
2 0.8425979 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
3 0.84029734 172 cvpr-2013-Finding Group Interactions in Social Clutter
Author: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
4 0.82836568 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu
Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Different from many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombining and refining initial mid-level proposals, as well as handling multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.
5 0.82821774 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
Author: Zhe Yang, Zhiwei Xiong, Yueyi Zhang, Jiao Wang, Feng Wu
Abstract: This paper proposes novel density modulated binary patterns for depth acquisition. Similar to Kinect, the illumination patterns do not need a projector for generation and can be emitted by infrared lasers and diffraction gratings. Our key idea is to use the density of light spots in the patterns to carry phase information. Two technical problems are addressed here. First, we propose an algorithm to design the patterns to carry more phase information without compromising the depth reconstruction from a single captured image as with Kinect. Second, since the carried phase is not strictly sinusoidal, the depth reconstructed from the phase contains a systematic error. We further propose a pixel-based phase matching algorithm to reduce the error. Experimental results show that the depth quality can be greatly improved using the phase carried by the density of light spots. Furthermore, our scheme can achieve 20 fps depth reconstruction with GPU assistance.
7 0.8261807 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
8 0.80905849 135 cvpr-2013-Discriminative Subspace Clustering
9 0.80495113 392 cvpr-2013-Separable Dictionary Learning
10 0.79594541 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
11 0.7869491 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
12 0.77923906 402 cvpr-2013-Social Role Discovery in Human Events
13 0.77571303 347 cvpr-2013-Recognize Human Activities from Partially Observed Videos
14 0.76713115 364 cvpr-2013-Robust Object Co-detection
15 0.76530117 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
16 0.7647199 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
17 0.76417536 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
18 0.76212025 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
19 0.75671011 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
20 0.75608265 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases