iccv iccv2013 iccv2013-216 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dan Xie, Sinisa Todorovic, Song-Chun Zhu
Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance and shape, but they affect the behavior of people in the scene. For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people’s trajectories in the video. To detect “dark matter” and infer their “dark energy” field, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework is used to probabilistically model: people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people’s trajectories in unobserved parts of the video footage.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. [sent-10, score-0.348]
2 Functional objects do not have discriminative appearance and shape, but they affect the behavior of people in the scene. [sent-11, score-0.212]
3 For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). [sent-14, score-0.282]
4 Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people’s trajectories in the video. [sent-17, score-0.683]
5 People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. [sent-19, score-0.67]
6 We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. [sent-20, score-0.312]
7 A Bayesian framework is used to probabilistically model: people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. [sent-21, score-0.654]
8 Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people’s trajectories in unobserved parts of the video footage. [sent-23, score-0.898]
9 Introduction: This paper considers the problem of localizing functional objects and scene surfaces in surveillance videos of public spaces, such as courtyards and squares. [sent-25, score-0.504]
10 For instance, people may move toward certain objects. [sent-27, score-0.267]
11 In our low-resolution surveillance videos, these functional objects and surfaces cannot be reliably recognized by their appearance and shape. [sent-37, score-0.331]
12 Therefore, by analogy to cosmology, we regard these unrecognizable functional objects as sources of “dark energy”, i. [sent-39, score-0.296]
13 , “dark matter”, which exert attraction and repulsion forces on people. [sent-41, score-0.257]
14 Recognizing functional objects is a long-standing problem in vision, with slower progress in the past decade, in contrast to impressive advances in appearance-based recognition. [sent-42, score-0.234]
15 Instead, we analyze human behavior in the video by predicting people’s intents and motion trajectories, and thus localize sources of “dark energy” that drive the scene dynamics. [sent-45, score-0.529]
16 To approach this problem, we leverage Lagrangian mechanics (LM) by treating the scene as a physical system. [sent-46, score-0.236]
17 In such a system, people can be viewed as charged particles moving along a mixture of repulsion and attraction energy fields generated by “dark matter”. [sent-47, score-0.572]
18 The classical LM, however, provides a poor model of human behavior, because it wrongly predicts that people always move toward the closest “dark matter”, by the principle of least action. [sent-48, score-0.327]
19 Specifically, we make the assumption that people intentionally approach functional objects (to satisfy their needs). [sent-50, score-0.381]
20 This amounts to enabling the charged particles in ALM to become agents who can personalize the strengths of “dark energy” fields by appropriately weighting them. [sent-51, score-0.42]
21 We analyze human latent intents and trajectories to localize “dark matter”. [sent-55, score-0.555]
22 For some people (bottom right person) we observe only an initial part of their trajectory (green). [sent-56, score-0.363]
23 Since our focus is on videos of wide public spaces, we expect that people know the layout of obstacles and of walkable and non-walkable areas in the scene, either from previous experience or simply by observing the scene. [sent-59, score-0.339]
24 This allows the agents to globally optimize their trajectories in the attraction energy field of their choice. [sent-60, score-0.762]
25 In each iteration, MCMC samples the number and locations of functional objects and people’s goals. [sent-71, score-0.269]
26 Since people are assumed to know the scene layout, every person’s full trajectory can be predicted as a globally optimal Dijkstra path on the scene lattice. [sent-73, score-0.581]
27 We present experimental evaluation on surveillance videos from the VIRAT [16] and UCLA Courtyard [3] datasets, as well as on our two webcam videos of public squares. [sent-75, score-0.207]
28 We also compare our predictions of human trajectories with those of existing approaches. [sent-77, score-0.264]
29 We instead predict a person’s goal and full trajectory to localize functional objects. [sent-93, score-0.419]
30 For prediction of human trajectories, [11] uses a deterministic vector field of people’s movements, while our “dark energy” fields are stochastic. [sent-96, score-0.222]
31 A linear dynamic system of [27, 28] models smooth trajectories of pedestrians in crowded scenes, and thus cannot handle sudden turns and detours caused by obstacles, as required in our setting. [sent-97, score-0.215]
32 In graphics, relatively simplistic models of agents are used to simulate people’s trajectories in a virtual crowd [14, 15, 18]. [sent-98, score-0.583]
33 The Lagrangian particle dynamics of crowd flows [1, 2] and the optical-flow based dynamics of crowd behaviors [22] do not account for individual human intents. [sent-100, score-0.3]
34 [7] reconstructs an unobserved trajectory part between two observed parts by finding the shortest path. [sent-101, score-0.291]
35 Optimal path search of [21], and reinforcement learning and inverse reinforcement learning of [4, 12] explicitly reason about people’s goals for predicting human trajectories. [sent-103, score-0.238]
36 (e.g., people may walk on one grass lawn, but are forbidden to step on the other). [sent-111, score-0.198]
37 Background: Lagrangian Mechanics. Lagrangian mechanics (LM) studies particles with mass m and velocity ẋ(t), at time t and positions x(t) = (x(t), y(t)), in a force field affecting the motion of the particles. [sent-116, score-0.379]
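For reference, the classical setup can be written as follows; this is the textbook formulation of the Principle of Least Action, not the paper's exact notation or equation numbering.

```latex
% Classical Lagrangian mechanics (textbook form; symbols may differ from the paper).
% T is kinetic energy, V the potential of the force field, S the action functional.
\[
L(\mathbf{x}, \dot{\mathbf{x}}) = T - V
  = \tfrac{1}{2}\, m \,\|\dot{\mathbf{x}}(t)\|^{2} - V(\mathbf{x}(t)),
\qquad
\mathcal{S}[\mathbf{x}] = \int_{t_1}^{t_2} L(\mathbf{x}, \dot{\mathbf{x}})\, dt .
\]
% The Principle of Least Action states that the realized path makes S stationary,
% which yields the Euler--Lagrange equation and Newton's second law:
\[
\frac{d}{dt}\frac{\partial L}{\partial \dot{\mathbf{x}}}
  - \frac{\partial L}{\partial \mathbf{x}} = 0
\quad\Longrightarrow\quad
m\,\ddot{\mathbf{x}} = -\nabla V(\mathbf{x}) = \mathbf{F}(\mathbf{x}) .
\]
```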
38 In ALM, the physical system consists of a set of force sources. [sent-133, score-0.197]
39 Our first extension enables the particles to become agents with free will to select a particular force source from the set which can drive their motion. [sent-134, score-0.503]
40 Our second extension endows the agents with knowledge about the layout map of the physical system. [sent-135, score-0.375]
41 Consequently, they can globally plan their trajectories so as to efficiently navigate toward the selected force source, by the Principle of Least Action, avoiding known obstacles along the way. [sent-136, score-0.608]
42 Given the energy field over x(t), we use the Dijkstra algorithm for finding a globally optimal solution of (2), since the agents can globally plan their trajectories. [sent-154, score-0.346]
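As an illustration, below is a minimal sketch of such globally optimal planning on a 2D scene lattice; the 4-connected grid and the way the energy field enters the per-cell step cost are our assumptions, not details from the paper.

```python
import heapq
import numpy as np

def dijkstra_path(cost, c_map, start, goal):
    """Globally optimal path on a scene lattice (a hedged sketch).

    cost  : 2D array of per-cell traversal costs derived from the energy field
    c_map : 2D constraint map; only cells with c_map == 1 are walkable
    start, goal : (row, col) lattice coordinates
    """
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    dist[start] = 0.0
    prev, heap = {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        r, c = u
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected moves
            v = (r + dr, c + dc)
            if 0 <= v[0] < H and 0 <= v[1] < W and c_map[v] == 1:
                nd = d + cost[v]
                if nd < dist[v]:
                    dist[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
    if np.isinf(dist[goal]):
        return None  # goal unreachable under the constraint map
    path, node = [goal], goal  # walk predecessors back from the goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```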
43 It follows that estimating the agents’ intents and trajectories can be readily used for estimating the functional map of the physical system. [sent-156, score-0.752]
44 The constraint map takes c(x) = 1 if x is walkable and c(x) = −1 otherwise; the allowed locations of agents in the scene are ΛC = {x : x ∈ Λ, c(x) = 1}. [sent-172, score-0.371]
45 The sources of “dark energy”, sj ∈ S, are characterized [sent-182, score-0.258]
46 by sj = (μj, Σj), where μj ∈ Λ is the location of sj, and Σj is a 2×2 spatial covariance matrix of sj’s force field. [sent-183, score-0.579]
47 Each agent ai ∈ A can pursue only one goal, i.e., [sent-188, score-0.204]
48 move toward one source sj ∈ S at a time. [sent-190, score-0.322]
49 The agents cannot change their goals until they reach them. [sent-191, score-0.312]
50 If ai ∈ A wants to reach sj ∈ S, we specify that their relationship rij = r(ai, sj) = 1; otherwise, rij = 0. [sent-193, score-0.605]
51 The end-moments of these intervals can be identified when ai arrives at or leaves from sj . [sent-195, score-0.289]
52 of selecting sj ∈ S, and each sj ∈ S can be selected bj times to serve as a goal destination, bj = Σi 1(rij = 1), j = 1, . . . , |S|. [sent-205, score-0.392]
53 The sum of all these Gaussian force fields on Λ forms the joint repulsion force field. [sent-212, score-0.434]
54 Attraction Forces: Each sj ∈ S generates an attraction Gaussian force field, where the force magnitude follows the Gaussian defined by (μj, Σj). [sent-213, score-0.589]
55 When ai ∈ A selects a particular sj ∈ S, ai is affected by the corresponding cumulative force field F. [sent-219, score-0.529]
56 (5) Note that by the classical LM, all the agents would be affected by the sum of all force fields (the joint repulsion field plus all attraction fields). [sent-223, score-0.449]
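A minimal sketch of evaluating such a composite field on the lattice Λ is given below; the exact way the Gaussian (μj, Σj) parameterization translates into a force magnitude is our assumption, guided only by the notation above.

```python
import numpy as np

def gaussian_field(shape, mu, sigma):
    """Gaussian 'dark energy' magnitude of one source over an H x W lattice.

    mu    : (2,) source location mu_j
    sigma : (2, 2) spatial covariance Sigma_j of the source's force field
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    d = np.stack([ys - mu[0], xs - mu[1]], axis=-1)   # offsets to mu_j
    m = np.einsum('hwi,ij,hwj->hw', d, np.linalg.inv(sigma), d)  # Mahalanobis
    return np.exp(-0.5 * m)

def cumulative_field(shape, sources, selected, repellers):
    """Field acting on one agent: attraction of its selected source plus
    repulsion from all repelling sources (a sketch of the idea behind (5))."""
    F = gaussian_field(shape, *sources[selected])     # chosen attraction
    for mu, sigma in repellers:
        F -= gaussian_field(shape, mu, sigma)         # repulsion terms
    return F
```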
57 Note that an instantiation of latent variables C, S, R uniquely defines the force field F. [sent-229, score-0.272]
58 If rij = 1, then ai moves toward sj along trajectory Γij = [xi, . . . , xj], [sent-232, score-0.68]
59 where xi is ai’s starting location, and xj is sj’s location. [sent-235, score-0.196]
60 As in Sec. 3, the agents can globally optimize their paths, because they are familiar with the scene map. [sent-239, score-0.303]
61 But other hypothetical trajectories in the vector field may also get non-zero likelihoods. [sent-253, score-0.273]
62 When ai is far away from sj, the total energy needed to cover that trajectory is bound to be large, and consequently uncertainty about ai’s trajectory is large. [sent-254, score-0.612]
63 Conversely, as ai gets closer to sj, uncertainty about the trajectory reduces. [sent-255, score-0.309]
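One simple way to realize this distance-dependent uncertainty is a Boltzmann-type likelihood in which trajectory probability decays with the energy spent along the path; the exponential form and the temperature-like parameter λ below are our assumptions, suggested by the λ that appears later in P(Γij |C, S, R).

```python
def trajectory_log_likelihood(traj, cost, lam=1.0):
    """log P(Gamma | C, S, R) under an assumed Boltzmann-type model:
    P(Gamma) ~ exp(-lam * E(Gamma)), where E(Gamma) sums the per-cell
    energy along the path, so longer (farther-from-goal) trajectories
    carry more uncertainty."""
    energy = sum(cost[r, c] for r, c in traj)
    return -lam * energy
```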
64 Given a video, observable random variables include a set of appearance features, I, and a set of partially observed, noisy human trajectories Γ(0) . [sent-268, score-0.317]
65 Inference: Given {I, Γ(0)}, we infer W = {C, S, R, Γ} – namely, we estimate the constraint map, the number and layout of dark matter, and hidden human intents, and we predict full human trajectories until they reach their goal destinations in the unobserved video parts. [sent-277, score-0.796]
66 The rows show in raster scan the progression of proposals of the constraint map C (the white regions indicate obstacles), sources S, relationships R, and trajectory estimates (color indicates P(Γij |C, S, R)) of the same person considered in Fig. [sent-286, score-0.339]
67 In the last iteration (bottom right), MCMC estimates that the person’s goal is to approach the top-left of the scene, and finds two equally likely trajectories to this goal. [sent-288, score-0.215]
68 1 with the overlaid trajectory predictions of a person who starts at the top-left of the scene, and wants to reach the dark matter in the middle-right of the scene (the food truck). [sent-291, score-0.908]
69 Setting λ = 0.2 (on the left) and λ = 1 (on the right) in the likelihood P(Γij |C, S, R) gives similar trajectory predictions. [sent-293, score-0.216]
70 The initial number N of sources in S is probabilistically sampled from the Poisson distribution of (3), while their layout is estimated as the N most frequent stopping locations in Γ(0). [sent-322, score-0.216]
71 The death jump randomly chooses an existing source sj ∈ S and removes it from S, resulting in S′. [sent-341, score-0.279]
72 The acceptance of this jump is exclusively governed by the Poisson prior of (3) and the trajectory likelihoods P(Γij |C′, S′, R′). [sent-347, score-0.257]
73 The proposal of R′ randomly chooses one person ai ∈ A with goal sj, and randomly changes ai’s goal to sk ∈ S. [sent-350, score-0.2]
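A minimal Metropolis-Hastings skeleton covering these birth, death, and relation proposals might look like the following; the scoring function and the proposal routines are placeholders, not the paper's actual densities.

```python
import math
import random

def mcmc_step(state, log_score, proposals):
    """One data-driven MCMC iteration over W = (C, S, R) (a hedged sketch).

    state     : current hypothesis (constraint map, sources, relations)
    log_score : log posterior, e.g. the Poisson prior on |S| plus the
                trajectory log-likelihoods log P(Gamma_ij | C, S, R)
    proposals : list of moves (birth jump, death jump, relation change),
                each returning (new_state, log_q_ratio)
    """
    move = random.choice(proposals)
    new_state, log_q_ratio = move(state)  # log q(old|new) - log q(new|old)
    log_alpha = log_score(new_state) - log_score(state) + log_q_ratio
    if math.log(random.random()) < min(0.0, log_alpha):  # MH acceptance test
        return new_state  # accept the jump
    return state          # reject: keep the current hypothesis
```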
74 We present three types of results: (a) localization of “dark matter” S, (b) estimation of human intents R, and (c) trajectory prediction. Annotating ground truth of the constraint map C in a scene is difficult, since human annotators provide inconsistent subjective estimates. [sent-372, score-0.663]
75 Negative Log-Likelihood (NLL) and Modified Hausdorff Distance (MHD) are measured to evaluate trajectory prediction. [sent-378, score-0.216]
76 P(x(t+1)|x(t)) is given by (7); the NLL of a true trajectory X = {x(1), · · · , x(T)} is defined as NLLP(X) = −(1/(T−1)) Σt=1..T−1 log P(x(t+1)|x(t)). (10) [sent-379, score-0.216]
77 MHD between the true trajectory X and our sampled trajectory Y = {y(1), · · · , y(T)} is defined as MHD(X, Y) = max(d(X, Y), d(Y, X)), where d(X, Y) = (1/|X|) Σx∈X miny∈Y ‖x − y‖. (11) [sent-380, score-0.432]
78 x(dx∈(XX,mYi)n,yd∈(YY||,xX −) y| (11) We present the average MHD between the true trajectory and our 5000 trajectory prediction samples. [sent-381, score-0.486]
79 For evaluating detection of S, we use the standard overlap criterion between our detection and the ground-truth bounding box around the functional object of interest. [sent-382, score-0.203]
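Assuming the "standard overlap criterion" here is the usual intersection-over-union test, it can be computed as below; the 0.5 threshold is the common convention, not a value stated in this text.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2);
    a detection is typically counted as correct when iou >= 0.5."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```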
80 For evaluation of predicting human intents R, we allow our inference access to an initial part of the video footage, in which R is not observable, and then compare our results with ground-truth outcomes of R in the remaining (unobserved) video parts. [sent-385, score-0.409]
81 This baseline localizes “dark matter” at locations where the observed people trajectories in Γ(0) ended, and people stayed still at that location longer than 5 sec before changing their trajectory. [sent-389, score-0.509]
82 (1) Shortest path (SP) estimates the trajectory as a straight line, disregarding obstacles in the scene. [sent-393, score-0.397]
83 So we compare only with the state-of-the-art method for trajectory prediction [12]. [sent-399, score-0.27]
84 In our setting, the first 50% of a video is observed, and human trajectories in the entire video are to be predicted. [sent-401, score-0.358]
85 The second row is the number of agents |A|; the first column is the number of sources |S|. [sent-430, score-0.322]
86 The scene complexity is defined in terms of the number of agents in the scene and the number of sources. [sent-436, score-0.402]
87 The toy example uses a random rectangular layout; the ratio of obstacle pixels to all pixels is about 15%, and the observed part of the trajectories is about 50%. [sent-438, score-0.28]
88 We observe the initial (partial) 50% of the video footage, with 300 trajectories in Γ(0). [sent-458, score-0.262]
89 The 2nd column is the estimated layout of obstacles (the white masks) and dark matter (the Gaussians). [sent-464, score-0.727]
90 The dark matter can be recognized through human activities. [sent-485, score-0.599]
91 The trajectory prediction (NLL and MHD) is more accurate in a constrained scene. [sent-487, score-0.341]
92 (2) We handle challenging scenes with arbitrary layouts of dark matter, both in the middle of the scene and at its boundaries. [sent-504, score-0.405]
93 These noisy results are significantly improved. [sent-512, score-0.216]
94 In these cases, the predicted trajectories are also not far from the true trajectories, as measured by MHD and NLL. [sent-518, score-0.43]
95 Conclusion: We have addressed a new problem, that of localizing functional objects in surveillance videos without using training examples of objects. [sent-524, score-0.348]
96 Instead of appearance features, human behavior is analyzed for identifying the functional map of the scene. [sent-525, score-0.286]
97 We have extended the classical Lagrangian mechanics to model the scene as a physical system wherein: i) functional objects exert attraction forces on people’s motions, and ii) people are not inanimate particles but agents who can have intents to approach particular functional objects. [sent-526, score-1.42]
98 Given a small excerpt from the video, our approach estimates the constraint map of non-walkable locations in the scene, the number and layout of functional objects, and human intents, and predicts human trajectories in the unobserved parts of the video footage. [sent-527, score-0.77]
99 Observing human-object interactions: using spatial and functional compatibility for recognition. [sent-591, score-0.203]
100 Group behavior from video: A data-driven approach to crowd simulation. [sent-630, score-0.189]
wordName wordTfidf (topN-words)
[('dark', 0.296), ('agents', 0.26), ('intents', 0.224), ('matter', 0.218), ('trajectory', 0.216), ('trajectories', 0.215), ('functional', 0.203), ('sj', 0.196), ('ij', 0.18), ('obstacles', 0.148), ('force', 0.147), ('people', 0.147), ('mhd', 0.134), ('lagrangian', 0.134), ('rij', 0.12), ('mechanics', 0.115), ('agent', 0.111), ('mcmc', 0.109), ('crowd', 0.108), ('attraction', 0.099), ('ai', 0.093), ('proposal', 0.091), ('lm', 0.09), ('vending', 0.09), ('energy', 0.087), ('repulsion', 0.079), ('courtyard', 0.079), ('dijkstra', 0.079), ('unobserved', 0.075), ('scene', 0.071), ('lawns', 0.067), ('nll', 0.067), ('latent', 0.067), ('toy', 0.065), ('layout', 0.065), ('alm', 0.064), ('sources', 0.062), ('person', 0.061), ('fields', 0.061), ('surveillance', 0.061), ('birth', 0.06), ('truck', 0.06), ('particles', 0.059), ('field', 0.058), ('toward', 0.055), ('probabilistically', 0.054), ('prediction', 0.054), ('observable', 0.053), ('videos', 0.053), ('goals', 0.052), ('grass', 0.051), ('physical', 0.05), ('human', 0.049), ('gm', 0.047), ('video', 0.047), ('yy', 0.046), ('chooses', 0.046), ('monte', 0.046), ('functionality', 0.046), ('food', 0.046), ('courtyards', 0.045), ('downgrades', 0.045), ('exert', 0.045), ('quench', 0.045), ('thirst', 0.045), ('virat', 0.045), ('walkable', 0.045), ('intent', 0.044), ('globally', 0.043), ('predicting', 0.042), ('acceptance', 0.042), ('jump', 0.042), ('classical', 0.042), ('governed', 0.041), ('public', 0.04), ('charged', 0.04), ('inanimate', 0.04), ('isat', 0.04), ('scenes', 0.038), ('source', 0.037), ('death', 0.037), ('sinisa', 0.037), ('recognized', 0.036), ('behaviors', 0.035), ('locations', 0.035), ('forces', 0.034), ('behavior', 0.034), ('move', 0.034), ('areas', 0.034), ('path', 0.033), ('isc', 0.033), ('action', 0.033), ('specifies', 0.032), ('excerpt', 0.032), ('objects', 0.031), ('gall', 0.031), ('sca', 0.031), ('reinforcement', 0.031), ('estimating', 0.03), ('activity', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
Author: Dan Xie, Sinisa Todorovic, Song-Chun Zhu
Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance and shape, but they affect the behavior of people in the scene. For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people’s trajectories in the video. To detect “dark matter” and infer their “dark energy” field, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework is used to probabilistically model: people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people’s trajectories in unobserved parts of the video footage.
2 0.21830054 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figure-ground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
3 0.18445881 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
Author: Ali Elqursh, Ahmed Elgammal
Abstract: The vast majority of work on motion segmentation adopts the affine camera model due to its simplicity. Under the affine model, the motion segmentation problem becomes that of subspace separation. Due to this assumption, such methods are mainly offline and exhibit poor performance when the assumption is not satisfied. This is made evident in state-of-the-art methods that relax this assumption by using piecewise affine spaces and spectral clustering techniques to achieve better results. In this paper, we formulate the problem of motion segmentation as that of manifold separation. We then show how label propagation can be used in an online framework to achieve manifold separation. The performance of our framework is evaluated on a benchmark dataset and achieves competitive performance while being online.
4 0.17259824 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
Author: Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Abstract: Due to occlusions and objects’ non-rigid deformation in the scene, the obtained motion trajectories from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits temporal and spatial characteristics from tracked points to facilitate segmentation of incomplete and corrupted trajectories, thereby obtain highly robust results against severe data missing and noises. Our method first uses the Discrete Cosine Transform (DCT) bases as a temporal smoothness constraint on trajectory projection to ensure the validity of resulting components to repair pathological trajectories. Then, based on an observation that the trajectories of foreground and background in a scene may have different spatial distributions, we propose a two-stage clustering strategy that first performs foreground-background separation then segments remaining foreground trajectories. We show that, with this new clustering strategy, sequences with complex motions can be accurately segmented by even using a simple translational model. Finally, a series of experiments on Hopkins 155 dataset and Berkeley motion segmentation dataset show the advantage of our method over other state-of-the-art motion segmentation algorithms in terms of both effectiveness and robustness.
5 0.17000645 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
Author: Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
Abstract: This paper addresses the novel and challenging problem of aligning camera views that are unsynchronized by low and/or variable frame rates using object trajectories. Unlike existing trajectory-based alignment methods, our method does not require frame-to-frame synchronization. Instead, we propose using the intersections of corresponding object trajectories to match views. To find these intersections, we introduce a novel trajectory matching algorithm based on matching Spatio-Temporal Context Graphs (STCGs). These graphs represent the distances between trajectories in time and space within a view, and are matched to an STCG from another view to find the corresponding trajectories. To the best of our knowledge, this is one of the first attempts to align views that are unsynchronized with variable frame rates. The results on simulated and real-world datasets show trajectory intersections are a viable feature for camera alignment, and that the trajectory matching method performs well in real-world scenarios.
6 0.16232693 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
7 0.14849614 39 iccv-2013-Action Recognition with Improved Trajectories
8 0.14048827 263 iccv-2013-Measuring Flow Complexity in Videos
9 0.1300163 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
10 0.1058419 395 iccv-2013-Slice Sampling Particle Belief Propagation
11 0.10535735 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
12 0.10386221 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
13 0.10149293 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
14 0.098701313 167 iccv-2013-Finding Causal Interactions in Video Sequences
15 0.098117046 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
16 0.089196511 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
17 0.088890634 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
18 0.087819152 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
19 0.086936444 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
20 0.083058164 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
topicId topicWeight
[(0, 0.211), (1, 0.004), (2, 0.046), (3, 0.13), (4, 0.035), (5, 0.046), (6, -0.016), (7, 0.063), (8, 0.098), (9, 0.043), (10, -0.001), (11, -0.012), (12, -0.014), (13, 0.029), (14, -0.004), (15, -0.0), (16, -0.026), (17, 0.058), (18, -0.017), (19, 0.006), (20, -0.151), (21, -0.049), (22, 0.155), (23, 0.122), (24, 0.047), (25, 0.092), (26, 0.1), (27, -0.047), (28, 0.06), (29, 0.039), (30, 0.028), (31, 0.029), (32, 0.048), (33, 0.047), (34, 0.123), (35, -0.069), (36, -0.12), (37, 0.038), (38, 0.045), (39, -0.0), (40, 0.051), (41, 0.013), (42, 0.025), (43, -0.016), (44, -0.012), (45, -0.016), (46, -0.088), (47, 0.026), (48, -0.039), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.95698273 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
Author: Dan Xie, Sinisa Todorovic, Song-Chun Zhu
Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance and shape, but they affect the behavior of people in the scene. For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people’s trajectories in the video. To detect “dark matter” and infer their “dark energy” field, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework is used to probabilistically model: people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people’s trajectories in unobserved parts of the video footage.
2 0.71342033 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
Author: Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
Abstract: This paper addresses the novel and challenging problem of aligning camera views that are unsynchronized by low and/or variable frame rates using object trajectories. Unlike existing trajectory-based alignment methods, our method does not require frame-to-frame synchronization. Instead, we propose using the intersections of corresponding object trajectories to match views. To find these intersections, we introduce a novel trajectory matching algorithm based on matching Spatio-Temporal Context Graphs (STCGs). These graphs represent the distances between trajectories in time and space within a view, and are matched to an STCG from another view to find the corresponding trajectories. To the best of our knowledge, this is one of the first attempts to align views that are unsynchronized with variable frame rates. The results on simulated and real-world datasets show trajectory intersections are a viable feature for camera alignment, and that the trajectory matching method performs well in real-world scenarios.
3 0.70289093 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
Author: Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Abstract: Due to occlusions and objects’ non-rigid deformation in the scene, the obtained motion trajectories from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits temporal and spatial characteristics from tracked points to facilitate segmentation of incomplete and corrupted trajectories, thereby obtain highly robust results against severe data missing and noises. Our method first uses the Discrete Cosine Transform (DCT) bases as a temporal smoothness constraint on trajectory projection to ensure the validity of resulting components to repair pathological trajectories. Then, based on an observation that the trajectories of foreground and background in a scene may have different spatial distributions, we propose a two-stage clustering strategy that first performs foreground-background separation then segments remaining foreground trajectories. We show that, with this new clustering strategy, sequences with complex motions can be accurately segmented by even using a simple translational model. Finally, a series of experiments on Hopkins 155 dataset and Berkeley motion segmentation dataset show the advantage of our method over other state-of-the-art motion segmentation algorithms in terms of both effectiveness and robustness.
4 0.68295103 263 iccv-2013-Measuring Flow Complexity in Videos
Author: Saad Ali
Abstract: In this paper a notion of flow complexity that measures the amount of interaction among objects is introduced and an approach to compute it directly from a video sequence is proposed. The approach employs particle trajectories as the input representation of motion and maps it into a ‘braid’ based representation. The mapping is based on the observation that 2D trajectories of particles take the form of a braid in space-time due to the intermingling among particles over time. As a result of this mapping, the problem of estimating the flow complexity from particle trajectories becomes the problem of estimating braid complexity, which in turn can be computed by measuring the topological entropy of a braid. For this purpose recently developed mathematical tools from braid theory are employed which allow rapid computation of topological entropy of braids. The approach is evaluated on a dataset consisting of open source videos depicting variations in terms of types of moving objects, scene layout, camera view angle, motion patterns, and object densities. The results show that the proposed approach is able to quantify the complexity of the flow, and at the same time provides useful insights about the sources of the complexity.
5 0.6604808 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
Author: Ali Elqursh, Ahmed Elgammal
Abstract: The vast majority of work on motion segmentation adopts the affine camera model due to its simplicity. Under the affine model, the motion segmentation problem becomes that of subspace separation. Due to this assumption, such methods are mainly offline and exhibit poor performance when the assumption is not satisfied. This is made evident in state-of-the-art methods that relax this assumption by using piecewise affine spaces and spectral clustering techniques to achieve better results. In this paper, we formulate the problem of motion segmentation as that of manifold separation. We then show how label propagation can be used in an online framework to achieve manifold separation. The performance of our framework is evaluated on a benchmark dataset and achieves competitive performance while being online.
6 0.6375894 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
7 0.57953751 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
8 0.56621158 226 iccv-2013-Joint Subspace Stabilization for Stereoscopic Video
9 0.5538035 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
10 0.5336808 39 iccv-2013-Action Recognition with Improved Trajectories
11 0.52975672 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
12 0.52484089 145 iccv-2013-Estimating the Material Properties of Fabric from Video
13 0.49440336 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
14 0.47270972 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
15 0.45590743 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
16 0.44294989 395 iccv-2013-Slice Sampling Particle Belief Propagation
17 0.44233924 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering
18 0.44048283 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
19 0.43802375 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
20 0.43605703 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
topicId topicWeight
[(2, 0.03), (12, 0.012), (26, 0.058), (31, 0.034), (42, 0.078), (48, 0.013), (64, 0.033), (73, 0.025), (89, 0.613)]
simIndex simValue paperId paperTitle
1 0.99892795 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
Author: Qian-Yi Zhou, Stephen Miller, Vladlen Koltun
Abstract: We present an approach to reconstruction of detailed scene geometry from range video. Range data produced by commodity handheld cameras suffers from high-frequency errors and low-frequency distortion. Our approach deals with both sources of error by reconstructing locally smooth scene fragments and letting these fragments deform in order to align to each other. We develop a volumetric registration formulation that leverages the smoothness of the deformation to make optimization practical for large scenes. Experimental results demonstrate that our approach substantially increases the fidelity of complex scene geometry reconstructed with commodity handheld cameras.
2 0.99755359 81 iccv-2013-Combining the Right Features for Complex Event Recognition
Author: Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller
Abstract: In this paper, we tackle the problem of combining features extracted from video for complex event recognition. Feature combination is an especially relevant task in video data, as there are many features we can extract, ranging from image features computed from individual frames to video features that take temporal information into account. To combine features effectively, we propose a method that is able to be selective of different subsets of features, as some features or feature combinations may be uninformative for certain classes. We introduce a hierarchical method for combining features based on the AND/OR graph structure, where nodes in the graph represent combinations of different sets of features. Our method automatically learns the structure of the AND/OR graph using score-based structure learning, and we introduce an inference procedure that is able to efficiently compute structure scores. We present promising results and analysis on the difficult and large-scale 2011 TRECVID Multimedia Event Detection dataset [17].
same-paper 3 0.99749267 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
Author: Dan Xie, Sinisa Todorovic, Song-Chun Zhu
Abstract: This paper presents an approach to localizing functional objects in surveillance videos without domain knowledge about semantic object classes that may appear in the scene. Functional objects do not have discriminative appearance and shape, but they affect the behavior of people in the scene. For example, they “attract” people to approach them for satisfying certain needs (e.g., vending machines could quench thirst), or “repel” people to avoid them (e.g., grass lawns). Therefore, functional objects can be viewed as “dark matter”, emanating “dark energy” that affects people’s trajectories in the video. To detect “dark matter” and infer their “dark energy” field, we extend Lagrangian mechanics. People are treated as particle-agents with latent intents to approach “dark matter” and thus satisfy their needs, where their motions are subject to a composite “dark energy” field of all functional objects in the scene. We make the assumption that people take globally optimal paths toward the intended “dark matter” while avoiding latent obstacles. A Bayesian framework is used to probabilistically model: people’s trajectories and intents, the constraint map of the scene, and the locations of functional objects. A data-driven Markov Chain Monte Carlo (MCMC) process is used for inference. Our evaluation on videos of public squares and courtyards demonstrates our effectiveness in localizing functional objects and predicting people’s trajectories in unobserved parts of the video footage.
4 0.9950875 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
5 0.99505633 103 iccv-2013-Deblurring by Example Using Dense Correspondence
Author: Yoav Hacohen, Eli Shechtman, Dani Lischinski
Abstract: This paper presents a new method for deblurring photos using a sharp reference example that contains some shared content with the blurry photo. Most previous deblurring methods that exploit information from other photos require an accurately registered photo of the same static scene. In contrast, our method aims to exploit reference images where the shared content may have undergone substantial photometric and non-rigid geometric transformations, as these are the kind of reference images most likely to be found in personal photo albums. Our approach builds upon a recent method for example-based deblurring using non-rigid dense correspondence (NRDC) [11] and extends it in two ways. First, we suggest exploiting information from the reference image not only for blur kernel estimation, but also as a powerful local prior for the non-blind deconvolution step. Second, we introduce a simple yet robust technique for spatially varying blur estimation, rather than assuming spatially uniform blur. Unlike the above previous method, which has proven successful only with simple deblurring scenarios, we demonstrate that our method succeeds on a variety of real-world examples. We provide quantitative and qualitative evaluation of our method and show that it outperforms the state-of-the-art.
6 0.99338639 302 iccv-2013-Optimization Problems for Fast AAM Fitting in-the-Wild
7 0.99333692 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
8 0.990372 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
9 0.98567826 2 iccv-2013-3D Scene Understanding by Voxel-CRF
10 0.98305023 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
11 0.97697097 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
12 0.96573526 129 iccv-2013-Dynamic Scene Deblurring
13 0.96308351 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
14 0.96230817 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
15 0.96008003 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
16 0.95878619 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
17 0.95817053 317 iccv-2013-Piecewise Rigid Scene Flow
18 0.95618534 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
19 0.9554714 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
20 0.95474005 226 iccv-2013-Joint Subspace Stabilization for Stereoscopic Video