iccv iccv2013 iccv2013-34 knowledge-graph by maker-knowledge-mining

34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB


Source: pdf

Author: Cewu Lu, Jianping Shi, Jiaya Jia

Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one in which only a few costless small-scale least square optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140∼150 frames per second on average when computing on an ordinary desktop PC using MATLAB.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. [sent-4, score-0.645]

2 Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. [sent-5, score-0.261]

3 It achieves decent performance in the detection phase without compromising result quality. [sent-6, score-0.153]

4 The short running time is guaranteed because the new method effectively turns the original complicated problem into one in which only a few costless small-scale least square optimization steps are involved. [sent-7, score-0.061]

5 Our method reaches high detection rates on benchmark datasets at a speed of 140∼150 frames per second on average when computing on an ordinary desktop PC using MATLAB. [sent-8, score-0.129]

6 Introduction With the increasing demand for security, surveillance cameras are commonly deployed. [sent-10, score-0.162]

7 Detecting abnormal events in what cameras capture is one critical task, which is traditionally labor-intensive and requires non-stop human attention. [sent-11, score-0.396]

8 What makes this interminable and boring process worse is that abnormal events generally happen with a frustratingly small chance, making over 99% of the recorded video uneventful. [sent-12, score-0.533]

9 This predicament catalyzes important research in computer vision, aiming to find abnormal events automatically [8, 1, 13, 23, 16, 21, 11, 22, 5]. [sent-14, score-0.44]

10 Research in this area commonly follows the line that normal patterns are first learned from training videos, and are then used to detect events that deviate from this representation. [sent-16, score-0.333]

11 Specifically, in [8, 20], trajectories extracted by tracking objects of interest were utilized to represent normal patterns. [sent-17, score-0.115]

12 Another line is to learn normal low-level video feature distributions, such as exponential [1], multivariate Gaussian mixture [13] or clustering as a special case [23, 16]. [sent-19, score-0.115]

13 Normal event representations based on graph models were proposed in [21, 11, 10, 3, 19, 15, 6, 2], which utilize co-occurrence patterns. [sent-20, score-0.242]

14 Among these methods, normal patterns were fitted in a spacetime Markov random field in [21, 11, 10, 3]. [sent-21, score-0.159]

15 Kwon and Lee [9] used a graph editing framework for abnormal event detection. [sent-22, score-0.445]

16 Recently, sparse representation [12, 17] has attracted much attention, and sparsity-based abnormality detection models [22, 5] achieved state-of-the-art performance reported on many datasets. [sent-24, score-0.238]

17 Although realtime processing is a key criterion for a practically employable system given continuously captured videos, most sparsity-based methods cannot be performed fast enough. [sent-25, score-0.048]

18 The major obstruction to high efficiency is the inherently intensive computation to build the sparse representation. [sent-26, score-0.084]

19 Note that a slow process could delay alarms and postpone responses to special events. [sent-27, score-0.039]

20 We provide a brief analysis of this issue below with respect to general sparsity strategies and present our new framework with an effective representation. [sent-28, score-0.1]

21 It fits the structure of surveillance video data and leads to an extremely cheap testing cost. [sent-29, score-0.202]

22 Sparsity Based Abnormality Detection Sparsity is a general constraint [22, 5] to model normal event patterns as a linear combination of a set of basis atoms. [sent-32, score-0.524]

23 We analyze abnormality detection in one local region to show that this process is computationally expensive by nature. [sent-33, score-0.154]

24 Given features X = [x1, . . . , xn] extracted from the history video sequence in a region, a normal pattern dictionary D ∈ Rp×q is learned with a sparsity prior. [sent-37, score-0.299]

25 In the testing phase, for a new feature x, we reconstruct it by sparsely combining elements in D, expressed as min_β ‖x − Dβ‖₂² s.t. ‖β‖₀ ≤ s, (1) [sent-38, score-0.152]

26 where β ∈ Rq×1 contains sparse coefficients. [sent-44, score-0.084]

27 ‖·‖₀ is the sparsity regularization term, and s controls the sparsity level. [sent-49, score-0.1]

28 With this representation, an abnormal pattern can be naturally defined as one with a large error resulting from this sparse reconstruction. [sent-51, score-0.057]

29 {Ei} is the corresponding least square reconstruction error for each Si ∈ S. [sent-62, score-0.061]
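To make the cost of this conventional test concrete, the sketch below solves the ‖·‖₀-constrained problem of Eq. (1) approximately with greedy orthogonal matching pursuit, the usual route for such problems. It is a minimal illustration under our own assumptions (a dense p×q dictionary D, a single feature x, sparsity s ≥ 1), not the authors' code:

function [beta, err] = omp_reconstruct(D, x, s)
% Greedy OMP for min ||x - D*beta||_2^2  s.t.  ||beta||_0 <= s.
% D: p x q dictionary, x: p x 1 feature, s: sparsity level.
r = x;                               % current residual
support = zeros(1, 0);               % indices of selected atoms
for t = 1:s
    [~, idx] = max(abs(D' * r));     % atom most correlated with the residual
    support = [support, idx];        %#ok<AGROW>
    beta_s = D(:, support) \ x;      % least squares on the current support
    r = x - D(:, support) * beta_s;  % update residual
end
beta = zeros(size(D, 2), 1);
beta(support) = beta_s;
err = norm(r)^2;                     % reconstruction error, the abnormality score

Each of the s iterations scans all q atoms and solves a growing least squares problem, so the per-feature cost grows with both q and s; this is exactly the computation the combination-based framework below replaces.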

30 Previous work verified that this form can lead to high detection accuracy. [sent-66, score-0.076]

31 Efficiency Problem A high testing cost is inevitable when adopting Eq. [sent-67, score-0.215]

32 (1), which aims to find the suitable basis vectors (with scale s) from the dictionary (with scale q) to represent testing data x. [sent-68, score-0.294]

33 The search space is very large, as C(q, s) different combinations exist; with q = 100 and s = 10, for example, C(q, s) ≈ 1.7 × 10^13. [sent-69, score-0.203]

34 Although much effort has been put into reducing the dictionary size [5] and adopting fast sparse coding solvers [22], in general, seconds are needed to process a frame as reported in prior papers. [sent-70, score-0.349]

35 A realtime process needs to be 100 times faster than the current fastest sparsity-based methods, which is difficult without tremendous hardware advancement. [sent-72, score-0.048]

36 Our method yields decent performance and naturally accelerates sparse coding by 400+ times even using a MATLAB implementation. [sent-74, score-0.246]

37 Our Contribution We propose sparse combination learning for detection. [sent-77, score-0.213]

38 With high structure redundancy in surveillance videos, instead of coding sparsity by finding an s-basis combination from D in Eq. [sent-78, score-0.55]

39 (1), we code it directly as a set of possible combinations of basis vectors. [sent-79, score-0.312]

40 Each combination here corresponds to a set with s dictionary bases in Eq. [sent-80, score-0.252]

41 With this change, instead of searching s bases from q of them for each testing feature, we only need to find the most suitable combination by evaluating the least square error. [sent-82, score-0.33]

42 This framework is efficient since only small-scale least square optimization is required in detection with simple matrix projection. [sent-85, score-0.099]

43 In our experiments, testing is on a small number of combinations, each taking 10−6 ∼ 10−7 second in MATLAB. [sent-86, score-0.101]

44 The effectiveness of our approach is well guaranteed by the inherent sparsity constraint on the combination size. [sent-87, score-0.229]

45 Compared to original sparse coding, our model is more faithful to the input data. [sent-90, score-0.119]

46 When freely selecting s basis vectors from a total of q vectors by Eq. (1), a small reconstruction error can be achieved even for abnormal patterns. [sent-91, score-0.109]

47 But in our trained combinations, it is unlikely to happen, since each combination finds its corresponding input data, better constraining reconstruction quality. [sent-93, score-0.284]

48 Our method is therefore robust in distinguishing between normal and abnormal patterns. [sent-94, score-0.433]

49 We have verified our model on a large set of surveillance videos in Sec. [sent-95, score-0.193]

50 We also benchmark it on existing datasets for abnormal event detection. [sent-98, score-0.445]

51 It reaches 140∼150 FPS on an ordinary desktop PC. [sent-99, score-0.091]

52 Method We describe our framework that learns sparse basis combinations. [sent-103, score-0.193]

53 Only features at the same spatial location in the video frames are used together for training and testing. [sent-112, score-0.059]

54 Learning Combinations on Training Data For each cube location, 3D gradient features in all frames are denoted as X = {x1, . . . , xn}. [sent-115, score-0.056]

55 Our goal is to find a sparse basis combination set S = {S1, . . . [sent-119, score-0.322]

56 . . . , SK} with each Si ∈ Rp×s containing s dictionary basis vectors, forming a unique combination, where s ≪ q. [sent-122, score-0.193]

57 Each Si belongs to a closed, convex and bounded set. [sent-124, score-0.044]

58 The first goal, effective representation, is to find K basis combinations which enjoy a small reconstruction error t. [sent-127, score-0.315]

59 Each γji indicates whether or not the ith combination Si is chosen for data xj. [sent-147, score-0.284]

60 βji is the corresponding coefficient set for representing xj with combination Si. [sent-148, score-0.284]

61 The objective function makes each training cube x constructible by at least one basis combination in S. [sent-152, score-0.334]

62 The second goal is to make the total combination number K in S small. [sent-153, score-0.088]

63 It is natural and inevitable because a very large K could possibly make the reconstruction error t in Eq. [sent-155, score-0.237]

64 (2) always close to zero, even for abnormal events. [sent-156, score-0.318]

65 Optimization for Training Our two objectives contradict each other in a sense. [sent-159, score-0.035]

66 It automatically finds K while not wildly increasing the reconstruction error t. [sent-163, score-0.253]

67 In fact, error t for each training feature is upper bounded in our method. [sent-164, score-0.157]

68 We obtain a set of combinations with a small K by setting a reconstruction error upper bound λ uniformly for all elements in S. [sent-165, score-0.413]

69 If the reconstruction error for each feature is smaller than λ, [sent-166, score-0.257]

70 the coding result is of good quality. [sent-167, score-0.063]

71 In each pass, we update only one combination, making it represent as many training data as possible. [sent-180, score-0.112]

72 This process can quickly find the dominating combinations encoding important and most common features. [sent-181, score-0.203]

73 Remaining training cube features that cannot be well represented by this combination are sent to the next round to gather residual maximum commonness. [sent-182, score-0.244]

74 This process ends when all training data are computed and bounded. [sent-183, score-0.059]

75 The number of combinations K reflects how informative the training data are. [sent-184, score-0.188]

76 Specifically, in the ith pass, given the leftover training data Xc ⊆ X that cannot be represented by previous combinations {S1, . . . , Si−1}, [sent-185, score-0.103]

77 It is easy to prove that this cost satisfies condition (3) and the resulting Si can represent most data. [sent-199, score-0.088]

78 If ‖xj − Siβj‖₂² − λ < 0, complying with condition (3), γj should be 1. [sent-205, score-0.044]
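Read together, sentences 76–78 suggest that the per-pass cost of Eq. (4) has roughly the following form; this is a hedged LaTeX reconstruction from the garbled text, with Ωc taken to be the index set of the leftover data Xc (an assumption):

\min_{S_i,\,\beta,\,\gamma}\ \sum_{j \in \Omega_c} \gamma_j \left( \bigl\| x_j - S_i \beta_j \bigr\|_2^2 - \lambda \right)
\qquad \text{s.t.}\quad \gamma_j \in \{0, 1\}.

Under this cost, setting γj = 1 lowers the objective exactly when ‖xj − Siβj‖₂² − λ < 0, which reproduces the rule in sentence 78.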

79 We solve Eq. (4) by dividing it into two steps to iteratively update {Si, β} and γ using the following procedure. [sent-207, score-0.053]

80 Following the traditional procedure, we optimize β while fixing Si for all j with γji ≠ 0, and then optimize Si using block-coordinate descent [14]. [sent-214, score-0.13]

81 The projection here denotes mapping each basis vector to a unit column, whose norm is set to 1. [sent-221, score-0.109]

82 Therefore, the total energy for L(β, Si) decreases in each iteration, guaranteeing convergence. [sent-225, score-0.037]

83 The update sets γji = 1 when ‖xj − Siβji‖₂² < λ (9), and this update step guarantees condition (3). [sent-234, score-0.053]

84 Algorithm 1 Training for Sparse Combination Learning. Input: X; set current training features Xc = X; initialize S = ∅ and i = 1; repeat: optimize {Si, β} with Eqs. [sent-235, score-0.251]

85 (5) until {Si, β} converges; add Si to set S; remove computed features xj with γji = 0 from Xc; set i = i + 1; until Xc = ∅; Output: S. Algorithm Summary and Analysis In each pass, we learn one Si. [sent-239, score-0.155]

86 We repeat this process to obtain a few combinations until the training data set Xc is empty. [sent-240, score-0.299]

87 The initial dictionary Si in each pass is calculated by clustering training data Xc via K-means with s centers. [sent-243, score-0.237]

88 Our algorithm is controlled by λ, the upper bound of reconstruction errors. [sent-244, score-0.153]

89 Our approach is expressive because all training normal event patterns are represented with controllable reconstruction errors under condition (3). [sent-246, score-0.494]
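A minimal MATLAB sketch of this training loop, consistent with Algorithm 1 and the description above: the K-means initialization, the alternating {β, Si} updates against the bound λ, and passing leftovers to the next round follow the text, while the refit rule, the fixed iteration count, and the stall guard are simplifications of ours (the paper uses block-coordinate descent with a unit-column projection); kmeans assumes the Statistics toolbox:

function S = learn_combinations(X, s, lambda)
% Sparse combination learning sketch. X: p x n training features,
% s: combination size, lambda: reconstruction error upper bound.
S = {};
Xc = X;                                          % leftover training data
while ~isempty(Xc)
    [~, C] = kmeans(Xc', min(s, size(Xc, 2)));   % initialize Si with s centers
    Si = unit_columns(C');
    for iter = 1:50                              % alternate {beta, Si} updates
        B = (Si' * Si) \ (Si' * Xc);             % least squares coefficients
        err = sum((Xc - Si * B).^2, 1);
        gamma = err < lambda;                    % features Si currently covers
        if any(gamma)
            Si = unit_columns(Xc(:, gamma) / B(:, gamma));  % refit on them
        end
    end
    S{end + 1} = Si;                             %#ok<AGROW>
    B = (Si' * Si) \ (Si' * Xc);
    err = sum((Xc - Si * B).^2, 1);
    if all(err >= lambda), break; end            % guard: nothing was covered
    Xc = Xc(:, err >= lambda);                   % leftovers go to the next pass
end

function M = unit_columns(M)
M = M ./ max(sqrt(sum(M.^2, 1)), eps);           % project columns to unit norm

Each pass thus produces one dominating combination that covers as many remaining features as the bound λ allows, mirroring sentences 71–73.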

90 Testing With the learned sparse combinations S = {S1, . . . [sent-251, score-0.287]

91 . . . , SK}, in the testing phase with new data x, we check if there exists a combination in S fitting the reconstruction error upper bound. [sent-254, score-0.535]

92 It can be quickly achieved by checking the least square error for each Si: min_β ‖x − Siβ‖₂². [sent-255, score-0.118]

93 To further simplify computation, we define an auxiliary matrix Ri for each Si. [sent-271, score-0.039]

94 (13) The reconstruction error for Si is accordingly ‖Rix‖₂². [sent-273, score-0.057]
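Since each per-combination test is an unconstrained least squares problem, its closed form follows from standard algebra; the residual-projector form of Ri below is our reading of Eq. (13), chosen so that it matches the auxiliary matrices {R1, . . . , RK} fed to Algorithm 2:

\hat{\beta} = \arg\min_{\beta}\ \bigl\| x - S_i \beta \bigr\|_2^2 = \left( S_i^{\top} S_i \right)^{-1} S_i^{\top} x,
\qquad
\bigl\| x - S_i \hat{\beta} \bigr\|_2^2 = \bigl\| R_i x \bigr\|_2^2,
\quad
R_i := S_i \left( S_i^{\top} S_i \right)^{-1} S_i^{\top} - I_p.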

95 If it is small, x is regarded as a normal event pattern. [sent-276, score-0.037]

96 The final testing scheme is summarized in Algorithm 2. [sent-277, score-0.101]

97 Algorithm 2 Testing with Sparse Combinations. Input: x, auxiliary matrices {R1, . . . [sent-278, score-0.039]

98 . . . , RK} and threshold T; for j = 1 → K do: if ‖Rjx‖₂² < T, . . . [sent-281, score-0.044]
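A minimal MATLAB sketch of the testing step of Algorithm 2, using the auxiliary matrices from the derivation above; the threshold T is a placeholder parameter, and the early exit keeps the per-feature cost to at most K small matrix-vector products:

function R = precompute_aux(S)
% One auxiliary matrix per learned combination; ||R{i}*x||^2 equals the least
% square reconstruction error of x under S{i} (our reading of Eq. (13)).
p = size(S{1}, 1);
R = cell(1, numel(S));
for i = 1:numel(S)
    Si = S{i};
    R{i} = Si * ((Si' * Si) \ Si') - eye(p);
end

function is_normal = test_feature(x, R, T)
% A feature is normal if some combination reconstructs it within threshold T.
is_normal = false;
for j = 1:numel(R)
    if sum((R{j} * x).^2) < T
        is_normal = true;            % early exit: one good fit suffices
        return;
    end
end

Each test is a p×p matrix-vector product plus a squared norm per combination, consistent with the reported 10−6 ∼ 10−7 second per combination in MATLAB.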


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ji', 0.469), ('abnormal', 0.318), ('si', 0.316), ('combinations', 0.203), ('sitsi', 0.2), ('xj', 0.155), ('xc', 0.148), ('combination', 0.129), ('event', 0.127), ('abnormality', 0.116), ('normal', 0.115), ('reconstruction', 0.112), ('basis', 0.109), ('surveillance', 0.101), ('testing', 0.101), ('sparsity', 0.1), ('rp', 0.096), ('pass', 0.094), ('sparse', 0.084), ('dictionary', 0.084), ('ij', 0.083), ('events', 0.078), ('inevitable', 0.068), ('sk', 0.067), ('decent', 0.064), ('coding', 0.063), ('ses', 0.062), ('square', 0.061), ('demand', 0.061), ('training', 0.059), ('error', 0.057), ('cube', 0.056), ('happen', 0.055), ('videos', 0.054), ('update', 0.053), ('phase', 0.051), ('desktop', 0.051), ('sis', 0.049), ('realtime', 0.048), ('redundancy', 0.048), ('adopting', 0.046), ('hcoen', 0.044), ('aatniodn', 0.044), ('aclo', 0.044), ('aonrd', 0.044), ('baes', 0.044), ('blockcoordinate', 0.044), ('combi', 0.044), ('complying', 0.044), ('iasl', 0.044), ('ithf', 0.044), ('iuc', 0.044), ('jianping', 0.044), ('leftover', 0.044), ('nant', 0.044), ('okn', 0.044), ('predicament', 0.044), ('rif', 0.044), ('spithar', 0.044), ('ttsh', 0.044), ('patterns', 0.044), ('optimize', 0.043), ('finds', 0.043), ('upper', 0.041), ('boring', 0.041), ('frustratingly', 0.041), ('htheree', 0.041), ('rq', 0.041), ('wildly', 0.041), ('matlab', 0.04), ('reaches', 0.04), ('auxiliary', 0.039), ('bases', 0.039), ('dfo', 0.039), ('ars', 0.039), ('contrarily', 0.039), ('jiaya', 0.039), ('postpone', 0.039), ('sil', 0.039), ('ip', 0.038), ('detection', 0.038), ('verified', 0.038), ('repeat', 0.037), ('regarded', 0.037), ('pc', 0.037), ('hco', 0.037), ('guaranteeing', 0.037), ('gq', 0.037), ('enjoy', 0.037), ('aii', 0.037), ('controllable', 0.037), ('cubes', 0.037), ('deviated', 0.037), ('dtr', 0.037), ('reducing', 0.036), ('effort', 0.036), ('accelerates', 0.035), ('coarsely', 0.035), ('contradict', 0.035), ('faithful', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB

Author: Cewu Lu, Jianping Shi, Jiaya Jia

Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one in which only a few costless small-scale least square optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140∼150 frames per second on average when computing on an ordinary desktop PC using MATLAB.

2 0.12812781 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

Author: Hua Wang, Feiping Nie, Weidong Cai, Heng Huang

Abstract: Representing the raw input of a data set by a set of relevant codes is crucial to many computer vision applications. Due to the intrinsic sparse property of real-world data, dictionary learning, in which the linear decomposition of a data point uses a set of learned dictionary bases, i.e., codes, has demonstrated state-of-the-art performance. However, traditional dictionary learning methods suffer from three weaknesses: sensitivity to noisy and outlier samples, difficulty to determine the optimal dictionary size, and incapability to incorporate supervision information. In this paper, we address these weaknesses by learning a Semi-Supervised Robust Dictionary (SSR-D). Specifically, we use the ℓ2,0+ norm as the loss function to improve the robustness against outliers, and develop a new structured sparse regularization com, , tom. . cai@sydney . edu . au , heng@uta .edu make the learning tasks easier to deal with and reduce the computational cost. For example, in image tagging, instead of using the raw pixel-wise features, semi-local or patch- based features, such as SIFT and geometric blur, are usually more desirable to achieve better performance. In practice, finding a set of compact features bases, also referred to as dictionary, with enhanced representative and discriminative power, plays a significant role in building a successful computer vision system. In this paper, we explore this important problem by proposing a novel formulation and its solution for learning Semi-Supervised Robust Dictionary (SSRD), where we examine the challenges in dictionary learning, and seek opportunities to overcome them and improve the dictionary qualities. 1.1. Challenges in Dictionary Learning to incorporate the supervision information in dictionary learning, without incurring additional parameters. Moreover, the optimal dictionary size is automatically learned from the input data. Minimizing the derived objective function is challenging because it involves many non-smooth ℓ2,0+ -norm terms. We present an efficient algorithm to solve the problem with a rigorous proof of the convergence of the algorithm. Extensive experiments are presented to show the superior performance of the proposed method.

3 0.11862504 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration

Author: Chenglong Bao, Jian-Feng Cai, Hui Ji

Abstract: In recent years, how to learn a dictionary from input images for sparse modelling has been one very active topic in image processing and recognition. Most existing dictionary learning methods consider an over-complete dictionary, e.g. the K-SVD method. Often they require solving some minimization problem that is very challenging in terms of computational feasibility and efficiency. However, if the correlations among dictionary atoms are not well constrained, the redundancy of the dictionary does not necessarily improve the performance of sparse coding. This paper proposes a fast orthogonal dictionary learning method for sparse image representation. With comparable performance on several image restoration tasks, the proposed method is much more computationally efficient than the over-complete dictionary based learning methods.

4 0.1161482 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition

Author: Zhuoyuan Chen, Ying Wu

Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases in clean data. In practice, both training and testing data may be corrupted and contain noises and outliers. Although recent studies attempted to cope with corrupted data and achieved encouraging results in the testing phase, how to handle corruption in the training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noises and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the "partial" dictionary learning approach, updating only part of the prototypes (or informative codewords) with the remaining (or noisy codewords) fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.

5 0.11162654 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints

Author: Yifan Zhang, Qiang Ji, Hanqing Lu

Abstract: In complex scenes with multiple atomic events happening sequentially or in parallel, detecting each individual event separately may not always obtain a robust and reliable result. It is essential to detect them in a holistic way which incorporates the causality and temporal dependency among them to compensate for the limitation of current computer vision techniques. In this paper, we propose an interval temporal constrained dynamic Bayesian network to extend Allen's interval algebra network (IAN) [2] from a deterministic static model to a probabilistic dynamic system, which can not only capture the complex interval temporal relationships, but also model the evolution dynamics and handle the uncertainty from the noisy visual observation. In the model, the topology of the IAN on each time slice and the interlinks between the time slices are discovered by an advanced structure learning method. The duration of the event and the unsynchronized time lags between two correlated event intervals are captured by a duration model, so that we can better determine the temporal boundary of the event. Empirical results on two real world datasets show the power of the proposed interval temporal constrained model.

6 0.10157552 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

7 0.10097216 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps

8 0.099083386 81 iccv-2013-Combining the Right Features for Complex Event Recognition

9 0.098053455 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?

10 0.091999859 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

11 0.091869973 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM

12 0.089814954 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection

13 0.08939226 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition

14 0.083642676 98 iccv-2013-Cross-Field Joint Image Restoration via Scale Map

15 0.081466511 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding

16 0.081457786 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

17 0.080647647 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects

18 0.079195969 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

19 0.079006359 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces

20 0.078093156 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.184), (1, 0.043), (2, -0.034), (3, 0.023), (4, -0.111), (5, -0.001), (6, -0.007), (7, -0.027), (8, -0.03), (9, -0.066), (10, -0.075), (11, -0.041), (12, -0.044), (13, 0.09), (14, -0.092), (15, -0.015), (16, 0.018), (17, 0.059), (18, 0.003), (19, 0.048), (20, 0.022), (21, -0.009), (22, 0.022), (23, 0.025), (24, -0.012), (25, 0.022), (26, 0.028), (27, 0.03), (28, 0.052), (29, -0.043), (30, 0.009), (31, 0.029), (32, -0.001), (33, -0.021), (34, 0.035), (35, 0.03), (36, -0.013), (37, 0.037), (38, 0.03), (39, 0.009), (40, -0.035), (41, 0.002), (42, -0.112), (43, 0.012), (44, -0.066), (45, 0.003), (46, -0.005), (47, -0.007), (48, -0.01), (49, -0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94899148 34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB

Author: Cewu Lu, Jianping Shi, Jiaya Jia

Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one in which only a few costless small-scale least square optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140∼150 frames per second on average when computing on an ordinary desktop PC using MATLAB.

2 0.64647514 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition

Author: Zhuoyuan Chen, Ying Wu

Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases in clean data. In practice, both training and testing data may be corrupted and contain noises and outliers. Although recent studies attempted to cope with corrupted data and achieved encouraging results in the testing phase, how to handle corruption in the training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noises and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the "partial" dictionary learning approach, updating only part of the prototypes (or informative codewords) with the remaining (or noisy codewords) fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.

3 0.63575935 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints

Author: Yifan Zhang, Qiang Ji, Hanqing Lu

Abstract: In complex scenes with multiple atomic events happening sequentially or in parallel, detecting each individual event separately may not always obtain a robust and reliable result. It is essential to detect them in a holistic way which incorporates the causality and temporal dependency among them to compensate for the limitation of current computer vision techniques. In this paper, we propose an interval temporal constrained dynamic Bayesian network to extend Allen's interval algebra network (IAN) [2] from a deterministic static model to a probabilistic dynamic system, which can not only capture the complex interval temporal relationships, but also model the evolution dynamics and handle the uncertainty from the noisy visual observation. In the model, the topology of the IAN on each time slice and the interlinks between the time slices are discovered by an advanced structure learning method. The duration of the event and the unsynchronized time lags between two correlated event intervals are captured by a duration model, so that we can better determine the temporal boundary of the event. Empirical results on two real world datasets show the power of the proposed interval temporal constrained model.

4 0.62760538 167 iccv-2013-Finding Causal Interactions in Video Sequences

Author: Mustafa Ayazoglu, Burak Yilmaz, Mario Sznaier, Octavia Camps

Abstract: This paper considers the problem of detecting causal interactions in video clips. Specifically, the goal is to detect whether the actions of a given target can be explained in terms of the past actions of a collection of other agents. We propose to solve this problem by recasting it into a directed graph topology identification, where each node corresponds to the observed motion of a given target, and each link indicates the presence of a causal correlation. As shown in the paper, this leads to a block-sparsification problem that can be efficiently solved using a modified Group-Lasso type approach, capable of handling missing data and outliers (due for instance to occlusion and mis-identified correspondences). Moreover, this approach also identifies time instants where the interactions between agents change, thus providing event detection capabilities. These results are illustrated with several examples involving non–trivial interactions amongst several human subjects. 1. Introduction and Motivation The problem of identifying causal interactions amongst targets in a video sequence has been the focus of considerable attention in the past few years. A large portion of the existing body of work in this field uses human annotated video to build a storyline that includes both recognizing the activities involved and the causal relationships between them (see for instance [10] and references therein). While these methods are powerful and work well when suitably annotated data is available, annotating video clips is expensive and parsing relevant actions requires domain knowledge which may not be readily available. Indeed, in many situations, unveiling potentially hidden causal relationships is a first step towards building such knowledge. In this paper we consider the problem of identifying causal interactions amongst targets, not necessarily human, ∗This work was supported by NSF grants IIS–0713003, IIS-1318145, and ECCS–0901433, AFOSR grant FA9559–12–1–0271, and the Alert DHS Center of Excellence under Award Number 2008-ST-061-ED0001 . from unannotated video sequences and without prior domain knowledge. Our approach exploits the concept of “Granger Causality” [9], that formalizes the intuitive idea that ifa time series {x(t)} is causally related to a second one {thya(tt)if}a, ttihmene knowledge }oifs tchaeu past vrealluateesd otfo a{yse}c1to should l{eya(dt t}o, a ebnett kern prediction o thf efu ptuasret vvaalulueess ooff {{yx}}tt+k. In [l1ea4d], Pora ab bheatktearr eprt.e aicl.t successfully vuasleude a frequency domain reformulation of this concept to uncover pairwise interactions in scenarios involving repeating events, such as social games. This technique was later extended in [17] to model causal correlations between human joints and applied to the problem of activity classification. However, since this approach is based upon estimating the crosscovariance density function between events, it cannot handle situations where these events are non repeating, are too rare to provide an accurate estimate, or where these estimates are biased by outliers or missing data. Further, estimating a pairwise measure of causal correlation requires a spectral factorization of the cross-covariance, followed by numerical integration and statistical thresholding, limiting the approach to moderately large problems. 
To circumvent these problems, in this paper we propose an alternative approach based upon recasting the problem into that of identifying the topology of a sparse (directed) graph, where each node corresponds to the time traces of relevant features of a target, and each link corresponds to a regressor. The situation is illustrated in Fig. 1 using as an example the problem of finding causal relations amongst 4 tennis players, leading to a graph with 4 nodes, and potentially 12 (directed) links. Note that in general, the problem of identifying causal relationships is ill posed (unless one wants to identify the set of all individuals that could possibly have causal connections), due to the existence of secondary interactions. To illustrate this point, consider a very simplistic scenario with three actors A, B, and C, where A copies (with some delay) the actions of B, which in turn mimics C, also with some delay. In this situation, the ac- tions of A can be explained in terms of either those of B delayed one time sample, or those of C delayed by two samples. Thus, an algorithm based upon a statistical analysis 33556758 would identify a causal connection between A and C, even though there is no direct link between them. Further, if the actions of C can be explained by some simple autoregressive model of the form: = C(t) ?aiC(t − i) then it follows that the acti?ons of A can be explained by the same model, e.g. = A(t) ?aiA(t − i) Hence, multiple graphs topologies, some of which include self-loops, can explain the same set of time-series. On the other hand, note that in this situation, the sparsest graph (in the sense of having the fewest links) is the one that correctly captures the causality relations: the most direct cause of A is B and that of B is C, with C potentially being explained by a self-loop. To capture this feature and regularize the problem, in the sequel we will seek to find the sparsest graph, in the sense of having the least number of interconnections, that explains the observed data, reflecting the fact that, when alternative models are possible, often the most parsimonious is the correct one. Our main result shows that the problem of identifying sparse graph structures from observed noisy data can be reduced to a convex optimization problem (via the use of Group Lasso type arguments) that can be efficiently solved. The advantages of the proposed methods are: • • • • Its ability to handle complex scenarios involving nonrepeating events, een cvoimropnlmeexn stcael changes, clvoillnegct nioonnsof targets that do not necessarily split into well defined groups, outliers and missing data. The ability to identify the sparsest interaction structure tThhaet explains th idee nobtifseyr tvheed s dpaartas e(stthu inst avoiding labeling as causal connections those indirect correlations mediated only by an intermediary), together with a sparse “indicator” function whose support set indicates time instants where the interactions between agents change. Since the approach is not based on semantic analysis, iSt can bt hee applied ctoh ti she n moto btiaosne dof o arbitrary targets, sniost, necessarily humans (indeed, it applies to arbitrary time series including for instance economic or genetic data). From a computational standpoint, the resulting optiFmriozmatio an c problems nhaalve s a specific fthoerm re asmuletinnagbl oep ttiobe solved by a class of iterative algorithms [5, 3], that require at each step only a combination of thresholding and least-squares approximations. 
These algorithms have been shown to substantially outperform conventional convex-optimization solvers both in terms of memory and computation time requirements. The remainder of the paper is organized as follows. In section 2 we provide a formal reformulation of the problem of finding causal relationships between agents as a sparse graph identification problem. In section 3, we show that this problem can be efficiently solved using a re-weighted Group Lasso approach. Moreover, as shown there, the resulting problem can be solved one node at a time using first order methods, which allows for handling situations involving a large number of agents. Finally, the effectiveness of the proposed method is illustrated in section 4 using both simple scenarios (for which ground truth is readily available) and video clips of sports, involving complex, nonrepeating interactions amongst many agents. Figure 1. Finding causal interactions as a graph identification problem. Top: sample frame from a doubles tennis sequence. Bottom: Representation of this sequence as a graph, where each node represents the time series associated with the position of each player and the links are vector regressive models. Causal interactions exist when one of the time series can be explained as a combination of past values of the others. 2. Preliminaries For ease of reference, in this section we summarize the notation used in the paper and give a formal definition of the problem under consideration. 2.1. Notation (M) ?M? ??MM??F ?M?1 ?M?o σi ∗ ◦ ith largest singular value of the matrix M. nuclear norm: ?M? ?i σ?i (M). Fnruocbleeanrio nours norm: ??M?2F? ?i,j Mi2j ?1 norm: ?M? 1 ?i,j |Mij? ?|. ?o quasi-norm: ?M?o number of non-zero ?eleme?nMts i?n M. Hadamard product of matrices: (A ◦ ∗ =.: =. =. =. B)i,j = Ai,jBi,j. 33556769 2.2. Statement of the Problem Next, we formalize the problem under consideration. Consider a scenario with P moving agents, and denote by the 3D homogenous coordinates of the pth individual at time t. Motivated by the idea of Granger Causality, we will say that the actions of this agent depend causally from those in a set Ip (which can possibly contain p itself), if can be written as: Q˜p(t) Q˜p(t) Q˜p(t) ?N = ? ?ajp(n)Q˜j(t − n) +˜ η p(t) +˜ u p(t) (1) j? ?∈Ip ?n=0 Here ajp are unknown coefficients, and ˜η p(t) and up(t) represent measurement noise and a piecewise constant signal that is intended to account for relatively rare events that cannot be explained by the (past) actions of other agents. Examples include interactions of an agent with the environment, for instance to avoid obstacles, or changes in the interactions between agents. Since these events are infrequent, we will model as a signal that has (component-wise) a sparse derivative. Note in passing that since (1) involves homogeneous coordinates, the coefficients aj,p(.) satisfy the following constraint1 u ?N ? ?ajp(n) j? ?∈Ip ?n=0 =1 (2) Our goal is to identify causal relationships using as data 2D measurements qp(t) in F frames of the affine projections of the 3D coordinates Q˜p(t) of the targets. Note that, under the affine camera assumption, the 2D coordinates are related exactly by the same regressor parameters [2]. Thus, (1) holds if and only if: ?N qp(t) = ? 
?ajp(n)qj(t − n) + u˜ p(t) + ηp(t) (3) j?∈Ip ?n=0 In this context, the problem can be precisely stated as: Given qp(t) (in F number of frames) and some a-priori bound N on the order of the regressors (that is the “memory” of the interactions), find the sparsest set of equations of the form (3) that explains the data, that is: aj,pm,ηinp,up?nIp (4) subject to? ?(2) and: = ? ?ajp(n)qj(t − n) + ?N qp(t) j? ?∈Ip ?n=0 up(t) + ηp(t) , p = 1 . . . , P and t = 1, ..F 1This follows by considering the third coordinate in (1) (5) where nIp denotes the cardinality of the set Ip. Rewriting (5) in matrixd efnoormtes yields: [xp; yp] = [Bp, I][apTuxTpuyTp]T + ηp (6) where qp(t) up(t) ηp(t) xp yp ap aip uxp uyp Bp Xp = [xp(t)Typ(t)T]T = [uTxp(t)uyTp(t)]T = [ηxp(t)Tηyp(t)T]T = = [xp(F)xp(F − 1)...xp(1)]T = [yp(F)yp(F − 1)...yp(1)]T [aT1p, a2Tp, ..., aTPp]T = [aip(0), aip(1), ..., aip(N)]T = [uxp(F)uxp(F−1)...uxp(1)]T = [uyp(F)uyp(F−1)...uyp(1)]T = = [Xp; Yp] [hankel(x1 , N) , ..., hankel(xP, N)] Yp = [hankel(y1, N), ..., hankel(yP, N)] and where, for a sequence z(t), hankel(z, N) denotes its associated Hankel matrix: hankel(z, N) = Itfolw⎛⎜⎝ sz t(hNzFa(t. +−a)d1 2e)scrzip(tF io(N. n− )o231f)al· t h· einzt(Frac−zti(.o1N.n)s−a)m12o)⎟ ⎞⎠ ngst uηaq= ? ηuqa1 T ,ηqau2 T ,ηaqu3 T ,· ·, ηauqP T ? T (8) Thus,inthBisc=on⎢⎣⎡teBx0t.1,theB0p.r2ob·le.·m·ofB0 i.nPte⎦⎥r ⎤estcanbeforagents (that is the complete graph structure) is captured by a matrix equation of the form: q = [B, I][aTuT]T + η (7) where and malized as finding the block–sparsest solution to the set of linear equations (2) and (7). 33557770 The problem of identifying a graph structure subject to sparsity constraints, has been the subject of intense research in the past few years. For instance, [1] proposed a Lasso type algorithm to identify a sparse network where each link corresponds to a VAR process. The main idea underlying this method is to exploit the fact that penalizing the ?1 norm of the vector of regression coefficients tends to produce sparse solutions. However, enforcing sparsity of the entire vector of regressor coefficients does not necessarily result in a sparse graph structure, since the resulting solution can consist of many links, each with a few coefficients. This difficulty can be circumvented by resorting to group Lasso type approaches [18], which seek to enforce block sparsity by using a combination of ?1 and ?2 norm constraints on the coefficients of the regressor. While this approach was shown to work well with artificial data in [11], exact recovery of the underlying network can be only guaranteed when the data satisfies suitable “incoherence” type conditions [4]. Finally, a different approach was pursued in [13], based on the use of a modified Orthogonal Least Squares algorithm, Cyclic Orthogonal Least Squares. However, this approach requires enforcing an a-priori limit on the number of links allowed to point to a single node, and such information may not be readily available, specially in cases where this number has high variability amongst nodes. To address these difficulties, in the next section we develop a convex optimization based approach to the problem of identifying sparse graph structures from observed noisy data. This method is closest in spirit to that in [11], in the sense that it is also based on a group Lasso type argument. 
The main differences consist in the ability to handle the unknown inputs up(t), needed to model exogenous disturbances affecting the agents, and in a reformulation of the problem, that allows for using a re-weighted iterative type algorithm, leading to substantially sparser solutions, even when the conditions in [4] fail. 3. Causality Identification Algorithm In this section we present the main result of this paper, an algorithm to search for block-sparse solutions to (7). For each fixed p, the algorithm searches for sparse solutions to (6) by solving (iteratively) the following problem (suggested by the re-weighted heuristic proposed in [7]) ?P ap,muxipn,uypi?=1wja(?aip?2) + λ??diag(wu)[Δuxp;Δuyp]??1 subject to: ?ηp ? ≤ p = 1, . . , P. ∞ ?P ?, ?N ??aip(n) i?= ?1 ?n=0 ?. = 1, p = 1,...,P. (9) where [Δuxp ; Δuyp] represents the first order differences of the exogenous input vector [uxp ; uyp], Wa and Wu are weighting matrices, and λ is a Lagrange multiplier that plays the role of a tuning parameter between graph sparsity and event sensitivity. Intuitively, for a fixed set of weights w, the algorithm attempts to find a block sparse solution to (6) and a set of sparse inp?uts Δuxp ; Δuyp , by exploiting the facts that minimizing ?i ?aip ?2 (the ?2,1 norm of the vector sequence {aip}) te?nds? tao m?aximize block-sparsity [18], while minimizing et?nhed s? 1t norm mmaizxeim blizoceks sparsity [ [1168]]. wOhniclee t mheisnesolutions are found, the weights w are adjusted to penalize those elements of the sequences with small values, so that in the next iteration solutions that set these elements to zero (hence further increasing sparsity) are favored. Note however, that proceeding in this way, requires solving at each iteration a problem with n = P(Pnr + F) variables, where P and F denote the number of agents and frames, respectively, and where nr is a bound on the regressor order. On the other hand, it is easily seen that both the objective function and the constraints in (9) can be partitioned into P groups, with the pth group involving only the variables related to the pth node. It follows then that problem (9) can be solved by solving P smaller problems of the form: ?P ap,muxipn,uypi?=1wja(?aip?2) + λ??diag(wu)[Δuxp;Δuyp]??1 ?P subject to: ?ηp?∞ ?N ≤ ? and ??aip(n) i?= ?1 ?n=0 leading to the algorithm given below: =1 (10) Algorithm 1: REWEIGHTEDCAUSALITYALGORITHM for each p wa = [1, 1, ..., 1] = [1, 1, ..., 1] S > 1(self loop weight) s = [1, 1, ..., S, ..., 1] (p’th element is S) while not converged do 1. solve (9) 2. wja = 1/( ?aip ?2 + δ) 3. wja = wja ◦ s (Penalization self loops) 4. = 1./(abs([Δuxp ; Δuyp]) + δ) end while 5. At this point ajp(.) , Ip and up(t) have been identified end for wu wu It is worth emphasizing that, since the computational complexity of standard interior point methods grows as n3, solving these smaller P problems leads to roughly a O(P2) 33557781 reduction in computational time over solving a single, larger optimization. Thus, this approach can handle moderately large problems using standard, interior-point based, semidefinite optimization solvers. Larger problems can be accommodated by noting that the special form of the objective and constraints allow for using iterative Augmented La- grangian Type Methods (ALM), based upon computing, at each step, the closed form solution to suitable intermediate optimization problems. 
While a complete derivation of such an algorithm is beyond the scope of this paper, using results from [12] it can be shown that each step requires only a combination of thresholding and least-squares approximations. Moreover, it can be shown that such an algorithm converges Q-superlinearly. 4. Handling Outliers and Missing Data The algorithm outlined above assumes an ideal situation where the data matrix B is perfectly known. However, in practice many of its elements may be outliers (due to misidentified correspondences) or missing (due to occlusion). As we briefly show next, these situations can be efficiently handled by performing a structured robust PCA step [3] to obtain a “clean” data matrix, prior to applying Algorithm 1. From equation (6) it follows that, in the absence of exogenous inputs and noise: ?xy11.. . .yxPP? = ?XY11.. . .YXPP? ?a1...aP? (11) Since xi ∈ {col(Xj)} and yi ∈ {col(Yj }), it follows that the sets {∈co {l(cXoli(X)} a)n}d a n{dco yl(Y∈i) { }c? are self-ex?pressive, or, ?equivalently?, Xthe }ma atnridce {sc oXl( =.) }? aXre1 . . . fX-eNxp? eanssdiv eY, ?Y1 ...YN? are mraantkri cdeesfic Xient. ?Consider no?w the case =.r, w?here some ?elements xi, yi of X and Y are missing. From ?the self-expressive property ooff {Xco aln(Xd Yi)} a raen dm i{scsoinlg(Y. Fi)ro} mit tfhoello swelsf tehxaptr ethsessieve missing eyle omf {encotsl are given by: xi = argmin rank(X) , yi x = argmin rank(Y) (12) y Similarly, in the presence of outliers, X, Y can be decomposed irnlyto, itnhe t sum oesfe a lcoew o fra onkut mlieartsr,ix X (,thYe ccalenan b eda dtae)c oamnda sparse one (the outliers) by solving a problem of the form minrank?YXoo?+ λ????EEYX????os. t.: ?XYoo?+?EEYX?=?YX? From the reasoning? abov?e it follows that in the presence of noise and exogenous outputs, the clean data record can be recovered from the corrupted, partial measurements by solving the following optimization problem: s+muλibn3je? ? ? cYXtM ot ? Y?X:∗◦ +Ξ λYX1? ? ? FM XY◦ E XY? ?1+λ2? ?M YX◦ Δ U YX? ?1 ?YX?=?XYoo?+?EEXY?+?UUYX?+?ΞΞYX? (13) where we have used the standard convex relaxations of rank and cardinality2. Here Ξ and U denote noise and piecewise constant exogenous matrices, ΔU denotes the matrix obtained by taking the difference between consecutive elements in U, and MX (MY) is a “mask” matrix, with mi,j = 0 if the element (i, j) in X ( Y) is missing, mi,j = 1 otherw=i0s e, i tuhseed e etom aenvtoi (di, penalizing )e lisem miesnstisn gin, mE, Ξ, U corresponding to missing data. Problem (13) is a structured robust PCA problem (due to the Hankel structure of X, Y) trhobatu can C bAe efficiently suoelv teod t using tkheel fsitrrsut oturrdeer o mf Xeth,oYd) proposed in [3], slightly modified to handle the terms containing ΔU. 5. Experimental Results In this section we illustrate the effectiveness of the proposed approach using several video clips (provided as supplemental material). The results of the experiments are displayed using graphs embedded on the video frames: An arrow indicates causal correlation between agents, with the point of the arrow indicating the agent whose actions are affected by the agent at its tail. The internal parameters of the algorithm were experimentally tuned, leading to the values ? = 0.1, = 0.05, self loop weights S = 10. The algorithm is fairly insensitive to the value of the regularization parameters and S, which could be adjusted up or down by an order of magnitude without affecting the structure of the resulting graph. 
Finally, we used regressor order N=2 for the first three examples and N=4 for the last one, a choice that is consistent with the frame rate and the complexity of λ λ the actions taking place in each clip. 5.1. Clips from the UT-Interaction Data Set We considered two video clips from the UT Human Interaction Data Set [15] (sequences 6 and 16). Figures 2 and 5 compare the results obtained applying the proposed algorithm versus Group Lasso (GL) [11] and Group Lasso combined with the reweighted heuristic described in (9) (GLRW). In all cases, the inputs to the algorithm were the (approximate) coordinates of the heads of each of the agents, normalized to the interval [−1, 1], artificially corrupted ,w niothrm m10al%iz eodut tloie trhs.e Notably, [t−he1 proposed algorithm 2As shown in [6, 8] under suitable conditions these relaxations the exact minimum rank solution. 33557792 recover Figure 2. Sample frames from the UT sequence 6 with the identified causal connections superimposed. Top: Proposed Method. Center: Reweighted Group Lasso. Bottom: Group Lasso. Only the proposed method identifies the correct connections. was able to correctly identify the correlations between the agents from this very limited amount of information, while the others failed to do so. Note in passing that in both cases none of the algorithms were directly applicable, due to some of the individuals leaving the field of view or being occluded. As illustrated in Fig. 3, the missing data was recovered by solving an RPCA problem prior to applying Algorithm 1. Finally, Fig. 4 sheds more insight on the key role played by the sparse signal u. As shown there, changes in u correspond exactly to time instants when the behavior of the corresponding agent deviates from the general pattern followed during most of the clip. Figure 3. Time traces of the individual heads in the UT sequence 6, artificially corrupted with 10 % outliers. The outliers were removed and the missing data due to targets leaving the field of view was estimated solving a modified RPCA problem. Frame number Figure 4. Sample (derivative sparse) exogenous signals in the UT sequence 6. The changes correspond to the instants when the second person starts moving towards the first, who remains stationary, and when the two persons merge in an embrace. Figure 5. Sample frames from the UT sequence 16. Top: Correct correlations identified by the Proposed Method. Center and Bottom: Reweighted Group Lasso and Group Lasso (circles indicate self-loops). 5.2. Doubles Tennis Experiment This experiment considers a non-staged real-life scenario. The data consists of 230 frames of a video clip from the Australian Open Tennis Doubles Final games. The goal here is to identify causal relationships between the different players using time traces of the respective centroid positions. Note that in this case the ground truth is not available. Nevertheless, since players from the same team usually look at their opponents and react to their motions, we expect a strong causality connection between members of 33557803 opposite teams. This intuition is matched by the correlations unveiled by the algorithm, shown in Fig. 6. The identified sparse input corresponding to the vertical direction is shown in Fig. 7 (similar results for the horizontal component are omitted due to space reasons.) Figure 6. Sample frames from the tennis sequence. Top: The proposed method correctly identifies interactions between opposite team members. 
Center: Reweighted Group Lasso misses the interaction between the two rear-most individuals of opposite teams, generating self loops instead (denoted by the disks). Bottom: Group Lasso yields an almost complete graph. Figure 7. Exogenous signal corresponding to the vertical axis for the tennis sequence. The change in one component corresponds to the instant when the leftmost player in the bottom team moves from the line towards the net, remaining closer to it from then on. 5.3. Basketball Game Experiment This experiment considers the interactions amongst players in a basketball game. As in the case ofthe tennis players, since the data comes from a real life scenario, the ground truth is not available. However, contrary to the tennis game, this scenario involves complex interactions amongst many players, and causality is hard to discern by inspection. Nevertheless, the results shown in Fig. 8, obtained using the position of the centroids as inputs to our algorithm, match our intuition. Firstly, one would expect a strong cause/effect connection between the actions of the player with the ball and the two defending opponents facing him. These connections (denoted by the yellow arrows) were indeed successfully identified by the algorithm. The next set of causal correlations is represented by the (blue, light green) and (black, white) arrow pairs showing the defending and the opponent players on the far side of the field and under the hoop. An important, counterintuitive, connection identified by the algorithm is represented by the magenta arrows be- tween the right winger of the white team with two of his teammates: the one holding the ball and the one running behind all players. While at first sight this connection is not as obvious as the others, it becomes apparent towards the end of the sequence, when the right winger player is signaling with a raised arm. Notably, our algorithm was able to unveil this signaling without the need to perform a semantic analysis (a very difficult task here, since this signaling is apparent only in the last few frames). Rather, it used the fact that the causal correlation was encapsulated in the dynamics of the relative motions of these players. 6. Conclusions In this paper we propose a new method for detecting causal interactions between agents using video data. The main idea is to recast this problem into a blind directed graph topology identification, where each node corresponds to the observed motion of a given target, each link indicates the presence of a causal correlation and the unknown inputs account for changes in the interaction patterns. In turn, this problem can be reduced to that of finding block-sparse solutions to a set of linear equations, which can be efficiently accomplished using an iterative re-weighted Group-Lasso approach. The ability of the algorithm to correctly identify causal correlations, even in cases where portions of the data record are missing or corrupted by outliers, and the key role played by the unknown exogenous input were illustrated with several examples involving non–trivial inter- actions amongst several human subjects. Remarkably, the proposed algorithm was able to identify both the correct interactions and the time instants when interactions amongst agents changed, based on minimal motion information: in all cases we used just a single time trace per person. 
This success indicates that in many scenarios, the dynamic information contained in the motion pattern of a single feature associated with a target is rich enough to enable identifying complex interaction patterns, without the need to track multiple features, perform a semantic analysis or use additional domain knowledge. 33557814 Figure 8. Sample frames from a Basketball game. Top: proposed method. Center: Reweighted Group the signaling player and his teammates. Bottom: Group Lasso yields an almost complete graph. Lasso misses the interaction between References [1] A. Arnold, Y. Liu, and N. Abe. Estimating brain functional connectivity with sparse multivariate autoregression. In Proc. of the 13th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 66–75, 2007. 4 [2] M. Ayazoglu, B. Li, C. Dicle, M. Sznaier, and O. Camps. Dynamic subspace-based coordinated multicamera tracking. In 2011 IEEE ICCV, pages 2462–2469, 2011. 3 [3] M. Ayazoglu, M. Sznaier, and O. Camps. Fast algorithms for structured robust principal component analysis. In 2012 IEEE CVPR, pages 1704–171 1, June 2012. 2, 5 [4] A. Bolstad, B. Van Veen, and R. Nowak. Causal network inference via group sparse regularization. IEEE Transactions on Signal Processing, 59(6):2628–2641, 2011. 4 [5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Dis- [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] tributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, Jan. 2011. 2 E. Candes, X. Li, Y. Ma, and J.Wright. Robust principal component analysis? J. ACM, (3), 2011. 5 E. J. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, December 2008. 4 V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. Siam J. Optim., (2):572–596, 2011. 5 C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, pages 424–438l, 1969. 1 A. Gupta, P. Srinivasan, J. Shi, and L. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In 2009 IEEE CVPR, pages 2012–2019, 2009. 1 S. Haufe, G. Nolte, K. R. Muller, and N. Kramer. Sparse causal discovery in multivariate time series. In Neural Information Processing Systems, 2009. 4, 5 G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In ICML, pages 1663–670, 2010. 5 D. Materassi, G. Innocenti, and L. Giarre. Reduced complexity models in identification of dynamical networks: Links with sparsification problems. In 48th IEEE Conference on Decision and Control, pages 4796–4801, 2009. 4 K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg. Temporal causality for the analysis ofvisual events. In IEEE Conf Comp. Vision and Pattern Recog. (CVPR)., pages 1967– 1974, 2010. 1 M. S. Ryoo and J. K. Aggarwal. UT Interaction Dataset, ICPR contest on Semantic Description of Human Activities. http://cvrc.ece.utexas.edu/SDHA2010/Human Interaction.html, 2010. 5 [16] J. Tropp. Just relax: convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3): 1030–1051, 2006. 4 [17] S. Yi and V. Pavlovic. Sparse granger causality graphs for human action classification. In 2012 ICPR, pages 3374–3377. 1 [18] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. 

5 0.6227957 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation

Author: Xiatian Zhu, Chen Change Loy, Shaogang Gong

Abstract: Generating a coherent synopsis for a surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.

6 0.61813867 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications

7 0.61415261 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM

8 0.60129917 20 iccv-2013-A Max-Margin Perspective on Sparse Representation-Based Classification

9 0.60046804 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

10 0.59606183 258 iccv-2013-Low-Rank Sparse Coding for Image Classification

11 0.59496087 4 iccv-2013-ACTIVE: Activity Concept Transitions in Video Event Classification

12 0.59098023 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?

13 0.58572686 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

14 0.57953578 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

15 0.57613724 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

16 0.55672765 163 iccv-2013-Feature Weighting via Optimal Thresholding for Video Analysis

17 0.55483967 14 iccv-2013-A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding

18 0.55305153 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration

19 0.55193001 401 iccv-2013-Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology

20 0.54769027 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.066), (7, 0.026), (12, 0.013), (26, 0.066), (31, 0.053), (35, 0.014), (41, 0.208), (42, 0.12), (48, 0.027), (64, 0.036), (73, 0.036), (89, 0.25)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90278721 257 iccv-2013-Log-Euclidean Kernels for Sparse Representation and Dictionary Learning

Author: Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang

Abstract: Symmetric positive definite (SPD) matrices have been widely used in image and vision problems. Recently there has been growing interest in studying sparse representation (SR) of SPD matrices, motivated by the great success of SR for vector data. Though the space of SPD matrices is well known to form a Lie group that is a Riemannian manifold, existing work fails to take full advantage of its geometric structure. This paper attempts to tackle this problem by proposing a kernel-based method for SR and dictionary learning (DL) of SPD matrices. We disclose that the space of SPD matrices, with the operations of logarithmic multiplication and scalar logarithmic multiplication defined in the Log-Euclidean framework, is a complete inner product space. We can thus develop a broad family of kernels that satisfies Mercer's condition. These kernels characterize the geodesic distance and can be computed efficiently. We also consider the geometric structure in the DL process by updating atom matrices in the Riemannian space instead of in the Euclidean space. The proposed method is evaluated on various vision problems and shows notable performance gains over state-of-the-art methods.
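Such a kernel is straightforward to instantiate. The following is a minimal sketch, not the authors' released code, of one member of this family: a Gaussian RBF applied after the matrix logarithm, which is a valid (Mercer) kernel because the logarithm embeds SPD matrices into a vector space. The names `spd_logm` and `log_euclidean_kernel` and the bandwidth `beta` are this sketch's own.

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T          # V diag(log w) V^T

def log_euclidean_kernel(X, Y, beta=0.5):
    """Gaussian RBF on matrix logarithms: exp(-beta * ||logm(X) - logm(Y)||_F^2)."""
    d = np.linalg.norm(spd_logm(X) - spd_logm(Y), ord='fro')
    return np.exp(-beta * d ** 2)

# Toy usage: a Gram matrix over random SPD matrices (stand-ins for, e.g.,
# region covariance descriptors), ready to feed any kernel method.
rng = np.random.default_rng(0)

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)        # shifted Gram matrix: safely SPD

mats = [random_spd(5) for _ in range(4)]
K = np.array([[log_euclidean_kernel(X, Y) for Y in mats] for X in mats])
print(np.round(K, 3))
```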

same-paper 2 0.88104033 34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB

Author: Cewu Lu, Jianping Shi, Jiaya Jia

Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem to one in which only a few costless small-scale least square optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140∼150 frames per second on average when computing on an ordinary desktop PC using MATLAB.
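To make the "few costless small-scale least square optimization steps" concrete, here is a minimal sketch of such a detection phase, not the paper's code. It assumes training has already produced K small dictionaries ("sparse combinations"); the random stand-in dictionaries, the names `precompute_projections` and `is_abnormal`, and the sizes and threshold are illustrative values for this sketch only.

```python
import numpy as np

def precompute_projections(dicts):
    """Cache each combination's projector so a test is just one matrix-vector product."""
    # P_i = D_i (D_i^T D_i)^{-1} D_i^T projects onto span(D_i); fixed after training.
    return [D @ np.linalg.solve(D.T @ D, D.T) for D in dicts]

def is_abnormal(x, projections, thresh):
    """Flag x as abnormal if no combination reconstructs it within the threshold."""
    errs = [np.linalg.norm(x - P @ x) ** 2 for P in projections]
    return min(errs) > thresh

# Illustrative usage with random stand-ins for the learned combinations
# (d, s, K, and thresh are made-up values, not the paper's settings).
rng = np.random.default_rng(1)
d, s, K = 100, 10, 8
dicts = [rng.standard_normal((d, s)) for _ in range(K)]
projections = precompute_projections(dicts)
x_normal = dicts[0] @ rng.standard_normal(s)   # lies in one combination's span
x_odd = rng.standard_normal(d)                 # a generic vector fits no combination
print(is_abnormal(x_normal, projections, thresh=1e-6))  # False
print(is_abnormal(x_odd, projections, thresh=1e-6))     # True (almost surely)
```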

3 0.87120342 157 iccv-2013-Fast Face Detector Training Using Tailored Views

Author: Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel

Abstract: Face detection is an important task in computer vision and often serves as the first step for a variety of applications. State-of-the-art approaches use efficient learning algorithms and train on large amounts of manually labeled imagery. Acquiring appropriate training images, however, is very time-consuming and does not guarantee that the collected training data is representative in terms of data variability. Moreover, available data sets are often acquired under controlled settings, restricting, for example, scene illumination or 3D head pose to a narrow range. This paper takes a look into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched by arbitrary facial attributes such as age or body weight. Moreover, it can automatically adapt to environmental constraints, such as illumination or viewing angle of recorded video footage from surveillance cameras. We use the tailored imagery to train a new many-core implementation of the Viola-Jones AdaBoost object detection framework. The new implementation is not only faster but also enables the use of multiple feature channels, such as color features, at training time. In our experiments we trained seven view-dependent face detectors and evaluate these on the Face Detection Data Set and Benchmark (FDDB). Our experiments show that the use of tailored training imagery outperforms state-of-the-art approaches on this challenging dataset.

4 0.84778416 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary

Author: Jungseock Joo, Shuo Wang, Song-Chun Zhu

Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize the attributes of a human from body parts, it is important to detect the parts reliably. This is a challenging task due to geometric variation such as articulation and view-point changes, as well as the appearance variation of the parts arising from versatile clothing types. Prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms the existing approaches.

5 0.82645768 220 iccv-2013-Joint Deep Learning for Pedestrian Detection

Author: Wanli Ouyang, Xiaogang Wang

Abstract: Feature extraction, deformation handling, occlusion handling, and classification are four important components in pedestrian detection. Existing methods learn or design these components either individually or sequentially. The interaction among these components is not yet well explored. This paper proposes that they should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture. By establishing automatic, mutual interaction among components, the deep model achieves a 9% reduction in the average miss rate compared with the current best-performing pedestrian detection approaches on the largest Caltech benchmark dataset.
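As a rough illustration of what "jointly learned" means here, the following speculative PyTorch sketch wires stand-ins for the four components into one differentiable model, so the classification loss back-propagates through the deformation and visibility terms into the feature extractor. The layer sizes, the max-minus-penalty deformation score, and the sigmoid visibility gate are assumptions of this sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointPedestrianNet(nn.Module):
    def __init__(self, n_parts=8):
        super().__init__()
        self.features = nn.Sequential(               # stage 1: feature extraction
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU())
        self.part_maps = nn.Conv2d(64, n_parts, 3, padding=1)  # per-part response maps
        self.log_penalty = nn.Parameter(torch.zeros(n_parts))  # stage 2: deformation cost
        self.visibility = nn.Linear(n_parts, n_parts)          # stage 3: occlusion gating
        self.classifier = nn.Linear(n_parts, 1)                # stage 4: decision

    def forward(self, x):
        maps = self.part_maps(self.features(x))      # (B, P, H, W)
        # Deformation handling: each part's best response, discounted by a learned
        # (soft) displacement penalty -- a stand-in for part-based deformable scoring.
        scores = maps.amax(dim=(2, 3)) - torch.exp(self.log_penalty)
        vis = torch.sigmoid(self.visibility(scores))  # estimated part visibility
        return self.classifier(scores * vis)          # occluded parts are down-weighted

# Toy usage on two pedestrian-sized crops; all four stages share one loss.
net = JointPedestrianNet()
logit = net(torch.randn(2, 3, 96, 48))
print(logit.shape)                                    # torch.Size([2, 1])
```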

6 0.81492758 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

7 0.81427753 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose

8 0.81386471 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework

9 0.81350893 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering

10 0.81345421 189 iccv-2013-HOGgles: Visualizing Object Detection Features

11 0.81274438 349 iccv-2013-Regionlets for Generic Object Detection

12 0.8126967 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs

13 0.81261849 436 iccv-2013-Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach

14 0.81259716 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

15 0.81258893 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation

16 0.81248271 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal

17 0.81239843 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

18 0.81221777 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

19 0.81213075 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination

20 0.81209403 444 iccv-2013-Viewing Real-World Faces in 3D