
439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction


Source: pdf

Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou

Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figure-ground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process that can compare trajectories of different lengths that are not necessarily spatiotemporally aligned, yet remains discriminative enough despite significant intra-class variation in the common action. We further leverage graph matching to enforce geometric coherence between regions, reducing feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.

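The final labeling step described in the abstract lends itself to a compact illustration: with per-trajectory co-saliency scores as the data term and pairwise spatiotemporal consistency as the smoothness term, the binary MRF can be solved exactly by an s-t min-cut in the style of Boykov-Veksler-Zabih [2]. Below is a minimal sketch of that step (not the authors' code); the dictionaries `co_saliency` and `consistency`, the threshold `tau`, and the weight `lam` are illustrative stand-ins for quantities the paper defines.

```python
# Minimal sketch (not the authors' implementation) of the final labeling step:
# a binary MRF over trajectories solved exactly by an s-t min-cut [2].
# `co_saliency`, `consistency`, `lam`, and `tau` are illustrative assumptions.
import networkx as nx

def label_trajectories(co_saliency, consistency, lam=1.0, tau=0.5):
    """co_saliency: {traj_id: score in [0, 1]}, higher = more co-salient.
    consistency: {(i, j): weight >= 0} for spatiotemporally close pairs.
    Returns the set of trajectory ids labeled as the common action."""
    G = nx.DiGraph()
    S, T = "source", "sink"  # source side = common action, sink side = outlier
    for i, s in co_saliency.items():
        # Data terms: cutting S->i labels i an outlier, which should be
        # expensive for highly co-salient trajectories; cutting i->T labels
        # i as common action, expensive for low-saliency trajectories.
        G.add_edge(S, i, capacity=max(s - tau, 0.0))
        G.add_edge(i, T, capacity=max(tau - s, 0.0))
    for (i, j), w in consistency.items():
        # Smoothness term: penalize separating consistent trajectories.
        G.add_edge(i, j, capacity=lam * w)
        G.add_edge(j, i, capacity=lam * w)
    _, (source_side, _) = nx.minimum_cut(G, S, T)
    return source_side - {S}

# Toy usage: three mutually consistent, co-salient trajectories, one outlier.
sal = {"t1": 0.9, "t2": 0.8, "t3": 0.7, "t4": 0.1}
con = {("t1", "t2"): 1.0, ("t2", "t3"): 1.0}
print(label_trajectories(sal, con))  # -> {'t1', 't2', 't3'}
```

Because the labels are binary and the pairwise terms are submodular, the min-cut returns the exact minimizer of this energy rather than an approximation [2].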

reference text

[1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009. 1

[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. TPAMI, 23(11):1222–1239, 2001. 5

[3] T. Brox and J. Malik. Object segmentation by long term analysis of point trajectories. In ECCV, 2010. 1

[4] W. Chu, F. Zhou, and F. D. Torre. Unsupervised temporal commonality discovery. In ECCV, 2012. 6, 7

[Figure 8 caption: Results of ten video pair examples from the human action dataset: (1) Basketball Shooting, (2) Fencing, (3) Horse Riding, (4) Jumping Rope, (5) Lunges, (6) Clean and Jerk, (7) Bench Press, (8) Swing, (9) Skiing, (10) Skateboarding. In each example, from top to bottom: two image frames from the pair, and the co-segmentation results. Blue denotes the background trajectories detected in the initial background subtraction step; green denotes the detected action outliers; red denotes the detected common action. The yellow bounding boxes are the given annotations that indicate the interesting regions. The corresponding tags of the videos are overlaid on the top of each example.]

[Fragment of another figure caption: multiple frames of the two input videos (separated by the bold black line in the middle) are arranged in time order; the active and the non-active frames are bordered in red and green respectively; the corresponding tags are overlaid on the top-left of each example.]

[5] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. In ECCV, 2006. 2, 3

[6] P. Dollár, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatiotemporal features. In VS-PETS, 2005. 2

[7] K. Fragkiadaki and J. Shi. Detection free tracking: Exploiting motion and topology for segmenting and tracking under entanglement. In CVPR, 2011. 2

[8] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, 2005. 4, 5

[9] H. Li and K. N. Ngan. A co-saliency model of image pairs. TIP, 20(12):3365–3375, 2011. 2

[10] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001. 4

[11] P. Ochs and T. Brox. Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions. In ICCV, 2011. 6

[12] E. Rahtu, J. Kannala, M. Salo, and J. Heikkilä. Segmenting salient objects from images and videos. In ECCV, 2010. 1, 2

[13] M. Raptis, I. Kokkinos, and S. Soatto. Discovering discriminative action parts from mid-level video representations. In CVPR, 2012. 1, 4, 6

[14] K. K. Reddy and M. Shah. Recognizing 50 human action categories of web videos. Machine Vision and Applications Journal, 2012. 6

[15] C. Rother, V. Kolmogorov, T. Minka, and A. Blake. Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In CVPR, 2006. 2

[16] J. C. Rubio, J. Serrat, and A. López. Video co-segmentation. In ACCV, 2012. 2, 6

[17] S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In CVPR, 2012. 1

[18] Y. Sheikh, O. Javed, and T. Kanade. Background subtraction for freely moving cameras. In ICCV, 2009. 2

[19] N. Sundaram, T. Brox, and K. Keutzer. Dense point trajectories by GPU-accelerated large displacement optical flow. In ECCV, 2010. 2, 7

[20] F. Tiburzi, M. Escudero, J. Bescós, and J. M. M. Sanchez. A ground truth for motion-based video-object segmentation. In ICIP, 2008. 6

[21] A. Toshev, J. Shi, and K. Daniilidis. Image matching via saliency region correspondences. In CVPR, 2007. 2

[22] K. N. Tran, I. A. Kakadiaris, and S. K. Shah. Modeling motion of body parts for action recognition. In BMVC, 2011. 1

[23] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011. 2, 3

[24] M. Wang, B. Ni, X.-S. Hua, and T.-S. Chua. Assistive tagging: A survey of multimedia tagging with human-computer joint exploration. ACM Computing Surveys, 44(4), 2012. 1

[25] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In CVPR, 2011. 4

[26] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006. 2