
442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

Source: pdf

Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg

Abstract: We propose an unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated by a figure-ground segmentation algorithm. Online non-local appearance models are then trained incrementally for each track using a multi-output regularized least squares formulation. Because all segment tracks share the same set of training examples, a computational trick allows us to track hundreds of segment tracks efficiently and to perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks: it breaks down the initial segment proposals and recombines them into better ones by using high-order statistic estimates from the appearance model and enforcing temporal consistency. To evaluate the algorithm, we collect a dataset, SegTrack v2, of about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on this dataset, showing its efficiency and robustness to the challenges posed by different video sequences.
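The shared-training-set idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `SharedRidgeTracker`, the feature dimension, the regularizer `lam`, and the random data are all assumptions made for illustration. It shows only the general property of multi-output regularized least squares that when every track uses the same examples `X`, one accumulated matrix `X^T X` serves all tracks, and adding a frame's examples yields the same closed-form solution as retraining on all data.

```python
import numpy as np


class SharedRidgeTracker:
    """Illustrative multi-output ridge regression with shared examples."""

    def __init__(self, n_features, n_tracks, lam=1.0):
        self.lam = lam
        # Sufficient statistics shared across all tracks.
        self.H = np.zeros((n_features, n_features))  # accumulates X^T X
        self.C = np.zeros((n_features, n_tracks))    # accumulates X^T Y

    def update(self, X, Y):
        """Add one frame: X is (n, d) features, Y is (n, t) per-track targets."""
        self.H += X.T @ X
        self.C += X.T @ Y

    def weights(self):
        """Closed-form solution W = (H + lam * I)^{-1} C for all tracks at once."""
        d = self.H.shape[0]
        return np.linalg.solve(self.H + self.lam * np.eye(d), self.C)


# Hypothetical data: 4 frames, 20 examples each, 5 features, 3 tracks.
rng = np.random.default_rng(0)
frames = [(rng.normal(size=(20, 5)), rng.normal(size=(20, 3))) for _ in range(4)]

model = SharedRidgeTracker(n_features=5, n_tracks=3, lam=0.1)
for X, Y in frames:
    model.update(X, Y)
W_online = model.weights()

# Batch solution on all frames stacked together, for comparison.
X_all = np.vstack([X for X, _ in frames])
Y_all = np.vstack([Y for _, Y in frames])
W_batch = np.linalg.solve(X_all.T @ X_all + 0.1 * np.eye(5), X_all.T @ Y_all)
```

Each track adds only one column to `C`, so the cost of the linear solve is dominated by the shared `d x d` system rather than by the number of tracks.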

reference text

[1] M. Andriluka, S. Roth, and B. Schiele. People-tracking-by-detection and people-detection-by-tracking. In CVPR, 2008. 2

[2] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Semantic segmentation using regions and parts. In CVPR, 2012. 1

[3] B. Babenko, M.-H. Yang, and S. Belongie. Robust object tracking with online multiple instance learning. PAMI, 33:1619–1632, 2011. 1, 2

[4] X. Bai and G. Sapiro. Geodesic matting: A framework for fast interactive image and video segmentation and matting. IJCV, 82:113–132, 2009. 2

[5] C. Bibby and I. Reid. Real-time tracking of multiple occluding objects using level sets. In CVPR, pages 1307–1314, 2010. 2

[6] W. Brendel and S. Todorovic. Video object segmentation by tracking regions. In ICCV, pages 833–840, 2009. 1, 2

[7] T. Brox and J. Malik. Object segmentation by long term analysis of point trajectories. In ECCV, 2010. 1, 2

[8] T. Brox and J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. PAMI, 33(3):500–513, 2011. 1

[9] I. Budvytis, V. Badrinarayanan, and R. Cipolla. Semi-supervised video segmentation using tree structured graphical models. In CVPR, pages 2257–2264, 2011. 2

[10] J. Carreira, F. Li, and C. Sminchisescu. Object recognition by sequential figure-ground ranking. IJCV, 2012. 1

[11] J. Carreira and C. Sminchisescu. CPMC: Automatic object segmentation using constrained parametric min-cuts. PAMI, 2012. 1, 3, 6

[12] H.-T. Cheng and N. Ahuja. Exploiting nonlocal spatiotemporal structure for video segmentation. In CVPR, 2012. 2

[13] D. Cremers. Dynamical statistical shape priors for level set-based tracking. PAMI, 28(8):1262–1273, 2006. 2

[14] I. Endres and D. Hoiem. Category independent object proposals. In ECCV, pages 575–588, 2010. 1, 3

[15] M. Grundmann, V. Kwatra, M. Han, and I. Essa. Efficient hierarchical graph based video segmentation. In CVPR, 2010. 1, 2, 7

[16] Y. Huang, Q. Liu, and D. Metaxas. Video object segmentation by hypergraph cut. In CVPR, 2009. 2

[17] V. Kolmogorov, Y. Boykov, and C. Rother. Applications of parametric maxflow in computer vision. In ICCV, 2007. 1, 3

[18] J. Lee, S. Kwak, B. Han, and S. Choi. Online video segmentation by Bayesian split-merge clustering. In ECCV, pages 856–869, 2012. 2

[19] Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, 2012. 1

[20] Y. J. Lee, J. Kim, and K. Grauman. Key-segments for video object segmentation. In ICCV, 2011. 1, 2, 7

[21] M. Leordeanu, R. Sukthankar, and C. Sminchisescu. Efficient closed-form solution to generalized boundary detection. In ECCV, 2012. 3

[22] F. Li, J. Carreira, G. Lebanon, and C. Sminchisescu. Composite statistical inference for semantic segmentation. In CVPR, 2013. 2, 5, 6

[23] F. Li, J. Carreira, and C. Sminchisescu. Object recognition as ranking holistic figure-ground hypotheses. In CVPR, 2010. 1, 3

[24] F. Li, G. Lebanon, and C. Sminchisescu. A linear approximation to the chi2 kernel with geometric convergence. Technical report, arXiv:1206.4074, 2013. 4, 6

[25] T. Ma and L. J. Latecki. Maximum weight cliques with mutex constraints for video object segmentation. In CVPR, 2012. 2

[26] H. Pirsiavash, D. Ramanan, and C. Fowlkes. Globally-optimal greedy algorithms for tracking a variable number of objects. In CVPR, 2011. 2, 7

[27] B. Price, B. Morse, and S. Cohen. Livecut: Learning-based interactive video segmentation by evaluation of multiple propagated cues. In ICCV, 2009. 2

[28] Y. Rathi, N. Vaswani, A. Tannenbaum, and A. Yezzi. Tracking deforming objects using particle filtering for geometric active contours. PAMI, 29(8):1470–1475, 2007. 2

[29] X. Ren and J. Malik. Tracking as repeated figure/ground segmentation. In CVPR, 2007. 2

[30] D. Sun, S. Roth, and M. J. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. 6

[31] D. Tsai, M. Flagg, and J. M. Rehg. Motion coherent tracking with multi-label MRF optimization. In BMVC, 2010. 1, 2, 6

[32] M. Unger, M. Werlberger, T. Pock, and H. Bischof. Joint motion estimation and segmentation of complex scenes with label costs and occlusion modeling. In CVPR, 2012. 2

[33] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 9:1582–1596, 2010. 4

[34] A. Vazquez-Reina, S. Avidan, H. Pfister, and E. Miller. Multiple hypothesis video segmentation from superpixel flows. In ECCV, 2010. 2

[35] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. PAMI, 34, 2012. 4

[36] T. Wang and J. Collomosse. Probabilistic motion diffusion of labeling priors for coherent video segmentation. IEEE Transactions on Multimedia, 14(2):389–400, April 2012. 2

[37] C. Xu, C. Xiong, and J. J. Corso. Streaming hierarchical video segmentation. In ECCV, 2012. 2

[38] J. Yuen, B. Russell, C. Liu, and A. Torralba. LabelMe video: Building a video database with human annotations. In ICCV, 2009. 2