
Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees (ICCV 2013)



Authors: Aastha Jain, Shaunak Chatterjee, René Vidal

Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
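The abstract does not say how the coarser energies are built or why minimizing them is exact. As a hedged illustration of the general coarse-to-fine principle only, and not the authors' supervoxel-tree construction, the sketch below applies coarse-to-fine dynamic programming in the style of Raphael [26] to a chain-structured energy with unary and pairwise terms. It groups labels rather than supervoxels. Assumptions: each step's labels are partitioned into contiguous ranges that are split in half on refinement, and every coarse cost is the minimum of the fine costs it summarizes. The coarse optimum is therefore a lower bound on the fine optimum, and once the coarse optimum uses only single-label groups it is exactly the fine optimum. The function names `viterbi` and `coarse_to_fine` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi(unary: Sequence[Sequence[float]],
            pair: Callable[[int, int, int], float]) -> Tuple[float, List[int]]:
    """Exact min-sum Viterbi on a chain.

    unary[t][s] is the cost of state s at step t; pair(t, a, b) is the cost of
    state a at step t-1 followed by state b at step t.
    """
    cost = list(unary[0])
    back = []  # back[t-1][b] = best predecessor of state b at step t
    for t in range(1, len(unary)):
        new_cost, ptr = [], []
        for b in range(len(unary[t])):
            best_a = min(range(len(cost)), key=lambda a: cost[a] + pair(t, a, b))
            new_cost.append(cost[best_a] + pair(t, best_a, b) + unary[t][b])
            ptr.append(best_a)
        cost = new_cost
        back.append(ptr)
    s = min(range(len(cost)), key=cost.__getitem__)
    energy, path = cost[s], [s]
    for ptr in reversed(back):  # trace the optimal path backwards
        s = ptr[s]
        path.append(s)
    return energy, path[::-1]


def coarse_to_fine(unary: List[List[float]],
                   pairwise: List[List[float]]) -> Tuple[float, List[int]]:
    """Exactly minimize sum_t unary[t][x_t] + sum_t pairwise[x_{t-1}][x_t].

    Hypothetical sketch: the labels 0..K-1 at each step are partitioned into
    contiguous ranges (lo, hi); coarse costs are minima over the fine costs.
    """
    T, K = len(unary), len(unary[0])
    groups = [[(0, K)] for _ in range(T)]  # start with one group per step
    while True:
        # Coarse unary: cheapest label inside each group (a lower bound).
        coarse_unary = [[min(unary[t][lo:hi]) for lo, hi in groups[t]]
                        for t in range(T)]

        def coarse_pair(t: int, i: int, j: int) -> float:
            # Coarse pairwise: cheapest label pair across the two groups.
            (a0, a1), (b0, b1) = groups[t - 1][i], groups[t][j]
            return min(pairwise[a][b] for a in range(a0, a1) for b in range(b0, b1))

        energy, path = viterbi(coarse_unary, coarse_pair)
        refined = False
        for t, i in enumerate(path):
            lo, hi = groups[t][i]
            if hi - lo > 1:  # split every multi-label group on the optimal path
                mid = (lo + hi) // 2
                groups[t][i:i + 1] = [(lo, mid), (mid, hi)]
                refined = True
        if not refined:
            # Every group on the path is a single label, so the coarse energy
            # equals the fine energy of this labeling, which is thus optimal.
            return energy, [groups[t][i][0] for t, i in enumerate(path)]
```

For example, `coarse_to_fine([[3, 0, 5], [2, 4, 1]], [[0, 1, 1], [1, 0, 1], [1, 1, 0]])` returns `(2, [1, 2])`, the same minimum an exhaustive search over all nine labelings finds. The speedup in such schemes depends on how few groups need refining before the optimum is certified, which loosely mirrors the abstract's point that data with more spatio-temporal continuity gives larger speedups.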


References

[1] igraph. http://igraph.sourceforge.net/.

[2] Libsvx. http://www.cse.buffalo.edu/~jcorso/r/supervoxels/.

[3] Python graph. http://code.google.com/p/python-graph/.

[4] D. Batra and P. Kohli. Making the right moves: Guiding alpha-expansion using local primal-dual gaps. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1865–1872, 2011.

[5] E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete Applied Mathematics, 123(1-3):155–225, Nov. 2002.

[6] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.

[7] G. J. Brostow, J. Fauqueur, and R. Cipolla. Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 2008.

[8] S. Chatterjee and S. Russell. A temporally abstracted Viterbi algorithm. In UAI, pages 96–104, 2011.

[9] A. Chen and J. Corso. Propagating multi-class pixel labels throughout video frames. In Western New York Image Processing Workshop (WNYIPW), pages 14–17, 2010.

[10] J. J. Corso, A. Yuille, and Z. Tu. Graph-Shifts: Natural Image Labeling by Dynamic Hierarchical Computing. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[11] D. Cremers and S. Soatto. Motion competition: A variational framework for piecewise parametric motion segmentation. Int. Journal of Computer Vision, 62(3):249–265, 2005.

[12] T. Darrell and A. Pentland. Robust estimation of a multilayered motion representation. In IEEE Workshop on Visual Motion, pages 173–178, 1991.

[13] P. Felzenszwalb and D. Huttenlocher. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.

[14] P. Felzenszwalb and D. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70:41–54, 2006. doi:10.1007/s11263-006-7899-4.

[15] G. Floros and B. Leibe. Joint 2d-3d temporally consistent semantic segmentation of street scenes. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2012.

[16] B. Fulkerson, A. Vedaldi, and S. Soatto. Class segmentation and object localization with superpixel neighborhoods. In IEEE Int. Conf. on Computer Vision, 2009.

[17] E. Galmar, T. Athanasiadis, B. Huet, and Y. S. Avrithis. Spatiotemporal semantic video segmentation. In MMSP, pages 574–579. IEEE Signal Processing Society, 2008.

[18] M. Grundmann, V. Kwatra, M. Han, and I. Essa. Efficient hierarchical graph-based video segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[19] A. Jain, L. Zappella, P. McClure, and R. Vidal. Visual dictionary learning for joint object categorization and segmentation. In European Conference on Computer Vision, 2012.

[20] P. Kohli, L. Ladicky, and P. H. S. Torr. Robust higher order potentials for enforcing label consistency. In IEEE Conf. on Computer Vision and Pattern Recognition, 2008.

[21] M. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In Proceedings of the Twenty-fifth Conference on Uncertainty in AI (UAI), 2009.

[22] L. Ladicky, C. Russell, P. Kohli, and P. Torr. Associative hierarchical CRFs for object class image segmentation. In IEEE Int. Conf. on Computer Vision, 2009.

[23] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64(2-3):107–123, 2005.

[24] I. Laptev and T. Lindeberg. Space-time interest points. In IEEE International Conference on Computer Vision, pages 432–439, 2003.

[25] V. S. Lempitsky, A. Vedaldi, and A. Zisserman. A pylon model for semantic segmentation. In Neural Information Processing Systems, 2011.

[26] C. Raphael. Coarse-to-Fine Dynamic Programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1379–1390, 2001.

[27] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 3rd edition, 2010.

[28] J. Shi and J. Malik. Motion segmentation and tracking using normalized cuts. In IEEE Int. Conf. on Computer Vision, pages 1154–1160, 1998.

[29] D. Singaraju and R. Vidal. Using global bag of features models in random fields for joint categorization and segmentation of objects. In IEEE Conference on Computer Vision and Pattern Recognition, 2011.

[30] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using PowerFactorization and GPCA. International Journal of Computer Vision, 79(1):85–105, August 2008.

[31] C. Xu and J. Corso. Evaluation of super-voxel methods for early video processing. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.