
443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation



Author: Xiatian Zhu, Chen Change Loy, Shaogang Gong

Abstract: Generating a coherent synopsis for a surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.


reference text

[1] L. Breiman. Random forests. ML, 45(1):5–32, 2001. 3, 5

[2] L. Breiman, J. Friedman, C. Stone, and R. Olshen. Classification and regression trees. Chapman & Hall/CRC, 1984. 3

[3] A. Criminisi and J. Shotton. Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Foundations and Trends® in Computer Graphics and Vision, 7(2-3):81–227, 2012. 3, 5

[4] R. Duin and M. Loog. Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion. TPAMI, 26(6):732–739, 2004. 1

[5] P. F. Felzenszwalb, R. B. Girshick, D. A. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32(9):1627–1645, 2010. 5

[6] S. Feng, Z. Lei, D. Yi, and S. Z. Li. Online content-aware video condensation. In CVPR, pages 2082–2087, 2012. 1, 2

[7] D. Goldman, B. Curless, D. Salesin, and S. Seitz. Schematic storyboards for video editing and visualization. In SIGGRAPH, volume 25, pages 862–871, 2006. 2

[8] S. Gong, C. C. Loy, and T. Xiang. Security and surveillance. In Visual Analysis of Humans, pages 455–472. Springer, 2011. 1

[9] Y. Gong. Summarizing audiovisual contents of a video program. EURASIP J. Appl. Signal Process., 2003:160–169, 2003. 2

[10] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen. Affinity aggregation for spectral clustering. In CVPR, pages 773–780, 2012. 2, 5

[11] H. Kang, X. Chen, Y. Matsushita, and X. Tang. Space-time video montage. In CVPR, pages 1331–1338, 2006. 1

[12] Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, pages 1346–1353, 2012. 1, 2

[13] B. Liu, Y. Xia, and P. S. Yu. Clustering through decision tree construction. In CIKM, pages 20–29, 2000. 3

[14] Y.-F. Ma, L. Lu, H.-J. Zhang, and M. Li. A user attention model for video summarization. In ACM MM, pages 533–542, 2002. 7

[15] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. TPAMI, 24(7):971–987, 2002. 5

[16] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42:145–175, 2001. 5

[17] Y. Pritch, A. Rav-Acha, and S. Peleg. Nonchronological video synopsis and indexing. TPAMI, 30(11):1971–1984, 2008. 1, 2

[18] A. Rav-Acha, Y. Pritch, and S. Peleg. Making a long video short: Dynamic video synopsis. In CVPR, pages 435–441, 2006. 2

[19] C. Taskiran, Z. Pizlo, A. Amir, D. Ponceleon, and E. Delp. Automated video program summarization using speech transcripts. TMM, 8(4):775–791, 2006. 2

[20] G. Toderici, H. Aradhye, M. Pasca, L. Sbaiz, and J. Yagnik. Finding meaning on YouTube: Tag recommendation and category discovery. In CVPR, pages 3447–3454, 2010. 2

[21] B. T. Truong and S. Venkatesh. Video abstraction: A systematic review and classification. ACM TOMCCAP, 3(1):3, 2007. 2

[22] Z. Wang, M. Zhao, Y. Song, S. Kumar, and B. Li. YouTubeCat: Learning to categorize wild web videos. In CVPR, pages 879–886, 2010. 2

[23] H. Yang and I. Patras. Sieving regression forest votes for facial feature detection in the wild. In ICCV, 2013. 3

[24] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In NIPS, pages 1601–1608, 2004. 4, 6

[25] Y. Zhao and G. Karypis. Empirical and theoretical comparisons of selected criterion functions for document clustering. ML, 55(3):311–331, 2004. 6