413 cvpr-2013-Story-Driven Summarization for Egocentric Video


Source: pdf

Author: Zheng Lu, Kristen Grauman

Abstract: We present a video summarization approach that discovers the story of an egocentric video. Given a long input video, our method selects a short chain of video subshots depicting the essential events. Inspired by work in text analysis that links news articles over time, we define a random-walk-based metric of influence between subshots that reflects how visual objects contribute to the progression of events. Using this influence metric, we define an objective for the optimal k-subshot summary. Whereas traditional methods optimize a summary’s diversity or representativeness, ours explicitly accounts for how one sub-event “leads to” another, which, critically, captures event connectivity beyond simple object co-occurrence. As a result, our summaries provide a better sense of story. We apply our approach to over 12 hours of daily activity video taken from 23 unique camera wearers, and systematically evaluate its quality compared to multiple baselines with 34 human subjects.
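
The abstract does not give the formula for the influence metric, so the following is only a minimal sketch of one way a random-walk influence score could be set up. It is not the authors' formulation. It assumes a bipartite graph linking subshots to the objects they contain, runs a random walk with restart from one subshot, and scores an object by how much the probability of reaching a second subshot drops when that object is turned into an absorbing sink. The names `walk_with_restart`, `influence`, the toy graph, and the restart value are all hypothetical.

```python
def walk_with_restart(graph, start, restart=0.15, iters=200, sink=None):
    """Approximate visit probabilities for a walk that restarts at `start`.

    `graph` maps each node to {neighbor: weight}. If `sink` is given, that
    node absorbs the walk: probability mass reaching it is not passed on.
    """
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        nxt[start] += restart  # fixed restart mass injected at the start node
        for node, p in prob.items():
            if node == sink:
                continue  # absorbed: nothing flows out of the sink
            total = sum(graph[node].values())
            if total == 0:
                continue  # node without outgoing edges
            for nbr, w in graph[node].items():
                nxt[nbr] += (1.0 - restart) * p * w / total
        prob = nxt
    return prob


def influence(graph, src, dst, obj):
    """Drop in the probability of reaching `dst` from `src` when `obj` is a sink."""
    full = walk_with_restart(graph, src)[dst]
    blocked = walk_with_restart(graph, src, sink=obj)[dst]
    return full - blocked


# Toy data (assumed): three subshots and two objects, unit edge weights.
toy_graph = {
    "s1": {"cup": 1.0},
    "s2": {"cup": 1.0, "tv": 1.0},
    "s3": {"tv": 1.0},
    "cup": {"s1": 1.0, "s2": 1.0},
    "tv": {"s2": 1.0, "s3": 1.0},
}

cup_influence = influence(toy_graph, "s1", "s2", "cup")
tv_influence = influence(toy_graph, "s1", "s2", "tv")
print(cup_influence, tv_influence)
```

In this toy graph the cup is the only object shared by `s1` and `s2`, so blocking it cuts every path between them and it receives the larger score. The TV still receives a small positive score because some walks from `s1` reach `s2`, pass through the TV, and return.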


reference text

[1] O. Aghazadeh, J. Sullivan, and S. Carlsson. Novelty detection from an egocentric perspective. In CVPR, 2011.

[2] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In CVPR, 2010.

[3] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In CIVR, 2007.

[4] F. Crete-Roffet, T. Dolmiere, P. Ladret, and M. Nicolas. The blur effect: Perception and estimation with a new no-reference perceptual blur metric. In SPIE, 2007.

[5] A. Doherty, D. Byrne, A. Smeaton, G. Jones, and M. Hughes. Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs. In CIVR, 2008.

[6] M. Ellouze, N. Boujemaa, and A. M. Alimi. Im(s)2: Interactive movie summarization system. J VCIR, 21(4):283–294, 2010.

[7] A. Fathi, A. Farhadi, and J. Rehg. Understanding egocentric activities. In ICCV, 2011.

[8] A. Fathi, J. K. Hodgins, and J. M. Rehg. Social interactions: A first-person perspective. In CVPR, 2012.

[9] A. Fathi, Y. Li, and J. M. Rehg. Learning to recognize daily action using gaze. In ECCV, 2012.

[10] D. B. Goldman, B. Curless, and S. M. Seitz. Schematic storyboarding for video visualization and editing. In SIGGRAPH, 2006.

[11] N. Jojic, A. Perina, and V. Murino. Structural epitome: A way to summarize one’s visual experience. In NIPS, 2010.

[12] K. Kitani, T. Okabe, Y. Sato, and A. Sugimoto. Fast unsupervised ego-action learning for first-person sports video. In CVPR, 2011.

[13] R. Laganiere, R. Bacco, A. Hocevar, P. Lambert, G. Pais, and B. E. Ionescu. Video summarization from spatio-temporal features. In Proc of ACM TRECVid Video Summarization Wkshp, 2008.

[14] Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, 2012.

[15] C. Liu. Beyond pixels: Exploring new representations and applications for motion analysis. PhD thesis, MIT, 2009.

[16] D. Liu, G. Hua, and T. Chen. A hierarchical visual model for video object summarization. PAMI, 32(12):2178–2190, 2010.

[17] T. Liu and J. R. Kender. Optimization algorithms for the selection of key frame sequences of variable length. In ECCV, 2002.

[18] J. Nam and A. H. Tewfik. Event-driven video abstraction and visualization. Multimedia Tools and Applications, 16(1):55–77, 2002.

[19] C.-W. Ngo, Y.-F. Ma, and H.-J. Zhang. Automatic video summarization by graph modeling. In ICCV, 2003.

[20] H. Pirsiavash and D. Ramanan. Detecting activities of daily living in first-person camera views. In CVPR, 2012.

[21] S. Pongnumkul, J. Wang, and M. Cohen. Creating map-based storyboards for browsing tour videos. In UIST, 2008.

[22] Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg. Webcam synopsis: Peeking around the world. In ICCV, 2007.

[23] X. Ren and C. Gu. Figure-ground segmentation improves handled object recognition in egocentric video. In CVPR, 2010.

[24] D. Shahaf and C. Guestrin. Connecting the dots between news articles. In KDD, 2010.

[25] E. Spriggs, F. De la Torre, and M. Hebert. Temporal segmentation and activity classification from first-person sensing. In CVPR Wkshp on Egocentric Vision, 2009.

[26] W. Wolf. Key frame selection by motion analysis. In ICASSP, 1996.

[27] H.-J. Zhang, J. Wu, D. Zhong, and S. W. Smoliar. An integrated system for content-based video retrieval and browsing. Pattern Recognition, 30(4):643–658, 1997.