243 cvpr-2013-Large-Scale Video Summarization Using Web-Image Priors

Source: pdf

Author: Aditya Khosla, Raffay Hamid, Chih-Jen Lin, Neel Sundaresan

Abstract: Given the enormous growth in user-generated videos, it is becoming increasingly important to be able to navigate them efficiently. As these videos are generally of poor quality, summarization methods designed for well-produced videos do not generalize to them. To address this challenge, we propose to use web-images as a prior to facilitate summarization of user-generated videos. Our main intuition is that people tend to take pictures of objects to capture them in a maximally informative way. Such images could therefore be used as prior information to summarize videos containing a similar set of objects. In this work, we apply our novel insight to develop a summarization algorithm that uses the web-image based prior information in an unsupervised manner. Moreover, to automatically evaluate summarization algorithms on a large scale, we propose a framework that relies on multiple summaries obtained through crowdsourcing. We demonstrate the effectiveness of our evaluation framework by comparing its performance to that of multiple human evaluators. Finally, we present results for our framework tested on hundreds of user-generated videos.

reference text

[1] W. Abd-Almageed. Online, simultaneous shot boundary detection and key frame extraction for sports videos using rank tracing. In ICIP, pages 3200–3203. IEEE, 2008. 4

[2] J. Almeida, N. Leite, and R. Torres. Vison: Video summarization for online applications. Pattern Recognition Letters, 2011. 7, 8

[3] A. Aner and J. Kender. Video summaries through mosaic-based shot and scene clustering. ECCV, pages 45–49, 2006. 7

[4] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. ECCV, pages 404–417, 2006. 5

[5] M. Cha, H. Kwak, P. Rodriguez, Y. Ahn, and S. Moon. I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system. In ACM SIGCOMM, 2007. 1

[6] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2:265–292, 2002. 3

[7] E. Dumont and B. Mérialdo. Automatic evaluation method for rushes summary content. In ICME, pages 666–669. IEEE, 2009. 8

[8] M. Everingham, A. Zisserman, C. Williams, and L. Van Gool. The Pascal visual object classes challenge. 2006. 5

[9] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. Liblinear: A library for large linear classification. JMLR, 9: 1871–1874, 2008. 5

[10] C. Gianluigi and S. Raimondo. An innovative algorithm for key frame extraction in video summarization. JRTIP, 2006. 4

[11] F. Glover. Maximum matching in a convex bipartite graph. Naval Research Logistics Quarterly, 14(3):313–316, 1967. 4

[12] D. Goldman, B. Curless, D. Salesin, and S. Seitz. Schematic storyboarding for video visualization and editing. ACM TOG, 2006. 7

[13] Y. Gong and X. Liu. Video summarization using singular value decomposition. In CVPR, volume 2, pages 174–180. IEEE, 2000. 1

[14] M. Huang, A. Mahajan, and D. DeMenthon. Automatic performance evaluation for video summarization. Technical report, 2004. 4

[15] R. Jiang, A. Sadka, and D. Crookes. Hierarchical video summarization in reference subspace. Consumer Electronics, 2009. 1, 7

[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, volume 2, pages 2169–2178. IEEE, 2006. 5

[17] Y. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. CVPR, 2012. 7

[18] Y. Li and B. Merialdo. Vert: automatic evaluation of video summaries. In Multimedia, pages 851–854. ACM, 2010. 8

[19] C. Lin. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: ACL Workshop, 2004. 8

[20] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. PAMI, pages 978–994, 2011. 4, 5

[21] Y. Ma, L. Lu, H. Zhang, and M. Li. A user attention model for video summarization. In ACM MM, pages 533–542, 2002. 7

[22] C. Ngo, Y. Ma, and H. Zhang. Video summarization and scene detection by graph modeling. IEEE CSVT, 2005. 1, 7

[23] P. Over, A. Smeaton, and P. Kelly. The trecvid 2007 bbc rushes summarization evaluation pilot. In TRECVID Workshop, 2007. 8

[24] K. Papineni, S. Roukos, T. Ward, and W. Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL, 2002. 8

[25] Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg. Webcam synopsis: Peeking around the world. In ICCV. IEEE, 2007. 7

[26] A. Rav-Acha, Y. Pritch, and S. Peleg. Making a long video short: Dynamic video synopsis. In CVPR. IEEE, 2006. 1, 7

[27] A. Sorokin and D. Forsyth. Utility data annotation with amazon mechanical turk. In CVPR. IEEE, 2008. 8

[28] B. Truong and S. Venkatesh. Video abstraction: A systematic review and classification. TOMCCAP, 3(1):3, 2007. 7

[29] V. Valdes and J. Martinez. Automatic evaluation of video summaries. ACM TOMCCAP, 8(3):25, 2012. 8

[30] C. Vondrick, D. Ramanan, and D. Patterson. Efficiently scaling up video annotation with crowdsourced marketplaces. ECCV, 2010. 8

[31] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. CVPR, 2010. 5

[32] M. Wang, R. Hong, G. Li, Z. Zha, S. Yan, and T. Chua. Event driven web video summarization by tag localization and key-shot identification. Multimedia, IEEE Transactions on, (99), 2011. 7

[33] X. Wang, G. Hua, and T. X. Han. Detection by detections: Nonparametric detector adaptation for a video. In CVPR, 2012. 7

[34] W. Wolf. Key frame selection by motion analysis. ICASSP, 1996. 7

[35] J. Yuen, B. Russell, C. Liu, and A. Torralba. Labelme video: Building a video database with human annotations. In CVPR, 2009. 8