
46 nips-2009-Bilinear classifiers for visual recognition


Source: pdf

Author: Hamed Pirsiavash, Deva Ramanan, Charless C. Fowlkes

Abstract: We describe an algorithm for learning bilinear SVMs. Bilinear classifiers are a discriminative variant of bilinear models, which capture the dependence of data on multiple factors. Such models are particularly appropriate for visual data that is better represented as a matrix or tensor, rather than a vector. Matrix encodings allow for more natural regularization through rank restriction. For example, a rank-one scanning-window classifier yields a separable filter. Low-rank models have fewer parameters and so are easier to regularize and faster to score at run-time. We learn low-rank models with bilinear classifiers. We also use bilinear classifiers for transfer learning by sharing linear factors between different classification tasks. Bilinear classifiers are trained with biconvex programs. Such programs are optimized with coordinate descent, where each coordinate step requires solving a convex program; in our case, we use a standard off-the-shelf SVM solver. We demonstrate bilinear SVMs on difficult problems of people detection in video sequences and action classification of video sequences, achieving state-of-the-art results in both.
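The rank-one case mentioned in the abstract can be illustrated with a small sketch. It assumes, since the abstract does not state the formula, that a matrix-valued input X is scored as Tr(W^T X) and that the rank-one weight matrix factors as W = u v^T. Under that assumption the score equals u^T X v. Holding v fixed makes the score linear in u on the projected features X v, and holding u fixed makes it linear in v on X^T u, which is why each coordinate step can be passed to an ordinary linear solver. All function names here are hypothetical, chosen for illustration only.

```python
# Minimal sketch (assumed scoring rule: score = Tr(W^T X), with W = u v^T).
# Function names are illustrative and not taken from the paper.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def full_score(W, X):
    """Tr(W^T X): sum of elementwise products of W and X."""
    return sum(w * x for w_row, x_row in zip(W, X)
               for w, x in zip(w_row, x_row))

def project_columns(X, v):
    """X v: features seen by u when v is held fixed."""
    return [dot(row, v) for row in X]

def project_rows(X, u):
    """X^T u: features seen by v when u is held fixed."""
    n_cols = len(X[0])
    return [sum(u[i] * X[i][j] for i in range(len(u))) for j in range(n_cols)]

def rank_one_score(u, v, X):
    """u^T X v, computed without ever forming the full matrix W."""
    return dot(u, project_columns(X, v))

# Toy 3x4 input and factors (arbitrary values, for illustration only).
u = [1.0, -2.0, 0.5]
v = [2.0, 0.0, -1.0, 3.0]
X = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0],
     [2.0, 2.0, 2.0, 2.0]]

score_full = full_score(outer(u, v), X)      # uses all 12 weights
score_sep = rank_one_score(u, v, X)          # linear in u given v
score_sep_v = dot(v, project_rows(X, u))     # linear in v given u
```

All three quantities agree. The factored form stores 3 + 4 parameters instead of 12, which is the sense in which a rank-one model has fewer parameters and yields a separable filter.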


reference text

[1] F.A. Al-Khayyal and J.E. Falk. Jointly constrained biconvex programming. Mathematics of Operations Research, pages 273–286, 1983.

[2] R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. The Journal of Machine Learning Research, 6:1817–1853, 2005.

[3] S.P. Boyd and L. Vandenberghe. Convex optimization. Cambridge university press, 2004.

[4] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[5] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research, 2:265–292, 2002.

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005, volume 1, 2005.

[7] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. Lecture Notes in Computer Science, 3952:428, 2006.

[8] Navneet Dalal. Finding People in Images and Video. PhD thesis, Institut National Polytechnique de Grenoble / INRIA Grenoble, July 2006.

[9] P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: A benchmark. In CVPR, June 2009.

[10] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results. http://www.pascalnetwork.org/challenges/VOC/voc2008/workshop/index.html.

[11] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, In submission.

[12] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. Computer Vision and Pattern Recognition, Anchorage, USA, June, 2008.

[13] V. Franc and S. Sonnenburg. Optimized cutting plane algorithm for support vector machines. In Proceedings of the 25th international conference on Machine learning, pages 320–327. ACM New York, NY, USA, 2008.

[14] J. Gorski, F. Pfeuffer, and K. Klamroth. Biconvex sets and optimization with biconvex functions: a survey and extensions. Mathematical Methods of Operations Research, 66(3):373–407, 2007.

[15] L.D. Lathauwer, B.D. Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl, 1995.

[16] N. Loeff and A. Farhadi. Scene Discovery by Matrix Factorization. In Proceedings of the 10th European Conference on Computer Vision: Part IV, pages 451–464. Springer-Verlag Berlin, Heidelberg, 2008.

[17] J.D.M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In International Conference on Machine Learning, volume 22, page 713, 2005.

[18] M.D. Rodriguez, J. Ahmed, and M. Shah. Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pages 1–8, 2008.

[19] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 3, 2004.

[20] A. Shashua and T. Hazan. Non-negative tensor factorization with applications to statistics and computer vision. In International Conference on Machine Learning, volume 22, page 793, 2005.

[21] N. Srebro, J.D.M. Rennie, and T.S. Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems, 17:1329–1336, 2005.

[22] D. Tao, X. Li, X. Wu, and S.J. Maybank. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1700, 2007.

[23] J.B. Tenenbaum and W.T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.

[24] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2):1453, 2006.

[25] M.A.O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: Tensorfaces. Lecture Notes in Computer Science, pages 447–460, 2002.

[26] L. Wolf, H. Jhuang, and T. Hazan. Modeling appearances with low-rank SVM. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–6. Citeseer, 2007.

[27] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H.J. Zhang. Discriminant analysis with tensor representation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, page 526. Citeseer, 2005.