
137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Source: pdf

Author: Jun Zhu, Li-Jia Li, Li Fei-Fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

reference text

[1] P. Arbeláez and L. Cohen. Constrained image segmentation from hierarchical boundaries. In CVPR, 2008.

[2] I. Biederman. On the semantics of a glance at a scene. Perceptual Organization, 213–253, 1981.

[3] D. Blei and J. Lafferty. Correlated topic models. In NIPS, 2006.

[4] D. Blei and J.D. McAuliffe. Supervised topic models. In NIPS, 2007.

[5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[6] L.-L. Cao and L. Fei-Fei. Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. In ICCV, 2007.

[7] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2:265–292, 2001.

[8] L. Du, L. Ren, D. Dunson, and L. Carin. A Bayesian model for simultaneous image clustering, annotation and object segmentation. In NIPS, 2009.

[9] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.

[10] A. Friedman. Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 108(3):316–355, 1979.

[11] T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59, 2009.

[12] S. Lacoste-Julien, F. Sha, and M. Jordan. DiscLDA: Discriminative learning for dimensionality reduction and classification. In NIPS, 2008.

[13] L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. In CVPR, 2007.

[14] L.-J. Li, R. Socher, and L. Fei-Fei. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In CVPR, 2009.

[15] D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989.

[16] D.G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.

[17] K. Murphy, A. Torralba, and W. Freeman. Using the forest to see the trees: A graphical model relating features, objects, and scenes. In NIPS, 2003.

[18] D. Navon. Forest before trees: The precedence of global features in visual perception. Perception and Psychophysics, 5:197–200, 1969.

[19] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001.

[20] A. Quattoni and A. Torralba. Recognizing indoor scenes. In CVPR, 2009.

[21] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.

[22] J. Sivic, B.C. Russell, A. Efros, A. Zisserman, and W.T. Freeman. Discovering objects and their location in images. In ICCV, 2005.

[23] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In CVPR, 2005.

[24] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[25] C. Wang, D. Blei, and L. Fei-Fei. Simultaneous image classification and annotation. In CVPR, 2009.

[26] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.

[27] J. Zhu, A. Ahmed, and E.P. Xing. MedLDA: Maximum margin supervised topic models for regression and classification. In ICML, 2009.