
143 nips-2007-Object Recognition by Scene Alignment

Source: pdf

Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman

Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
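The retrieve-then-transfer idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes each image is already summarized by a small feature vector, uses plain Euclidean nearest-neighbor retrieval, and replaces the probabilistic transfer model with simple voting over the retrieved labels. It also ignores object locations. The names `retrieve`, `transfer_labels`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def retrieve(query, training_set, k):
    """Return the k training items whose feature vectors are closest to `query`.

    Assumption: Euclidean distance on a generic feature vector stands in for
    the paper's unspecified "appropriate representation".
    """
    ranked = sorted(training_set, key=lambda item: math.dist(query, item["features"]))
    return ranked[:k]


def transfer_labels(retrieval_set):
    """Score each label by the fraction of retrieved images that contain it.

    Assumption: simple voting, used here in place of the paper's
    probabilistic label-transfer model.
    """
    counts = Counter(
        label for item in retrieval_set for label in set(item["labels"])
    )
    n = len(retrieval_set)
    return {label: count / n for label, count in counts.items()}


# Toy, made-up training set: two outdoor scenes and two indoor scenes.
training_set = [
    {"features": (0.9, 0.1), "labels": ["sky", "building", "car"]},
    {"features": (0.8, 0.2), "labels": ["sky", "road", "car"]},
    {"features": (0.1, 0.9), "labels": ["screen", "keyboard"]},
    {"features": (0.2, 0.8), "labels": ["screen", "bookshelf"]},
]

query = (0.85, 0.15)  # an input image that resembles the outdoor scenes
matches = retrieve(query, training_set, k=2)
scores = transfer_labels(matches)
```

Here the two outdoor scenes are retrieved, so labels shared by both matches ("sky", "car") score higher than labels seen in only one ("building", "road"), and indoor labels receive no score at all.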

reference text

[1] A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence. In CVPR, volume 1, pages 26–33, June 2005.

[2] M. Everingham, A. Zisserman, C.K.I. Williams, and L. Van Gool. The PASCAL visual object classes challenge 2006 (VOC 2006) results. Technical report, September 2006. The PASCAL2006 dataset can be downloaded at http://www.pascal-network.org/challenges/VOC/voc2006/.

Figure 7: Comparison of full system against local appearance only detector (SVM). Detection rate for a number of object categories tested at a fixed false positive per window rate of 2e-04 (0.8 false positives per image per object class). The number of test examples appears in parentheses next to the category name. We plot performance for a number of classes for the baseline SVM object detector (blue), the detector of Section 3 using no clustering (red), and the full system (green). Notice that detectors taking into account context perform better in most cases than those using local appearance alone. Also, clustering does as well as, and sometimes exceeds, no clustering. Notable exceptions are some indoor object categories. This is due to poor retrieval set matching, which causes a poor context model to be learned. [Panels: tree (531), car (138), road (232), screen (268), sky (144), bookshelf (47), motorbike (40), keyboard (154), sidewalk (196), person (113), building (547), wall (69). Legend: SVM, No clustering, Clustering.]

[3] C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.

[4] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. Intl. J. Computer Vision, 61(1), 2005.

[5] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In CVPR, 2003.

[6] J. Hays and A. Efros. Scene completion using millions of photographs. In SIGGRAPH, 2007.

[7] D. Hoiem, A. Efros, and M. Hebert. Putting objects in perspective. In CVPR, 2006.

[8] H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Canadian Journal of Statistics, 30:269–283, 2002.

[9] David G. Lowe. Distinctive image features from scale-invariant keypoints. Intl. J. Computer Vision, 60(2):91–110, 2004.

[10] J. McAuliffe, D. Blei, and M. Jordan. Nonparametric empirical Bayes for the Dirichlet process mixture model. Statistics and Computing, 16:5–14, 2006.

[11] R. M. Neal. Density modeling and clustering using Dirichlet diffusion trees. In Bayesian Statistics, 7:619–629, 2003.

[12] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Intl. J. Computer Vision, 42(3):145–175, 2001.

[13] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In IEEE Intl. Conf. on Computer Vision, 2007.

[14] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. Technical Report AIM-2005-025, MIT AI Lab Memo, September, 2005.

[15] E. Sudderth, A. Torralba, W. T. Freeman, and A. Willsky. Learning hierarchical models of scenes, objects, and parts. In IEEE Intl. Conf. on Computer Vision, 2005.

[16] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006.

[17] Y. W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Info. Proc. Systems, 2006.

[18] A. Torralba. Contextual priming for object detection. Intl. J. Computer Vision, 53(2):153–167, 2003.

[19] A. Torralba, R. Fergus, and W.T. Freeman. Tiny images. Technical Report AIM-2005-025, MIT AI Lab Memo, September, 2005.

[20] A. Torralba, K. Murphy, W. Freeman, and M. Rubin. Context-based vision system for place and object recognition. In Intl. Conf. Computer Vision, 2003.