nips nips2007 nips2007-183 nips2007-183-reference
Source: pdf
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes the spatial structure among visual words that is essential for solving many vision problems. The spatial information is encoded not in the values of visual words but in the design of documents. Instead of the partition of words into documents being known a priori, the word-document assignment becomes a hidden random variable in SLDA. A generative procedure is introduced in which knowledge of spatial structure can be flexibly added as a prior, grouping visual words that are close in space into the same document. We use SLDA to discover objects from a collection of images, and show that it achieves better performance than LDA.
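To make the generative procedure described in the abstract concrete, below is a minimal Python sketch of an SLDA-style forward sampler. It assumes a Gaussian spatial kernel for the hidden word-to-document assignment; the function name generate_slda_image, the parameters (n_topics, sigma, alpha, beta), and the kernel choice are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def generate_slda_image(doc_centers, word_locations, vocab_size,
                        n_topics=5, sigma=20.0, alpha=0.5, beta=0.1, seed=0):
    """Sample visual words for one image under an SLDA-like generative model."""
    rng = np.random.default_rng(seed)
    doc_centers = np.asarray(doc_centers, dtype=float)  # (n_docs, 2) document locations
    n_docs = len(doc_centers)

    # Per-document topic proportions and per-topic word distributions, as in LDA.
    pi = rng.dirichlet(alpha * np.ones(n_topics), size=n_docs)
    phi = rng.dirichlet(beta * np.ones(vocab_size), size=n_topics)

    words, docs, topics = [], [], []
    for loc in word_locations:
        # Hidden word-to-document assignment: documents whose centers are
        # spatially close to this visual word are more likely to own it
        # (Gaussian kernel is an assumed choice for illustration).
        dist2 = np.sum((doc_centers - np.asarray(loc, dtype=float)) ** 2, axis=1)
        p_doc = np.exp(-dist2 / (2.0 * sigma ** 2))
        d = rng.choice(n_docs, p=p_doc / p_doc.sum())
        z = rng.choice(n_topics, p=pi[d])      # topic drawn from document d
        w = rng.choice(vocab_size, p=phi[z])   # visual word drawn from topic z
        docs.append(d); topics.append(z); words.append(w)
    return np.array(words), np.array(docs), np.array(topics)

# Example: three overlapping documents covering a 100x100 image, 50 visual words.
words, docs, topics = generate_slda_image(
    doc_centers=[(25, 25), (50, 50), (75, 75)],
    word_locations=np.random.default_rng(1).uniform(0, 100, size=(50, 2)),
    vocab_size=200)
```

The paper's inference procedure is not shown here; the sketch only illustrates how spatial proximity, rather than a fixed a priori partition, determines which document a visual word is assigned to.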
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[2] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering object categories in image collections. In Proc. ICCV, 2005.
[3] B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In Proc. CVPR, 2006.
[4] L. Cao and L. Fei-Fei. Spatially coherent latent topic model for concurrent object segmentation and classification. In Proc. ICCV, 2007.
[5] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proc. CVPR, 2005.
[6] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In Proc. BMVC, 2006.
[7] X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception by hierarchical Bayesian models. In Proc. CVPR, 2007.
[8] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proc. of Uncertainty in Artificial Intelligence, 2004.
[9] D. Blei and J. Lafferty. Dynamic topic models. In Proc. ICML, 2006.
[10] D. Blei and J. Lafferty. Correlated topic models. In Proc. NIPS, 2006.
[11] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Learning hierarchical models of scenes, objects, and parts. In Proc. ICCV, 2005.
[12] J. Verbeek and B. Triggs. Region classification with Markov field aspect models. In Proc. CVPR, 2007.
[Figure 6: Examples of experimental results on the MSRC image data set. (a): original images; (b): LDA results; (c): SLDA results.]
[13] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proc. ICCV, 2005.
[14] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004.
[15] http://people.csail.mit.edu/xgwang/slda.html.