A Bayesian Approach to Multimodal Visual Dictionary Learning (CVPR 2013)

Source: pdf

Author: Go Irie, Dong Liu, Zhenguo Li, Shih-Fu Chang

Abstract: Despite significant progress, most existing visual dictionary learning methods rely on image descriptors alone or together with class labels. However, Web images are often associated with text data which may carry substantial information regarding image semantics, and may be exploited for visual dictionary learning. This paper explores this idea by leveraging relational information between image descriptors and textual words via co-clustering, in addition to information of image descriptors. Existing co-clustering methods are not optimal for this problem because they ignore the structure of image descriptors in the continuous space, which is crucial for capturing visual characteristics of images. We propose a novel Bayesian co-clustering model to jointly estimate the underlying distributions of the continuous image descriptors as well as the relationship between such distributions and the textual words through a unified Bayesian inference. Extensive experiments on image categorization and retrieval have validated the substantial value of the proposed joint modeling in improving visual dictionary learning, where our model shows superior performance over several recent methods.
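The abstract does not say how the relational information between image descriptors and textual words is represented. As a minimal sketch, assuming each descriptor is simply linked to every word attached to its source image, a binary descriptor-by-word matrix could be built as follows. The function name `build_relation_matrix` and the input layout are hypothetical and are not taken from the paper; this only illustrates the kind of input a co-clustering method consumes, not the proposed Bayesian model itself.

```python
from typing import Dict, List, Sequence, Tuple


def build_relation_matrix(
    images: Sequence[Tuple[int, Sequence[str]]],
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Build a binary descriptor-by-word matrix (illustrative assumption).

    `images` is a hypothetical input: one (num_descriptors, tags) pair per image.
    Row i corresponds to the i-th descriptor overall; column j to a word.
    """
    # Assign a column index to each distinct word, in order of first appearance.
    vocab: Dict[str, int] = {}
    for _, tags in images:
        for tag in tags:
            vocab.setdefault(tag, len(vocab))

    rows: List[List[int]] = []
    for num_descriptors, tags in images:
        # Every descriptor of an image is linked to all words attached to that image.
        row = [0] * len(vocab)
        for tag in tags:
            row[vocab[tag]] = 1
        # Emit one copy of the row per descriptor extracted from this image.
        rows.extend(list(row) for _ in range(num_descriptors))
    return rows, vocab


# Toy data: the first image has 2 descriptors, the second has 1.
toy_images = [(2, ["beach", "sea"]), (1, ["sea", "boat"])]
relation, vocab = build_relation_matrix(toy_images)
```

A matrix of this shape is what generic co-clustering methods such as [3], [4], or [22] would operate on. According to the abstract, the proposed model additionally uses the continuous descriptor values, which this sketch leaves out.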

reference text

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing, 54:4311–4322, 2006. 2, 7

[2] D. Blei and M. Jordan. Modeling annotated data. In SIGIR, 2003. 3, 6

[3] I. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, 2001. 2, 5, 7

[4] I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In KDD, 2003. 2, 5, 7

[5] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix trifactorizations for clustering. In KDD, 2006. 2, 5, 7

[6] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005. 6

[7] M. Guillaumin, J. Verbeek, and C. Schmid. Multimodal semi-supervised learning for image classification. In CVPR, 2010. 1, 2, 7

[8] Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011. 3

[9] Z. Jiang, Z. Lin, and L. Davis. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. In CVPR, 2011. 1, 2

[10] Z. Jiang, G. Zhang, and L. Davis. Submodular dictionary learning for sparse coding. In CVPR, 2012. 1, 2

[11] D. Kim, M. Hughes, and E. Sudderth. The nonparametric metadata dependent relational model. In ICML, 2012. 2

[12] S. Lazebnik and M. Raginsky. Supervised learning of quantizer codebooks by information loss minimization. TPAMI, 31:1294–1309, 2009. 1, 2, 7

[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 7

[14] L. Li, M. Zhou, G. Sapiro, and L. Carin. On the integration of topic modeling and dictionary learning. In ICML, 2011. 3, 5, 6

[15] L.-J. Li, R. Socher, and L. Fei-Fei. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In CVPR, 2009. 3

[16] X.-C. Lian, Z. Li, C. Wang, B.-L. Lu, and L. Zhang. Probabilistic models for supervised dictionary learning. In CVPR, 2010. 2

[17] J. Mairal, F. Bach, and J. Ponce. Task-driven dictionary learning. TPAMI, 2012. 1, 2

[18] E. Meeds, Z. Ghahramani, R. Neal, and S. Roweis. Modeling dyadic data with binary latent factors. In NIPS, 2006. 2

[19] D. Putthividhya, H. Attias, and S. Nagarajan. Topic regression multi-modal latent Dirichlet allocation for image annotation. In CVPR, 2010. 3

[20] A. Quattoni, M. Collins, and T. Darrell. Learning visual representations using images with captions. In CVPR, 2007. 2

[21] N. Rasiwasia, J. Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM Multimedia, 2010. 3, 7

[22] H. Shan and A. Banerjee. Bayesian co-clustering. In ICDM, 2008. 2, 5, 7

[23] Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. J. American Stat. Assoc., 101:1566–1581, 2006. 5

[24] C. Wang, D. Blei, and L. Fei-Fei. Simultaneous image classification and annotation. In CVPR, 2009. 3, 5, 6

[25] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010. 1, 2, 4, 7

[26] P. Wang, K. Laskey, C. Domeniconi, and M. Jordan. Nonparametric Bayesian co-clustering ensembles. In SDM, 2011. 2

[27] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009. 4, 7

[28] M. Yang, L. Zhang, X. Feng, and D. Zhang. Fisher discrimination dictionary learning for sparse representation. In ICCV, 2011. 2

[29] Q. Yang, Y. Chen, G.-R. Xue, W. Dai, and Y. Yu. Heterogeneous transfer learning for image clustering via the social Web. In ACL, 2009. 3

[30] Q. Zhang and B. Li. Discriminative K-SVD for dictionary learning in face recognition. In CVPR, 2010. 1, 2, 7

[31] N. Zhou, Y. Shen, J. Peng, and J. Fan. Learning inter-related visual dictionary for object recognition. In CVPR, 2012. 2

[32] Y. Zhu, Y. Chen, Z. Lu, S. J. Pan, G.-R. Xue, Y. Yu, and Q. Yang. Heterogeneous transfer learning for image classification. In AAAI, 2011. 3