
235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching


Author: Kaiye Wang, Ran He, Wei Wang, Liang Wang, Tieniu Tan

Abstract: Cross-modal matching has recently drawn much attention due to the widespread existence of multimodal data. It aims to match data from different modalities, and generally involves two basic problems: the measure of relevance and coupled feature selection. Most previous works focus mainly on the first problem. In this paper, we propose a novel coupled linear regression framework to deal with both problems. Our method learns two projection matrices to map multimodal data into a common feature space, in which cross-modal data matching can be performed. In the learning procedure, ℓ2,1-norm penalties are imposed on the two projection matrices separately, which leads to the simultaneous selection of relevant and discriminative features from the coupled feature spaces. A trace norm is further imposed on the projected data as a low-rank constraint, which enhances the relevance of connected data from different modalities. We also present an iterative algorithm based on half-quadratic minimization to solve the proposed regularized linear regression problem. Experimental results on two challenging cross-modal datasets demonstrate that the proposed method outperforms the state-of-the-art approaches.
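The abstract names three ingredients: a regression fit for each modality, a row-sparsity penalty on each projection matrix, and a trace norm on the projected data. The sketch below shows how such an objective could be evaluated with NumPy. The abstract does not give the exact formula, so the form of `coupled_objective`, the shared target matrix `Y`, the side-by-side stacking of the two projections, and the weights `lam1` and `lam2` are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def l21_norm(U):
    """Sum of the Euclidean norms of the rows of U.

    Penalizing this value pushes whole rows of U toward zero, which is
    what makes it usable for feature selection.
    """
    return float(np.sum(np.linalg.norm(U, axis=1)))


def trace_norm(M):
    """Sum of the singular values of M (also called the nuclear norm)."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False)))


def coupled_objective(Xa, Xb, Y, Ua, Ub, lam1=0.1, lam2=0.01):
    """Evaluate an ASSUMED coupled regression objective.

    Xa, Xb : (n, da) and (n, db) feature matrices of the two modalities.
    Y      : (n, c) shared target matrix (hypothetical, e.g. class labels).
    Ua, Ub : (da, c) and (db, c) projection matrices.
    lam1, lam2 : hypothetical regularization weights.
    """
    Pa = Xa @ Ua  # modality A projected into the common space
    Pb = Xb @ Ub  # modality B projected into the common space
    fit = 0.5 * (np.linalg.norm(Pa - Y, "fro") ** 2
                 + np.linalg.norm(Pb - Y, "fro") ** 2)
    sparsity = lam1 * (l21_norm(Ua) + l21_norm(Ub))
    # Assumption: the low-rank term acts on both projections stacked
    # side by side.
    low_rank = lam2 * trace_norm(np.hstack([Pa, Pb]))
    return fit + sparsity + low_rank


# Small usage example on random data.
rng = np.random.default_rng(0)
Xa = rng.normal(size=(20, 8))
Xb = rng.normal(size=(20, 5))
Y = np.eye(4)[rng.integers(0, 4, size=20)]  # one-hot labels, 4 classes
Ua = rng.normal(size=(8, 4))
Ub = rng.normal(size=(5, 4))
value = coupled_objective(Xa, Xb, Y, Ua, Ub)
```

This only evaluates the objective. The iterative half-quadratic solver mentioned in the abstract is not reproduced here.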

reference text

[1] R. Angst, C. Zach, and M. Pollefeys. The generalized trace-norm and its application to structure-from-motion problems. In ICCV, pages 2502–2509, 2011.

[2] Y. Chen, L. Wang, W. Wang, and Z. Zhang. Continuum regression for cross-modal multimedia retrieval. In ICIP, pages 1949–1952, 2012.

[3] M. Fornasier, H. Rauhut, and R. Ward. Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM Journal on Optimization, 21(4):1614–1640, 2011.

[4] E. Grave, G. Obozinski, and F. Bach. Trace lasso: a trace norm regularization for correlated designs. In NIPS, pages 2187–2195, 2011.

[5] Q. Gu, Z. Li, and J. Han. Joint feature selection and subspace learning. In IJCAI, pages 1294–1299, 2011.

[6] Z. Harchaoui, M. Douze, M. Paulin, M. Dudik, and J. Malick. Large-scale image classification with trace-norm regularization. In CVPR, pages 3386–3393, 2012.

[7] D. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.

[8] R. He, T. N. Tan, L. Wang, and W. Zheng. ℓ2,1 regularized correntropy for robust feature selection. In CVPR, pages 2504–2511, 2012.

[9] R. He, W. Zheng, and B. Hu. Maximum correntropy criterion for robust face recognition. IEEE TPAMI, 33(8):1561–1576, 2011.

[10] Z. Huang, S. Shan, H. Zhang, S. Lao, and X. Chen. Cross-view graph embedding. In ACCV, 2012.

[11] S. Hwang and K. Grauman. Reading between the lines: object localization using implicit cues from image tags. IEEE TPAMI, 34(6):1145–1158, 2012.

[12] Z. Lei and S. Z. Li. Coupled spectral regression for matching heterogeneous faces. In CVPR, pages 1123–1128, 2009.

[13] A. Li, S. Shan, X. Chen, and W. Gao. Maximizing intra-individual correlations for face recognition across pose differences. In CVPR, pages 605–611, 2009.

[14] A. Li, S. Shan, X. Chen, and W. Gao. Face recognition based on non-corresponding region matching. In ICCV, pages 1060–1067, 2011.

[15] F. Nie, H. Huang, X. Cai, and C. Ding. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In NIPS, pages 1813–1821, 2010.

[16] M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing, 27(3):937–966, 2005.

[17] N. Quadrianto and C. H. Lampert. Learning multi-view neighborhood preserving projections. In ICML, pages 425–432, 2011.

[18] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos. Bridging the gap: query by semantic example. IEEE TMM, 9(5):923–938, 2007.

[19] N. Rasiwasia, J. C. Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM MM, pages 251–260, 2010.

[20] R. Rosipal and N. Kramer. Overview and recent advances in partial least squares. LNCS, pages 34–51, 2006.

[21] A. Sharma and D. W. Jacobs. Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch. In CVPR, pages 593–600, 2011.

[22] A. Sharma, A. Kumar, H. Daume, and D. W. Jacobs. Generalized multiview analysis: a discriminative latent space. In CVPR, pages 2160–2167, 2012.

[23] L. Sun, S. Ji, and J. Ye. A least squares formulation for canonical correlation analysis. In ICML, pages 1024–1031, 2008.

[24] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.

[25] R. Udupa and M. Khapra. Improving the multilingual user experience of Wikipedia using cross-language name search. In NAACL-HLT, pages 492–500, 2010.

[26] F. Wu, Y. Yuan, X. Liu, J. Shao, Y. Zhuang, and Z. Zhang. The heterogeneous feature selection with structural sparsity for multimedia annotation and hashing: a survey. IJMIR, 1(1):3–15, 2012.