jmlr jmlr2013 jmlr2013-60 jmlr2013-60-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wei Wu, Zhengdong Lu, Hang Li
Abstract: The task of matching data from two heterogeneous domains naturally arises in various areas such as web search, collaborative filtering, and drug design. In web search, existing work has designed relevance models to match queries and documents by exploiting either user clicks or content of queries and documents. To the best of our knowledge, however, there has been little work on principled approaches to leveraging both clicks and content to learn a matching model for search. In this paper, we propose a framework for learning to match heterogeneous objects. The framework learns two linear mappings for two objects respectively, and matches them via the dot product of their images after mapping. Moreover, when different regularizations are enforced, the framework renders a rich family of matching models. With orthonormal constraints on mapping functions, the framework subsumes Partial Least Squares (PLS) as a special case. Alternatively, with a ℓ1 +ℓ2 regularization, we obtain a new model called Regularized Mapping to Latent Structures (RMLS). RMLS enjoys many advantages over PLS, including lower time complexity and easy parallelization. To further understand the matching framework, we conduct generalization analysis and apply the result to both PLS and RMLS. We apply the framework to web search and implement both PLS and RMLS using a click-through bipartite with metadata representing features of queries and documents. We test the efficacy and scalability of RMLS and PLS on large scale web search problems. The results show that both PLS and RMLS can significantly outperform baseline methods, while RMLS substantially speeds up the learning process. Keywords: web search, partial least squares, regularized mapping to latent structures, generalization analysis
J. Abernethy, F. Bach, T. Evgeniou, and J.P. Vert. A new approach to collaborative filtering: Operator estimation with spectral regularization. JMLR ’09, 10:803–826, 2009. S. Agarwal and P. Niyogi. Stability and generalization of bipartite ranking algorithms. In COLT’05, pages 32–47, 2005. F. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In NIPS’08, pages 105–112, 2008. F. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML’04, 2004. R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs. In SIGKDD’07, pages 76–85, 2007. B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In CIKM’09, pages 187–196, 2009. P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. JMLR’02, 3:463–482, 2002. C. Burges, R. Ragno, and Q. Le. Learning to rank with nonsmooth cost functions. In NIPS’06, pages 395–402. 2006. 2545 W U , L U AND L I Y. Cao, J. Xu, T.Y. Liu, H. Li, Y. Huang, and H.W. Hon. Adapting ranking svm to document retrieval. In SIGIR ’06, pages 186–193, 2006. Z. Cao, T. Qin, T.Y. Liu, M.F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML ’07, pages 129–136, 2007. T.Q. Chen, W.N. Zhang, Q.X. Lu, K.L. Chen, Z. Zhao, and Y. Yu. Svdfeature: A toolkit for featurebased collaborative filtering. JMLR’12, 13, 2012. W. Chen, T.Y. Liu, and Z. Ma. Two-layer generalization analysis for ranking using rademacher average. NIPS’10, 23:370–378, 2010. C. Cortes. Invited talk: Can learning kernels help performance? In ICML’09, page 161, 2009. K. Crammer and Y. Singer. Pranking with ranking. In NIPS’01, pages 641–647, 2001. N. Craswell and M. Szummer. Random walks on the click graph. In SIGIR’07, pages 239–246, 2007. N. Cristianini, J. Shawe-taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In NIPS’01, 2001. J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon. Information-theoretic metric learning. In ICML’07, page 216, 2007. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. JASIS’90, 41:391–407, 1990. J.H. Friedman. Greedy function approximation: a gradient boosting machine. Ann. Statist, 29(5): 1189–1232, 2001. J. Gao, K. Toutanova, and W. Yih. Clickthrough-based latent semantic models for web search. In SIGIR’11, pages 675–684, 2011. D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE Transactions on PAMI, 30(8):1371–1384, 2008. D.R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004. R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. NIPS’99, pages 115–132, 1999. T. Hertz, A. Bar-Hillel, and D. Weinshall. Boosting margin based distance functions for clustering. In ICML’04, page 50, 2004. T. Hofmann. Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst., 22:89–115, 2004. S. CH. Hoi, W. Liu, M.R. Lyu, and W.Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In CVPR’06, volume 2, pages 2072–2078, 2006. 2546 L EARNING B ILINEAR M ODEL FOR M ATCHING Q UERIES AND D OCUMENTS S. JacobGoldberger and R. GeoffHinton. Neighbourhood components analysis. NIPS’04, 2004. K. Jarvelin and J. Kekalainen. Ir evaluation methods for retrieving highly relevant documents. In SIGIR’00, pages 41–48, 2000. T. Joachims. Optimizing search engines using clickthrough data. In KDD ’02, pages 133–142, 2002. I.T. Jolliffe. Principal Component Analysis. Springer verlag, 2002. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. In ICML’02, pages 323–330, 2002. H. Li. Learning to rank for information retrieval and natural language processing. Synthesis Lectures on Human Language Technologies, 4(1):1–113, 2011. T.Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009. H. Ma, H.X. Yang, I. King, and M. R. Lyu. Learning latent semantic relations from clickthrough data for query suggestion. In CIKM’08, pages 709–718, 2008. C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR’05, 6:1099– 1125, 2005. C.S. Ong, A.J. Smola, and R.C. Williamson. Learning the kernel with hyperkernels. JMLR ’05, 6: 1043–1071, 2005. J.M. Ponte and W.B. Croft. A language modeling approach to information retrieval. In SIGIR’98, pages 275–281, 1998. S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at trec-3. In TREC, 1994. R. Rosipal and N. Kr¨ mer. Overview and recent advances in partial least squares. Subspace, Latent a Structure and Feature Selection, pages 34–51, 2006. C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire. Margin-based ranking meets boosting in the middle. In COLT’05, pages 63–78, 2005. G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA, 1986. P.J. Schreier. A unifying discussion of correlation analysis for complex random vectors. Signal Processing, IEEE Transactions on, 56(4):1327–1336, 2008. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. NIPS’03, 2003. N. Srebro, J.D.M. Rennie, and T. Jaakkola. Maximum-margin matrix factorization. NIPS’05, pages 1329–1336, 2005. 2547 W U , L U AND L I M. Sugiyama. Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. JMLR’07, 8:1027–1061, 2007. M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In ICML’09, page 134, 2009. J.A. Wegelin. A survey of partial least squares (pls) methods, with emphasis on the two-block case. Technical Report, No.371, Seattle: Department of Statistics, Univ. of Wash., 2000. K.Q. Weinberger and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR’09, 10:207–244, 2009. W. Wu, J. Xu, H. Li, and O. Satoshi. Learning a robust relevance model for search using kernel methods. JMLR’11, 12:1429–1458, 2011. E.P. Xing, A.Y. Ng, M.I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. NIPS’03, pages 521–528, 2003. J. Xu and H. Li. Adarank: a boosting algorithm for information retrieval. In SIGIR’ 07, pages 391–398, 2007. J. Xu, H. Li, and Z.L. Zhong. Relevance ranking using kernels. In AIRS ’10, 2010. L. Yang, R. Jin, R. Sukthankar, and Y. Liu. An efficient algorithm for local distance metric learning. In AAAI’06, page 543, 2006. Y. Ying, K. Huang, and C. Campbell. Sparse Metric Learning via Smooth Optimization. NIPS’09, pages 521–528, 2009. C.X. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179–214, 2004. 2548