
277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average


Source: pdf

Authors: Wei Chen, Tie-Yan Liu, Zhi-Ming Ma

Abstract: This paper is concerned with generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., they consist of queries and documents. Previous generalization analyses for ranking, however, have not fully considered this structure and cannot explain how the simultaneous change of the query number and the document number in the training data affects the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d. sampling of documents per query. Such a sampling scheme better describes the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning-to-rank algorithms. Performing such analysis is challenging, however, because the documents associated with different queries are not identically distributed, and the documents associated with the same query are no longer independent after being represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtain are quite intuitive and are in accordance with previous empirical studies on the performance of ranking algorithms.
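As a concrete illustration of the two-layer sampling assumption described in the abstract, the following is a minimal LaTeX sketch. The symbols used here (P_Q for the query distribution, D_q for the per-query document distribution, f for the ranking model, \ell for a per-document loss, n queries and m documents per query) are notation introduced only for illustration and are not the paper's exact notation; in particular, a pointwise loss is assumed for simplicity, whereas ranking losses compare documents within a query, but the layering is the same.

% Two-layer sampling: queries drawn i.i.d., documents drawn conditionally i.i.d. per query.
\[
  q_1, \dots, q_n \overset{\text{i.i.d.}}{\sim} P_Q,
  \qquad
  x_{i,1}, \dots, x_{i,m} \overset{\text{i.i.d.}}{\sim} D_{q_i} \quad (i = 1, \dots, n).
\]
% Expected risk under two-layer sampling, and the empirical risk over the sample.
\[
  R(f) = \mathbb{E}_{q \sim P_Q}\, \mathbb{E}_{x \sim D_q} \big[ \ell(f; q, x) \big],
  \qquad
  \hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} \ell(f; q_i, x_{i,j}).
\]
% Decomposing the generalization error according to the two layers: a query-layer term
% (finitely many queries standing in for P_Q) plus a document-layer term
% (finitely many documents per query standing in for each D_{q_i}).
\[
  R(f) - \hat{R}(f)
  = \underbrace{\Big( R(f) - \mathbb{E}\big[ \hat{R}(f) \mid q_1, \dots, q_n \big] \Big)}_{\text{query layer}}
  + \underbrace{\Big( \mathbb{E}\big[ \hat{R}(f) \mid q_1, \dots, q_n \big] - \hat{R}(f) \Big)}_{\text{document layer}} .
\]

Each term on the right-hand side can then be bounded separately, which is where a Rademacher-average-style argument over the corresponding layer enters.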


reference text

[1] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005.

[2] S. Agarwal and P. Niyogi. Generalization bounds for ranking algorithms via algorithmic stability. Journal of Machine Learning Research, 10:441–474, 2009.

[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

[4] J. Baxter. Learning internal representations. In Proceedings of the Eighth International Conference on Computational Learning Theory, pages 311–320. ACM Press, 1995.

[5] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, pages 89–96, 2005.

[6] Y. Cao, J. Xu, T. Y. Liu, H. Li, Y. Huang, and H. W. Hon. Adapting ranking SVM to document retrieval. In SIGIR ’06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 186–193. ACM Press, 2006.

[7] Z. Cao, T. Qin, T. Y. Liu, M. F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML ’07: Proceedings of the 24th International Conference on Machine Learning, pages 129–136, 2007.

[8] C. L. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2009 web track. Technical report, no date.

[9] S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and scoring using empirical risk minimization. In COLT ’05: Proceedings of the 18th Annual Conference on Learning Theory, pages 1–15, 2005.

[10] D. Cossock and T. Zhang. Subset ranking using regression. In COLT ’06: Proceedings of the 19th Annual Conference on Learning Theory, pages 605–619, 2006.

[11] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.

[12] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132, Cambridge, MA, 1999. MIT Press.

[13] T. Joachims. Optimizing search engines using clickthrough data. In KDD ’02: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142, 2002.

[14] Y. Y. Lan, T. Y. Liu, Z. M. Ma, and H. Li. Generalization analysis of listwise learning-to-rank algorithms. In ICML ’09: Proceedings of the 26th International Conference on Machine Learning, pages 577–584, 2009.

[15] Y. Y. Lan, T. Y. Liu, T. Qin, Z. M. Ma, and H. Li. Query-level stability and generalization in learning to rank. In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, pages 512–519, 2008.

[16] T. Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3:225–331, 2009.

[17] J. R. Wen, J. Y. Nie, and H. J. Zhang. Clustering user queries of a search engine. In WWW ’01: Proceedings of the 10th International Conference on World Wide Web, pages 162–168, New York, NY, USA, 2001. ACM.

[18] F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li. Listwise approach to learning to rank: theory and algorithm. In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, pages 1192–1199. Omnipress, 2008.

[19] E. Yilmaz and S. Robertson. Deep versus shallow judgments in learning to rank. In SIGIR ’09: Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 662–663, 2009.