nips nips2009 nips2009-90 nips2009-90-reference knowledge-graph by maker-knowledge-mining

90 nips-2009-Factor Modeling for Advertisement Targeting

Source: pdf

Author: Ye Chen, Michael Kapralov, John Canny, Dmitry Y. Pavlov

Abstract: We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson) [6], to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We scale up the algorithm to terabytes of real-world SS and BT data that contains hundreds of millions of users and hundreds of thousands of features, by leveraging the scalability characteristics of the algorithm and the inherent structure of the problem including data sparsity and locality. SpeciÄ?Ĺš cally, we demonstrate two somewhat orthogonal philosophies of scaling algorithms to large-scale problems, through the SS and BT implementations, respectively. Finally, we report the experimental results using YahooĂ˘€™s vast datasets, and show that our approach substantially outperform the state-of-the-art methods in prediction accuracy. For BT in particular, the ROC area achieved by GaP is exceeding 0.95, while one prior approach using Poisson regression [11] yielded 0.83. For computational performance, we compare a single-node sparse implementation with a parallel implementation using Hadoop MapReduce, the results are counterintuitive yet quite interesting. We therefore provide insights into the underlying principles of large-scale learning. 1

reference text

[1] http://hadoop.apache.org/.

[2] http://www.mathworks.com/products/matlab/.

[3] http://www.netlib.org/lapack/.

[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993Ă˘€“1022, 2003.

[5] A. Broder, M. Fontoura, V. Josifovski, and L. Riedel. A semantic approach to contextual advertising. ACM Conference on Information Retrieval (SIGIR 2007), pages 559Ă˘€“566, 2007.

[6] J. F. Canny. GaP: a factor model for discrete data. ACM Conference on Information Retrieval (SIGIR 2004), pages 122Ă˘€“129, 2004.

[7] J. F. Canny, S. Zhong, S. Gaffney, C. Brower, P. Berkhin, and G. H. John. Granular data for behavioral targeting. U.S. Patent Application 20090006363.

[8] D. Chakrabarti, D. Agarwal, and V. Josifovski. Contextual advertising by combining relevance with click feedback. International World Wide Web Conference (WWW 2008), pages 417Ă˘€“426, 2008.

[9] O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. International World Wide Web Conference (WWW 2009), pages 1Ă˘€“10, 2009.

[10] Y. Chen, D. Pavlov, P. Berkhin, and J. F. Canny. Large-scale behavioral targeting for advertising over a network. U.S. Patent Application 12/351,749, Ä?Ĺš led: Jan 09, 2009.

[11] Y. Chen, D. Pavlov, and J. F. Canny. Large-scale behavioral targeting. ACM Conference on Knowledge Discovery and Data Mining (KDD 2009), 2009.

[12] C. Y. Chung, J. M. Koran, L.-J. Lin, and H. Yin. Model for generating user proÄ?Ĺš les in a behavioral targeting system. U.S. Patent 11/394,374, Ä?Ĺš led: Mar 29, 2006.

[13] M. Ciaramita, V. Murdock, and V. Plachouras. Online learning from click data for sponsored search. International World Wide Web Conference (WWW 2008), pages 227Ă˘€“236, 2008.

[14] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. Web Search and Web Data Mining (WSDM 2008), pages 87Ă˘€“94, 2008.

[15] J. Dean and S. Ghemawat. Mapreduce: SimpliÄ?Ĺš ed data processing on large clusters. Communications of the ACM, 51(1):107Ă˘€“113, 2008.

[16] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391Ă˘€“407, 1990.

[17] D. C. Fain and J. O. Pedersen. Sponsored search: a brief history. Bulletin of the American Society for Information Science and Technology, 32(2):12Ă˘€“13, 2006.

[18] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(12):177Ă˘€“196, 2001.

[19] A. HyvĂ‚Â¨ rinen. Fast and robust Ä?Ĺš xed-point algorithms for independent component analysis. IEEE Transa actions on Neural Networks, 10(3):626Ă˘€“634, 1999.

[20] I. T. Jolliffe. Principal Component Analysis. Springer, 2002.

[21] A. Lacerda, M. Cristo, M. A. Goncalves, W. Fan, N. Ziviani, and B. Ribeiro-Neto. Learning to advertise. Ă‚Â¸ ACM Conference on Information Retrieval (SIGIR 2006), pages 549Ă˘€“556, 2006.

[22] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems (NIPS 2000), 13:556Ă˘€“562, 2000.

[23] S. Pandey and C. Olston. Handling advertisements of unknown quality in search advertising. Advances in Neural Information Processing Systems (NIPS 2006), 19:1065Ă˘€“1072, 2006.

[24] F. Radlinski and T. Joachims. Minimally invasive randomization for collecting unbiased preferences from clickthrough logs. National Conference on ArtiÄ?Ĺš cial Intelligence (AAAI 2006), pages 1406Ă˘€“1412, 2006.

[25] M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. International World Wide Web Conference (WWW 2007), pages 521Ă˘€“530, 2007.

[26] D. A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385Ă˘€“463, 2004. 9