
85 nips-2013-Deep content-based music recommendation

Source: pdf

Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen

Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach.
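
Illustrative sketch (not the authors' implementation): the idea in the abstract, predicting a song's latent factors from its audio when no usage data exists, can be shown on toy data. Everything here is an assumption made for illustration: the data are synthetic, the array names (X_train, V_train, U) and the helper fit_ridge are hypothetical, and a plain ridge regression stands in for the convolutional network and the bag-of-words baseline that the paper compares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 200 songs with usage data, 5 cold-start songs.
n_train, n_new, n_feat, n_factors, n_users = 200, 5, 20, 8, 10

# Assumption: item factors depend linearly on audio features (toy setup only).
true_W = rng.normal(size=(n_feat, n_factors))
X_train = rng.normal(size=(n_train, n_feat))  # stand-in audio features
V_train = X_train @ true_W + 0.01 * rng.normal(size=(n_train, n_factors))
U = rng.normal(size=(n_users, n_factors))     # stand-in user factors


def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression: solve (X^T X + lam I) W = X^T Y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


# Learn a map from audio features to the factors obtained from usage data.
W = fit_ridge(X_train, V_train)

# Cold-start songs: no usage data, so their factors are predicted from audio.
X_new = rng.normal(size=(n_new, n_feat))
V_new = X_new @ W

# Score each (user, new song) pair by the dot product of their factors.
scores = U @ V_new.T
top_song_per_user = scores.argmax(axis=1)
```

Once factors have been predicted for a new song, it can be ranked for any user exactly like a song whose factors came from usage data, which is what makes this approach usable in the cold-start case.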

reference text

[1] M. Slaney. Web-scale multimedia analysis: Does content matter? MultiMedia, IEEE, 18(2):12–15, 2011.

[2] O. Celma. Music Recommendation and Discovery in the Long Tail. PhD thesis, Universitat Pompeu Fabra, Barcelona, 2008.

[3] Malcolm Slaney, Kilian Q. Weinberger, and William White. Learning a metric for music similarity. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), 2008.

[4] Jan Schlüter and Christian Osendorfer. Music Similarity Estimation with the Mean-Covariance Restricted Boltzmann Machine. In Proceedings of the 10th International Conference on Machine Learning and Applications (ICMLA), 2011.

[5] Brian McFee, Luke Barrington, and Gert R. G. Lanckriet. Learning content similarity for music recommendation. IEEE Transactions on Audio, Speech & Language Processing, 20(8), 2012.

[6] Richard Stenzel and Thomas Kamps. Improving Content-Based Similarity Measures by Training a Collaborative Model. pages 264–271, London, UK, September 2005. University of London.

[7] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.

[8] James Bennett and Stan Lanning. The netflix prize. In Proceedings of KDD cup and workshop, volume 2007, page 35, 2007.

[9] Eric J. Humphrey, Juan P. Bello, and Yann LeCun. Moving beyond feature design: Deep architectures and automatic feature learning in music informatics. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), 2012.

[10] Philippe Hamel and Douglas Eck. Learning features from music audio with deep belief networks. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2010.

[11] Honglak Lee, Peter Pham, Yan Largman, and Andrew Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems 22. 2009.

[12] Sander Dieleman, Philémon Brakel, and Benjamin Schrauwen. Audio-based music classification with a pretrained convolutional network. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011.

[13] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2011.

[14] Brian McFee, Thierry Bertin-Mahieux, Daniel P.W. Ellis, and Gert R.G. Lanckriet. The million song dataset challenge. In Proceedings of the 21st international conference companion on World Wide Web, 2012.

[15] Andreas Rauber, Alexander Schindler, and Rudolf Mayer. Facilitating comprehensive benchmarking experiments on the million song dataset. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), 2012.

[16] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, 2008.

[17] Jason Weston, Chong Wang, Ron Weiss, and Adam Berenzweig. Latent collaborative retrieval. In Proceedings of the 29th international conference on Machine learning, 2012.

[18] Jason Weston, Samy Bengio, and Philippe Hamel. Large-scale music annotation and retrieval: Learning to rank in joint semantic spaces. Journal of New Music Research, 2011.

[19] Jonathan T Foote. Content-based retrieval of music and audio. In Voice, Video, and Data Communications, pages 138–147. International Society for Optics and Photonics, 1997.

[20] Matthew Hoffman, David Blei, and Perry Cook. Easy As CBA: A Simple Probabilistic Model for Tagging Music. In Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR), 2009.

[21] Brian McFee and Gert R. G. Lanckriet. Metric learning to rank. In Proceedings of the 27th International Conference on Machine Learning, 2010.

[22] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. Signal Processing Magazine, IEEE, 29(6):82–97, 2012.

[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012.

[24] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

[25] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.

[26] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. Technical report, University of Toronto, 2012.

[27] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

[28] Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011.

[29] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, 2008.