A Survey of Accuracy Evaluation Metrics of Recommendation Tasks (JMLR 2009)

Authors: Asela Gunawardana, Guy Shani

Abstract: Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems, and classify a set of appropriate well known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments.

Keywords: recommender systems, collaborative filtering, statistical analysis, comparative studies
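The abstract's point that each metric may favor a different algorithm can be illustrated with a small sketch. Everything in it is a hypothetical assumption chosen for illustration rather than taken from the paper: the four-item toy data, the two algorithms X and Y, the relevance threshold (a true rating of at least 4), and the helper names `rmse` and `precision_at_k`. RMSE stands in for a rating-prediction metric and precision@k for a top-N recommendation metric.

```python
import math

# Hypothetical toy data for a single user (not from the paper): true ratings on a 1-5 scale.
true_ratings = {"a": 5, "b": 4, "c": 2, "d": 1}

# Assumption: an item counts as relevant ("liked") if its true rating is at least 4.
relevant = {item for item, r in true_ratings.items() if r >= 4}

# Predicted ratings from two hypothetical algorithms, X and Y.
algorithms = {
    # X: small average error, but it ranks the disliked item "c" above the liked item "a".
    "X": {"a": 3.4, "b": 3.6, "c": 3.5, "d": 1.0},
    # Y: inflated scores (larger error), but its ordering matches the true ratings.
    "Y": {"a": 5.0, "b": 4.8, "c": 4.0, "d": 3.0},
}


def rmse(truth, pred):
    """Root mean squared error of the predicted ratings over all items in `truth`."""
    squared_errors = [(truth[item] - pred[item]) ** 2 for item in truth]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_k(pred, relevant_items, k):
    """Fraction of the k highest-scored items that are relevant."""
    top_k = sorted(pred, key=pred.get, reverse=True)[:k]
    return sum(1 for item in top_k if item in relevant_items) / k


# A rating-prediction metric (lower RMSE is better) selects one algorithm...
best_by_rmse = min(algorithms, key=lambda name: rmse(true_ratings, algorithms[name]))
# ...while a top-N metric (higher precision@2 is better) selects the other.
best_by_precision = max(
    algorithms, key=lambda name: precision_at_k(algorithms[name], relevant, 2)
)

for name, pred in algorithms.items():
    print(
        f"{name}: RMSE={rmse(true_ratings, pred):.3f}, "
        f"precision@2={precision_at_k(pred, relevant, 2):.2f}"
    )
print("best by RMSE:", best_by_rmse, "| best by precision@2:", best_by_precision)
```

On this toy data, X has the lower RMSE while Y places both liked items in its top two. Which algorithm "wins" therefore depends on whether the task of interest is predicting ratings or producing a short list of good items.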

References

D. Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12:387–415, 1975.

Y. Bengio and Y. Grandvalet. No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research, 5, 2004.

F. Bodon. A fast APRIORI implementation. In The IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.

D. Braziunas and C. Boutilier. Local utility elicitation in GAI models. In Proceedings of the Twenty-first Conference on Uncertainty in Artificial Intelligence, pages 42–49, Edinburgh, 2005.

J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI: Uncertainty in Artificial Intelligence, pages 43–52, 1998.

Ò. Celma and P. Herrera. A new approach to evaluating novel recommendations. In RecSys ’08: Proceedings of the 2008 ACM Conference on Recommender Systems, 2008.

M. Claypool, P. Le, M. Waseda, and D. Brown. Implicit interest indicators. In Intelligent User Interfaces, pages 33–40. ACM Press, 2001.

F. Hernández del Olmo and E. Gaudioso. Evaluation of recommender systems: A new approach. Expert Systems Applications, 35(3), 2008.

J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 2006.

R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.

C. Goutte and E. Gaussier. A probabilistic interpretation of precision, recall, and F-score, with implication for evaluation. In ECIR ’05: Proceedings of the 27th European Conference on Information Retrieval, pages 345–359, 2005.

A. Gunawardana and C. Meek. Aggregators and contextual effects in search ad markets. In WWW Workshop on Targeting and Ranking for Online Advertising, 2008.

J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 2004.

C. N. Hsu, H. H. Chung, and H. S. Huang. Mining skewed and sparse transaction data for personalized shopping recommendation. Machine Learning, 57(1-2), 2004.

R. Hu and P. Pu. A comparative user study on rating vs. personality quiz based preference elicitation methods. In IUI ’09: Proceedings of the 13th International Conference on Intelligent User Interfaces, 2009.

S. L. Huang. Comparision of utility-based recommendation methods. In The Pacific Asia Conference on Information Systems, 2008.

C. Kadie, C. Meek, and D. Heckerman. CFW: A collaborative filtering system using posteriors over weights of evidence. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 242–250, San Francisco, CA, 2002. Morgan Kaufmann.

R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 2009.

J. A. Konstan, S. M. McNee, C. N. Ziegler, R. Torres, N. Kapoor, and J. Riedl. Lessons on applying automated recommender systems to information-seeking tasks. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.

R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Recommendation systems: A probabilistic analysis. In FOCS ’98: Proceedings of the 39th Annual Symposium on Foundations of Computer Science, 1998.

D. Larocque, J. Nevalainen, and H. Oja. A weighted multivariate sign test for cluster-correlated data. Biometrika, 94:267–283, 2007.

G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 2003.

M. R. McLaughlin and J. L. Herlocker. A collaborative filtering algorithm and evaluation metric that accurately model the user experience. In SIGIR ’04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2004.

S. McNee, S. K. Lam, C. Guetzlaff, J. A. Konstan, and J. Riedl. Confidence displays and training in recommender systems. In Proceedings of the 9th IFIP TC13 International Conference on Human Computer Interaction INTERACT, pages 176–183. IOS Press, 2003.

S. M. McNee, J. Riedl, and J. K. Konstan. Making recommendations better: an analytic model for human-recommender interaction. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems, 2006.

M. Montaner, B. López, and J. L. De La Rosa. A taxonomy of recommender agents on the internet. Artificial Intelligence Review, 19(4), 2003.

D. Oard and J. Kim. Implicit feedback for recommender systems. In The AAAI Workshop on Recommender Systems, pages 81–83, 1998.

B. Price and P. Messinger. Optimal recommendation sets: Covering uncertainty over user preferences. In National Conference on Artificial Intelligence (AAAI), pages 541–548. AAAI Press / The MIT Press, 2005.

P. Pu and L. Chen. Trust building with explanation interfaces. In IUI ’06: Proceedings of the 11th International Conference on Intelligent User Interfaces, 2006.

P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, 40(3), 1997.

C. J. Van Rijsbergen. Information Retrieval. Butterworth-Heinemann, 1979.

G. Salton. The SMART Retrieval System—Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1971.

S. L. Salzberg. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3), 1997.

B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Analysis of recommendation algorithms for e-commerce. In EC ’00: Proceedings of the 2nd ACM Conference on Electronic Commerce, 2000.

J. B. Schafer, J. Konstan, and J. Riedl. Recommender systems in e-commerce. In EC ’99: Proceedings of the 1st ACM Conference on Electronic Commerce, 1999.

A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In SIGIR ’02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002.

G. Shani, D. Heckerman, and R. I. Brafman. An MDP-based recommender system. Journal of Machine Learning Research, 6:1265–1295, 2005.

M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36(1):111–147, 1974.

E. M. Voorhees. The philosophy of information retrieval evaluation. In CLEF ’01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, 2002a.

E. M. Voorhees. Overview of TREC 2002. In The 11th Text Retrieval Conference (TREC 2002), NIST Special Publication 500-251, pages 1–15, 2002b.

C. N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW ’05: Proceedings of the 14th International Conference on the World Wide Web, 2005.