
66 nips-2011-Crowdclustering

Source: pdf

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations.

reference text

[1] A. Sorokin and D. A. Forsyth. Utility data annotation with Amazon Mechanical Turk. In Internet Vision, pages 1–8, 2008.

[2] Sudheendra Vijayanarasimhan and Kristen Grauman. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. In CVPR, 2011.

[3] Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. The multidimensional wisdom of crowds. In Neural Information Processing Systems Conference (NIPS), 2010.

[4] J. B. Kruskal. Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika, 29:1–29, 1964.

[5] Alexander Strehl and Joydeep Ghosh. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.

[6] Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52(1–2):91–118, 2003.

[7] Gionis, Mannila, and Tsaparas. Clustering aggregation. In ACM Transactions on Knowledge Discovery from Data, volume 1. 2007.

[8] A. Y. Lo. On a class of Bayesian nonparametric estimates: I. Density estimates. The Annals of Statistics, pages 351–357, 1984.

[9] I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. Modelling relational data using Bayesian clustered tensor factorization. Advances in Neural Information Processing Systems (NIPS), 2009.

[10] Hagai Attias. A variational Bayesian framework for graphical models. In NIPS, pages 209–215, 1999.

[11] Kenichi Kurihara, Max Welling, and Nikos Vlassis. Accelerated variational Dirichlet process mixtures. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, MA, 2007.

[12] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, 1994.

[13] Tommi S. Jaakkola and Michael I. Jordan. A variational approach to Bayesian logistic regression models and their extensions, August 13 1996.

[14] Ryan Gomes, Peter Welinder, Andreas Krause, and Pietro Perona. Crowdclustering. Technical Report CaltechAUTHORS:20110628-202526159, June 2011.

[15] Li Fei-Fei and Pietro Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, pages 524–531. IEEE Computer Society, 2005.

[16] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology, 2010.

[17] G. Martinez-Munoz, N. Larios, E. Mortensen, W. Zhang, A. Yamamuro, R. Paasch, N. Payet, D. Lytle, L. Shapiro, S. Todorovic, et al. Dictionary-free categorization of very similar objects via stacked evidence trees. 2009.

[18] T. Berg, A. Berg, and J. Shih. Automatic attribute discovery and characterization from noisy web data. Computer Vision–ECCV 2010, pages 663–676, 2010.

[19] V. Pareto. Cours d’économie politique. 1896.

[20] M. Meila. Comparing clusterings by the variation of information. In Learning theory and Kernel machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003: proceedings, volume 2777, page 173. Springer Verlag, 2003.

[21] Tao Li, Chris H. Q. Ding, and Michael I. Jordan. Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In ICDM, pages 577–582. IEEE Computer Society, 2007.