nips2010-267 (NIPS 2010)
Source: pdf
Authors: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground-truth labels on both synthetic and real data more accurately than state-of-the-art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories.
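
To make the generative picture in the abstract concrete, here is a minimal NumPy sketch of the kind of annotation process it describes: images live in a latent Euclidean space, and each annotator thresholds a noisy projection of the image signal according to their own expertise direction, bias, and competence. The variable names, dimensionality, and distributional choices below are illustrative assumptions of ours, not the paper's exact parameterization.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative sketch (all names and distributions are illustrative).
# Each image i has a latent signal x_i in R^d; its true class is the sign of
# its projection onto a ground-truth direction. Each annotator j has:
#   w_j   : "expertise" direction in R^d (which aspects of the image they use)
#   tau_j : decision bias (threshold)
#   s_j   : competence, modeled as inverse noise on the projected signal
n_images, n_annotators, d = 200, 20, 2

x = rng.normal(size=(n_images, d))                  # latent image representations
true_dir = np.array([1.0, 0.0])                     # ground-truth decision direction
z = (x @ true_dir > 0).astype(int)                  # true binary labels

w = rng.normal(size=(n_annotators, d))
w /= np.linalg.norm(w, axis=1, keepdims=True)       # unit expertise directions
tau = rng.normal(scale=0.3, size=n_annotators)      # per-annotator bias
s = rng.uniform(0.5, 3.0, size=n_annotators)        # competence (higher = less noisy)

# Each annotator thresholds a noisy projection of the image signal.
proj = x @ w.T                                      # shape (n_images, n_annotators)
noisy = proj + rng.normal(scale=1.0 / s, size=proj.shape)
labels = (noisy > tau).astype(int)                  # observed binary annotations

# Majority vote: the simple baseline that label-aggregation models improve on.
majority = (labels.mean(axis=1) > 0.5).astype(int)
print("majority-vote accuracy:", (majority == z).mean())

In the paper's actual setting, the latent image values and the annotator parameters are not known as they are in this synthetic sketch; they would be inferred jointly from the observed binary labels, in the spirit of maximum-likelihood approaches such as Dawid and Skene's EM algorithm [1].
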
[1] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. J. Roy. Statistical Society, Series C, 28(1):20–28, 1979.
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. In CVPR, 2009.
[3] D. M. Green and J. M. Swets. Signal detection theory and psychophysics. John Wiley and Sons, New York, 1966.
[4] V. C. Raykar, S. Yu, L. H. Zhao, A. Jerebko, C. Florin, G. H. Valadez, L. Bogoni, and L. Moy. Supervised learning from multiple experts: whom to trust when everyone lies a bit. In ICML, 2009.
[5] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. In KDD, 2008.
[6] P. Smyth, U. Fayyad, M. Burl, P. Perona, and P. Baldi. Inferring ground truth from subjective labelling of Venus images. In NIPS, 1995.
[7] R. Snow, B. O'Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In EMNLP, 2008.
[8] A. Sorokin and D. Forsyth. Utility data annotation with Amazon Mechanical Turk. In First IEEE Workshop on Internet Vision at CVPR'08, 2008.
[9] M. Spain and P. Perona. Some objects are more equal than others: measuring and predicting importance. In ECCV, 2008.
[10] L. von Ahn and L. Dabbish. Labeling images with a computer game. In SIGCHI Conference on Human Factors in Computing Systems, pages 319–326, 2004.
[11] L. von Ahn, B. Maurer, C. McMillen, D. Abraham, and M. Blum. reCAPTCHA: human-based character recognition via web security measures. Science, 321(5895):1465–1468, 2008.
[12] P. Welinder and P. Perona. Online crowdsourcing: rating annotators and obtaining cost-effective labels. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (ACVHL), 2010.
[13] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In NIPS, 2009.
[14] T. D. Wickens. Elementary signal detection theory. Oxford University Press, 2002.