nips nips2011 nips2011-66 knowledge-graph by maker-knowledge-mining

66 nips-2011-Crowdclustering


Source: pdf

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. [sent-2, score-1.056]

2 We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. [sent-3, score-1.102]

3 1 Introduction Outsourcing information processing to large groups of anonymous workers has been made easier by the internet. [sent-5, score-0.435]

4 Additionally, individuals, whether untrained or expert, might not agree on the criteria used to define categories and may not even agree on the number of categories that are present. [sent-17, score-0.248]

5 We explore the question of crowdsourcing clustering in two steps: (a) Reduce the problem to a number of independent HITs of reasonable size and assign them to a large pool of human workers (Section 2). [sent-20, score-0.591]

6 Our approach (Figure 1) is based on displaying small subsets of M images and asking workers to group them by means of mouse clicks. [sent-25, score-0.472]

7 We provide instructions that may cue workers to certain attributes but we do not provide the worker with category definitions or examples. [sent-26, score-0.927]

8 The worker groups the M items into clusters of his choosing, as many as he sees fit. [sent-27, score-0.865]

9 An item may be placed in its own cluster if it is unlike the others in the HIT. [sent-28, score-0.231]

10 Figure 1: Schematic of Bayesian crowdclustering (schematic block labels: image set, GUI, annotators, pairwise labels l_t, data items x_i, "atomic" clusters z_i, categories, worker parameters W_j, τ_j, model/inference). [sent-31, score-0.358]

11 In each HIT (Section 2), the worker views a small subset of images on a GUI. [sent-33, score-0.555]

12 By associating (arbitrarily chosen) colors with sets of images the worker proposes a (partial) local clustering. [sent-34, score-0.555]

13 Each HIT thus produces multiple binary pairwise labels: each pair of images shown in the same HIT is placed by the worker either in the same category or in different categories. [sent-35, score-0.671]
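
As an illustration of this label-generation step, here is a minimal sketch (Python; function and variable names are ours, not the paper's) that converts one worker's grouping of the images in a HIT into the binary pairwise labels l_t ∈ {−1, +1} described above.

    from itertools import combinations

    def hit_to_pairwise_labels(worker_id, groups):
        """groups: one list of image ids per color group the worker created in the HIT."""
        group_of = {img: g for g, members in enumerate(groups) for img in members}
        labels = []  # tuples (j_t, a_t, b_t, l_t)
        for a, b in combinations(sorted(group_of), 2):
            l = +1 if group_of[a] == group_of[b] else -1
            labels.append((worker_id, a, b, l))
        return labels

    # Example: a worker splits five images into two groups
    print(hit_to_pairwise_labels(worker_id=7, groups=[[0, 1, 2], [3, 4]]))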

14 Each image is viewed by multiple workers in different contexts. [sent-36, score-0.395]

15 increases super-linearly with the number of items), the resolution of the images (more images on the screen means that they will be smaller), and contextual information that may guide the worker to make more global category decisions (more images give a better context, see Section 4). [sent-41, score-0.763]

16 Partial clusterings on many M-sized subsets of the data from many different workers are thus the raw data on which we compute clustering. [sent-43, score-0.476]

17 A large body of work exists in the social sciences that makes use of human-provided similarity values defined between pairs of data items (e. [sent-45, score-0.22]

18 We do not expect workers to agree on their definitions of categories, or to be consistent in categorization when performing multiple HITs. [sent-51, score-0.427]

19 2 We assume that there are N total items (indexed by i), J workers (indexed by j), and H HITs (indexed by h). [sent-54, score-0.615]

20 The information obtained from workers is a set of binary variables L, with elements l_t ∈ {−1, +1} indexed by a positive integer t ∈ {1, …, T}. [sent-55, score-0.481]

21 j_t ∈ {1, …, J} indicates the worker that produced the label. [sent-62, score-0.478]

22 a_t, b_t ∈ {1, …, N} indicate the two data items compared by the label. [sent-68, score-0.22]

23 If we simply separate the items into disjoint sets, then it will be impossible to infer a clustering over the entire data set. [sent-75, score-0.322]

24 We will not know whether two items in different HITs are in the same cluster or not. [sent-76, score-0.399]

25 There must be some overlap or redundancy: data items must be members of multiple HITs. [sent-77, score-0.22]

26 At the other extreme, we could construct HITs such that each pair of items may be found in at least one HIT, so that every possible pairwise category relation is sampled. [sent-78, score-0.336]

27 This would be quite expensive for a large number of items N, since the number of labels scales asymptotically as T ∈ Ω(N^2). [sent-79, score-0.275]

28 However, we expect a noisy transitive property to hold: if items a and b are likely to be in the same cluster, and items b and c are (not) likely in the same cluster, then items a and c are (not) likely to be in the same cluster as well. [sent-80, score-0.839]

29 The transitive nature of binary cluster relations should allow sparse sampling, especially when the number of clusters is relatively small. [sent-81, score-0.306]

30 The N items are first distributed deterministically among the HITs, so that there are M/V items in each HIT. [sent-85, score-0.44]

31 Then the remaining M − M/V items in each HIT are filled by sampling without replacement from the N − M/V items that are not yet allocated to the HIT. [sent-86, score-0.468]

32 We introduce an additional parameter R, which is the number of different workers that perform each constructed HIT. [sent-88, score-0.395]

33 The total number of HITs distributed to the crowdsourcing service is therefore H = RNV/M, and we impose the constraint that a worker cannot perform the same HIT more than once. [sent-89, score-0.548]
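
A sketch of this HIT construction (illustrative Python, under the simplifying assumption that M/V divides N; names are ours): each of the N·V/M distinct HITs receives M/V items deterministically and the remaining M − M/V at random, and each is assigned to R workers, so there are H = R·N·V/M assignments in total and, in expectation, each item appears in roughly V distinct HITs.

    import random

    def build_hits(item_ids, M, V, R, seed=0):
        """M/V deterministic items per HIT; the remaining M - M/V items are
        sampled without replacement from items not already in that HIT."""
        rng = random.Random(seed)
        N = len(item_ids)
        base = M // V                      # deterministic items per HIT
        assert N % base == 0, "simplifying assumption: M/V divides N"
        n_hits = N // base                 # = N*V/M distinct HITs
        hits = []
        for h in range(n_hits):
            core = item_ids[h * base:(h + 1) * base]
            rest = [i for i in item_ids if i not in core]
            hits.append(core + rng.sample(rest, M - base))
        # Each distinct HIT goes to R different workers: H = R*N*V/M assignments.
        assignments = [(h, r) for h in range(n_hits) for r in range(R)]
        return hits, assignments

    hits, assignments = build_hits(list(range(120)), M=24, V=6, R=3)
    print(len(hits), len(assignments))     # 30 distinct HITs, 90 assignments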

34 This problem is known as consensus clustering [6], clustering aggregation [7], or cluster ensembles [5]. [sent-95, score-0.53]

35 While some of these methods can work with partial input clusterings, most have not been demonstrated in situations where the input clusterings involve only a small subset of the total data items (M << N ), which is the case in our problem. [sent-96, score-0.328]

36 For example, suppose one worker groups objects into a cluster of tall objects and another of short objects, while a different worker groups the same objects into a cluster of red objects and another of blue objects. [sent-102, score-1.618]

37 Then, our method should recover four atomic clusters: tall red objects, short red objects, tall blue objects, and short blue objects. [sent-103, score-0.303]

38 The behavior of the two workers may then be summarized using a confusion table of the atomic clusters (see Section 3. [sent-104, score-0.773]

39 The first worker groups the first and third atomic cluster into one category and the second and fourth atomic cluster into another category. [sent-106, score-1.216]

40 The second worker groups the first and second atomic clusters into a category and the third and fourth atomic clusters into another category. [sent-107, score-1.112]
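
In other words, the atomic clusters behave like the common refinement of the workers' individual partitions. A toy rendering of the tall/short versus red/blue example (plain Python, purely illustrative; this is not the model's actual inference):

    objects = {
        "obj1": ("tall", "red"),   "obj2": ("tall", "red"),
        "obj3": ("tall", "blue"),  "obj4": ("short", "red"),
        "obj5": ("short", "blue"), "obj6": ("short", "blue"),
    }
    worker_height = {o: h for o, (h, _) in objects.items()}   # worker 1: tall vs. short
    worker_color = {o: c for o, (_, c) in objects.items()}    # worker 2: red vs. blue

    # Common refinement: two objects share an atomic cluster only if both
    # workers placed them in the same group.
    atomic = {}
    for o in objects:
        atomic.setdefault((worker_height[o], worker_color[o]), []).append(o)
    print(atomic)   # four atomic clusters: tall/red, tall/blue, short/red, short/blue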

41 1 Generative Model We propose an approach in which data items are represented as points in a Euclidean space and workers are modeled as pairwise binary classifiers in this space. [sent-109, score-0.677]

42 Atomic clusters are then obtained by clustering these inferred points using a Dirichlet process mixture model, which estimates the number of clusters [8]. [sent-110, score-0.408]
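
Once point estimates of the item locations are available, this clustering step can be approximated with an off-the-shelf truncated Dirichlet process mixture. The sketch below uses scikit-learn's BayesianGaussianMixture on synthetic stand-in data purely for illustration; it is not the paper's own variational inference.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # x: (N, D) array standing in for the inferred item locations x_i
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(c, 0.3, size=(50, 4)) for c in (-2.0, 0.0, 2.0)])

    dpgmm = BayesianGaussianMixture(
        n_components=20,                                  # truncation level
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        random_state=0,
    ).fit(x)

    q = dpgmm.predict_proba(x)      # soft assignments q_ik
    z_map = q.argmax(axis=1)        # MAP atomic-cluster labels z_i
    print("effective number of clusters:", len(np.unique(z_map)))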

43 Certain items may be inherently more difficult to categorize, in which case they may lie between clusters. [sent-112, score-0.22]

44 This method does not apply to our problem, since it involves binary labels applied to single data items rather than to pairs, and therefore requires that categories be defined a priori and agreed upon by all workers, which is incompatible with the crowdclustering problem. [sent-118, score-0.55]

45 We propose a probabilistic latent variable model that relates pairwise binary labels to hidden variables associated with both workers and images. [sent-119, score-0.512]

46 A symmetric matrix W_j ∈ R^{D×D} with entries [W_j]_{d1,d2} and a bias τ_j ∈ R are used to define a pairwise binary classifier, explained in the next paragraph, that represents worker j's labeling behavior. [sent-122, score-0.54]

47 The key term is the pairwise quadratic logistic regression likelihood that captures worker j's tendency to label the pair of images a_t and b_t with l_t: p(l_t | x_{a_t}, x_{b_t}, W_{j_t}, τ_{j_t}) = 1 / (1 + exp(−l_t A_t))  (1), where we define the pairwise quadratic activity A_t = x_{a_t}^T W_{j_t} x_{b_t} + τ_{j_t}. [sent-125, score-1.001]
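
Eq. (1) can be transcribed directly; below is a minimal, numerically stable version (numpy, names ours).

    import numpy as np

    def pairwise_label_logprob(l, x_a, x_b, W_j, tau_j):
        """log p(l | x_a, x_b, W_j, tau_j) with activity A = x_a^T W_j x_b + tau_j
        and l in {-1, +1}; equals the log of the logistic sigmoid of l * A."""
        A = x_a @ W_j @ x_b + tau_j
        return -np.logaddexp(0.0, -l * A)   # stable log(1 / (1 + exp(-l * A)))

    # Toy check in D = 4 dimensions
    rng = np.random.default_rng(0)
    D = 4
    x_a, x_b = rng.normal(size=D), rng.normal(size=D)
    W = rng.normal(size=(D, D)); W = 0.5 * (W + W.T)   # symmetric worker matrix
    print(np.exp(pairwise_label_logprob(+1, x_a, x_b, W, tau_j=0.1)))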

48 µ^w_j and σ^w_j are the symmetric-matrix variational mean and variance parameters associated with worker j, and µ^τ_j and σ^τ_j are the variational mean and variance parameters for the bias τ_j of worker j. [sent-144, score-1.132]

49 3 Worker Confusion Analysis As discussed in Section 3, we propose to understand a worker’s behavior in terms of how he groups atomic clusters into his own notion of categories. [sent-162, score-0.31]

50 We used 1354 images of birds from 10 species in the CUB-200 data set [16] (Table 1) and the 3845 images in the Stonefly9 data set [17] (Table 1) in order to compare our method quantitatively to other cluster aggregation methods. [sent-172, score-0.43]

51 (Figure 2 axis residue: ground-truth scene categories bedroom, suburb, kitchen, living room, coast, forest, highway, inside city, mountain, open country, street, tall building, office; x-axis: Inferred Cluster.) [sent-184, score-0.764]

52 Left: Mean locations µ^x_i projected onto the first two Fisher discriminant vectors, with cluster labels superimposed at the cluster means m_k. [sent-186, score-0.455]

53 Data items are colored according to their MAP label argmax_k q_ik. [sent-187, score-0.342]

54 Right: Confusion table between ground truth scene categories and inferred clusters. [sent-189, score-0.366]

55 The first cluster includes three indoor ground truth categories, the second includes forest and open country categories, and the third includes two urban categories. [sent-190, score-0.502]

56 (Right of line) Selected worker confusion matrices for Scenes experiment. [sent-221, score-0.586]

57 Worker 9 (left) makes distinctions that correspond closely to the atomic clustering. [sent-222, score-0.241]

58 Figure 2 (left) shows the mean locations µ^x_i of the data items learned from the Scene data set, visualized as points in Euclidean space. [sent-225, score-0.22]

59 We find well-separated clusters whose labels k are displayed at their mean locations m_k. [sent-226, score-0.254]

60 The points are colored according to argmax_k q_ik, which is item i's MAP cluster assignment. [sent-227, score-0.328]

61 The cluster labels are sorted according to the number of assigned items, with cluster 1 being the largest. [sent-228, score-0.413]

62 The clusters are well separated in the four-dimensional space (we give the average assignment entropy −(1/N) Σ_{i,k} q_ik log q_ik in the figure title, which shows little cluster overlap). [sent-230, score-0.482]
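
The average assignment entropy quoted in the figure title is simply the mean per-item entropy of the soft assignments q_ik; a short illustrative version:

    import numpy as np

    def mean_assignment_entropy(q, eps=1e-12):
        """-(1/N) * sum_{i,k} q_ik log q_ik; near zero when clusters barely overlap."""
        q = np.asarray(q, dtype=float)
        return float(-(q * np.log(q + eps)).sum() / q.shape[0])

    print(mean_assignment_entropy([[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]]))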

63 Figure 2 (right) shows the confusion table between the ground truth categories and the MAP clustering. [sent-232, score-0.377]

64 We find that the MAP clusters often correspond to single ground truth categories, but they sometimes combine ground truth categories in reasonable ways. [sent-233, score-0.541]

65 3) associated with the 40 workers that performed the most HITs. [sent-237, score-0.395]

66 This matrix captures the worker’s tendency to label items from different atomic clusters as being in the same or different category. [sent-238, score-0.515]
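
A simple empirical version of such a table can be tabulated directly from the raw labels once MAP atomic-cluster assignments z_i are available (an illustrative summary only, not the model-based confusion analysis of Section 3.3; names are ours):

    import numpy as np

    def worker_confusion(labels, z, K):
        """labels: iterable of (j, a, b, l) with l in {-1, +1}; z: MAP atomic cluster
        of each item; returns {worker j: K x K matrix of P(same | clusters k, k')}."""
        same, total = {}, {}
        for j, a, b, l in labels:
            s = same.setdefault(j, np.zeros((K, K)))
            t = total.setdefault(j, np.zeros((K, K)))
            k1, k2 = z[a], z[b]
            for r, c in ((k1, k2), (k2, k1)):      # keep the table symmetric
                t[r, c] += 1
                if l == +1:
                    s[r, c] += 1
        return {j: same[j] / np.maximum(total[j], 1) for j in same}

    z = {0: 0, 1: 0, 2: 1, 3: 1}
    labels = [(9, 0, 1, +1), (9, 0, 2, -1), (9, 2, 3, +1), (9, 1, 3, -1)]
    print(worker_confusion(labels, z, K=2)[9])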

67 Worker 9 makes relatively fine-grained distinctions, including separating clusters 1 and 9, which correspond to the indoor categories and the bedroom scenes, respectively. [sent-241, score-0.404]

68 Worker 45 combines clusters 5 and 8, which correspond to city street and highway scenes, in addition to grouping together all indoor scene categories. [sent-242, score-0.485]

69 The finer-grained distinctions made by worker 9 may be a result of performing more HITs (74) and seeing a larger number of images than worker 45, who performed 15 HITs. [sent-243, score-1.131]

70 Finally (far right), we find a worker whose labels do not align with the atomic clustering. [sent-244, score-0.676]

71 Figure 4 (top left) shows the number of HITs performed by each worker according to descending rank. [sent-246, score-0.478]

72 The Pareto principle (i.e., the law of the vital few) [19] roughly holds: the top 20% most active workers perform nearly 80% of the work. [sent-250, score-0.427]

73 We wish to understand the extent to which the most active workers contribute to the results. [sent-251, score-0.427]

75 Left Top: Number of completed HITs by worker rank. [sent-266, score-0.478]

76 The top workers are removed one at a time; bottom-ranked workers are removed in groups so that both curves cover roughly the same domain. [sent-269, score-0.83]

77 The most active workers do not dominate the results. [sent-270, score-0.427]

78 Right: Variation of Information between the inferred clustering and the ground truth categories on the Scene data set, as a function of sampling parameter V . [sent-271, score-0.451]

79 Quality is measured using Variation of Information between the inferred clustering and ground truth. [sent-298, score-0.227]
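
Variation of Information between clusterings A and B is VI(A, B) = H(A) + H(B) − 2 I(A; B); it is zero exactly when the two partitions agree, and lower is better. One plausible implementation (in nats), using scikit-learn's mutual_info_score:

    import numpy as np
    from sklearn.metrics import mutual_info_score

    def variation_of_information(labels_a, labels_b):
        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log(p)).sum())
        return (entropy(labels_a) + entropy(labels_b)
                - 2.0 * mutual_info_score(labels_a, labels_b))

    print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0: same partition
    print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 2*ln(2): unrelated partitions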

80 inferred MAP clustering and the ground truth categorization. [sent-300, score-0.299]

81 Removal of workers corresponds to moving from right to left on the x-axis, which indicates the number of HITs used to learn the model. [sent-303, score-0.395]

82 The results show that removing the large number of workers that do fewer HITs is more detrimental to performance than removing the relatively few workers that do a large number of HITs (given the same number of total HITs), indicating that the atomic clustering is learned from the crowd at large. [sent-304, score-1.078]

83 We compare our approach (Bayesian crowdclustering) to two existing clustering aggregation methods from the literature: consensus clustering by nonnegative matrix factorization (NMF) [21] and the cluster ensembles method of Strehl and Ghosh (S&G) [5]. [sent-306, score-0.53]

84 NMF and S&G require the number of inferred clusters to be provided as a parameter, and we set this to the number of ground truth categories. [sent-307, score-0.324]

85 To judge the benefit of modeling the characteristics of individual workers, we also compare against a variant of our model in which all HITs are treated as if they are performed by a single worker (Bayesian consensus). [sent-309, score-0.478]

86 In both cases, we instruct workers to categorize based on species. [sent-314, score-0.46]

87 Left: Confusion matrix and high confidence examples when running our method on images assigned to cluster one in the original experiment (Figure 2). [sent-334, score-0.256]

88 Right: Workers may find perceptually relevant distinctions not present in the ground truth categories. [sent-337, score-0.267]

89 1 Divisive Clustering As indicated by the confusion matrix in Figure 2 (right), our method results in clusters that correspond to reasonable categories. [sent-344, score-0.235]

90 We conjecture that this is a result of the limited context presented to the worker in each HIT. [sent-346, score-0.478]

91 When shown a set of M = 36 images consisting mostly of different types of outdoor scenes and a few indoor scenes, it is reasonable for a worker to consider the indoor scenes as a unified category. [sent-347, score-0.879]

92 However, if a HIT is composed purely of indoor scenes, a worker might draw finer distinctions between images of offices, kitchens, and living rooms. [sent-348, score-0.809]

93 To test this conjecture, we developed a hierarchical procedure in which we run Bayesian crowdclustering independently on images that are MAP assigned to the same cluster in the original Scenes experiment. [sent-349, score-0.407]
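
In outline, the divisive step just re-runs the whole pipeline on each MAP cluster. The recursion below assumes a hypothetical run_crowdclustering(items) wrapper, standing in for HIT construction, crowd labeling, and Bayesian inference, that returns a dict mapping each item to its MAP cluster; it is a sketch of the procedure's control flow only.

    def divisive_crowdclustering(items, run_crowdclustering, min_size=50,
                                 depth=0, max_depth=2):
        """Recursively re-cluster each MAP cluster with a fresh round of HITs;
        returns a dict mapping each item to its path of cluster ids in the tree."""
        assignment = run_crowdclustering(items)       # item -> MAP cluster id
        if depth == max_depth:
            return {it: (c,) for it, c in assignment.items()}
        clusters = {}
        for it, c in assignment.items():
            clusters.setdefault(c, []).append(it)
        tree = {}
        for c, members in clusters.items():
            if len(members) < min_size:               # too small to split further
                tree.update({it: (c,) for it in members})
            else:
                sub = divisive_crowdclustering(members, run_crowdclustering,
                                               min_size, depth + 1, max_depth)
                tree.update({it: (c,) + path for it, path in sub.items()})
        return tree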

94 Figure 5 (left) shows the results on the indoor scenes assigned to original cluster 1. [sent-350, score-0.341]

95 We find that when restricted to indoor scenes, the workers do find the relevant distinctions and our algorithm accurately recovers the kitchen, living room, and office ground truth categories. [sent-351, score-0.794]

96 In Figure 5 (center) we ran the procedure on images from original cluster 4, which is composed predominantly of mountain scenes. [sent-352, score-0.305]

97 In Figure 5 (right) the workers divide a cluster into three subclusters that are perceptually relevant: they have organized them according to the number of cars present. [sent-354, score-0.598]

98 It is based on a novel model of human clustering, as well as a novel machine learning method for aggregating worker annotations. [sent-356, score-0.502]

99 Modeling both data item properties and the workers’ annotation process and parameters appears to produce performance that is superior to existing clustering aggregation methods. [sent-357, score-0.235]

100 Can we model contextual effects, perhaps by modeling the way that humans “regularize” their categorical decisions depending on the number and variety of items present in the task? [sent-360, score-0.246]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('worker', 0.478), ('workers', 0.395), ('hits', 0.315), ('items', 0.22), ('cluster', 0.179), ('crowdclustering', 0.151), ('atomic', 0.143), ('wj', 0.132), ('clusters', 0.127), ('categories', 0.124), ('vecp', 0.121), ('hit', 0.12), ('confusion', 0.108), ('clustering', 0.102), ('distinctions', 0.098), ('xbt', 0.091), ('variational', 0.088), ('indoor', 0.087), ('lt', 0.086), ('clusterings', 0.081), ('tall', 0.08), ('wjt', 0.08), ('images', 0.077), ('xat', 0.076), ('scenes', 0.075), ('ground', 0.073), ('qik', 0.073), ('truth', 0.072), ('crowdsourcing', 0.07), ('highway', 0.069), ('living', 0.069), ('bedroom', 0.066), ('stone', 0.066), ('categorize', 0.065), ('jt', 0.064), ('consensus', 0.063), ('zi', 0.063), ('pairwise', 0.062), ('office', 0.06), ('suburb', 0.06), ('uk', 0.056), ('labels', 0.055), ('welinder', 0.054), ('country', 0.054), ('category', 0.054), ('inferred', 0.052), ('item', 0.052), ('kitchen', 0.052), ('aggregation', 0.051), ('mountain', 0.049), ('coast', 0.046), ('nmf', 0.046), ('birds', 0.046), ('scene', 0.045), ('crowd', 0.043), ('street', 0.043), ('mk', 0.042), ('eq', 0.042), ('caltech', 0.042), ('pietro', 0.042), ('bayesian', 0.041), ('groups', 0.04), ('strehl', 0.04), ('ghosh', 0.04), ('room', 0.04), ('city', 0.039), ('normal', 0.039), ('forest', 0.037), ('objects', 0.036), ('gomes', 0.034), ('variation', 0.033), ('pareto', 0.033), ('ensembles', 0.033), ('active', 0.032), ('categorization', 0.032), ('proxy', 0.031), ('seperated', 0.03), ('annotation', 0.03), ('logistic', 0.029), ('cj', 0.029), ('redundancy', 0.029), ('sampling', 0.028), ('partial', 0.027), ('map', 0.027), ('xi', 0.027), ('gui', 0.027), ('center', 0.026), ('categorical', 0.026), ('opinion', 0.026), ('ner', 0.026), ('label', 0.025), ('discovery', 0.025), ('vi', 0.024), ('factorized', 0.024), ('argmaxk', 0.024), ('perceptually', 0.024), ('utility', 0.024), ('euclidean', 0.024), ('attribute', 0.024), ('human', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 66 nips-2011-Crowdclustering

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1

2 0.40456694 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems

Author: David R. Karger, Sewoong Oh, Devavrat Shah

Abstract: Crowdsourcing systems, in which tasks are electronically distributed to numerous “information piece-workers”, have emerged as an effective paradigm for humanpowered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers’ answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker. 1

3 0.25778168 72 nips-2011-Distributed Delayed Stochastic Optimization

Author: Alekh Agarwal, John C. Duchi

Abstract: We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems—in √ spite of asynchronous delays—scales asymptotically as O(1/ nT ), which is known to be optimal even in the absence of delays. 1

4 0.10849398 186 nips-2011-Noise Thresholds for Spectral Clustering

Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh

Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1

5 0.10634615 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

Author: Emin Orhan, Robert A. Jacobs

Abstract: Extensive evidence suggests that items are not encoded independently in visual short-term memory (VSTM). However, previous research has not quantitatively considered how the encoding of an item influences the encoding of other items. Here, we model the dependencies among VSTM representations using a multivariate Gaussian distribution with a stimulus-dependent mean and covariance matrix. We report the results of an experiment designed to determine the specific form of the stimulus-dependence of the mean and the covariance matrix. We find that the magnitude of the covariance between the representations of two items is a monotonically decreasing function of the difference between the items’ feature values, similar to a Gaussian process with a distance-dependent, stationary kernel function. We further show that this type of covariance function can be explained as a natural consequence of encoding multiple stimuli in a population of neurons with correlated responses. 1

6 0.10304049 54 nips-2011-Co-regularized Multi-view Spectral Clustering

7 0.088915631 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

8 0.082822472 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

9 0.082381248 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

10 0.077531174 258 nips-2011-Sparse Bayesian Multi-Task Learning

11 0.075017162 303 nips-2011-Video Annotation and Tracking with Active Learning

12 0.072213866 127 nips-2011-Image Parsing with Stochastic Scene Grammar

13 0.070531182 198 nips-2011-On U-processes and clustering performance

14 0.069855839 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

15 0.062748559 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

16 0.062050927 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

17 0.060444634 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

18 0.059075981 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments

19 0.059052803 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

20 0.058611449 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.183), (1, 0.075), (2, -0.061), (3, 0.047), (4, 0.017), (5, -0.06), (6, -0.055), (7, -0.073), (8, 0.023), (9, 0.055), (10, 0.012), (11, 0.039), (12, -0.025), (13, -0.221), (14, 0.167), (15, 0.112), (16, -0.086), (17, 0.012), (18, 0.315), (19, 0.131), (20, 0.076), (21, -0.159), (22, -0.038), (23, -0.26), (24, 0.205), (25, -0.245), (26, 0.148), (27, -0.108), (28, -0.016), (29, -0.122), (30, -0.001), (31, 0.077), (32, 0.08), (33, -0.032), (34, 0.038), (35, -0.065), (36, -0.015), (37, -0.075), (38, 0.005), (39, 0.03), (40, 0.028), (41, -0.027), (42, -0.084), (43, 0.101), (44, 0.022), (45, -0.006), (46, -0.048), (47, -0.002), (48, 0.005), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93511593 66 nips-2011-Crowdclustering

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1

2 0.82907254 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems

Author: David R. Karger, Sewoong Oh, Devavrat Shah

Abstract: Crowdsourcing systems, in which tasks are electronically distributed to numerous “information piece-workers”, have emerged as an effective paradigm for humanpowered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers’ answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker. 1

3 0.60137117 72 nips-2011-Distributed Delayed Stochastic Optimization

Author: Alekh Agarwal, John C. Duchi

Abstract: We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems—in √ spite of asynchronous delays—scales asymptotically as O(1/ nT ), which is known to be optimal even in the absence of delays. 1

4 0.43946424 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

Author: Fabian L. Wauthier, Michael I. Jordan

Abstract: Biased labelers are a systemic problem in crowdsourcing, and a comprehensive toolbox for handling their responses is still being developed. A typical crowdsourcing application can be divided into three steps: data collection, data curation, and learning. At present these steps are often treated separately. We present Bayesian Bias Mitigation for Crowdsourcing (BBMC), a Bayesian model to unify all three. Most data curation methods account for the effects of labeler bias by modeling all labels as coming from a single latent truth. Our model captures the sources of bias by describing labelers as influenced by shared random effects. This approach can account for more complex bias patterns that arise in ambiguous or hard labeling tasks and allows us to merge data curation and learning into a single computation. Active learning integrates data collection with learning, but is commonly considered infeasible with Gibbs sampling inference. We propose a general approximation strategy for Markov chains to efficiently quantify the effect of a perturbation on the stationary distribution and specialize this approach to active learning. Experiments show BBMC to outperform many common heuristics. 1

5 0.304001 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

Author: Joshua T. Abbott, Katherine A. Heller, Zoubin Ghahramani, Thomas L. Griffiths

Abstract: How do people determine which elements of a set are most representative of that set? We extend an existing Bayesian measure of representativeness, which indicates the representativeness of a sample from a distribution, to define a measure of the representativeness of an item to a set. We show that this measure is formally related to a machine learning method known as Bayesian Sets. Building on this connection, we derive an analytic expression for the representativeness of objects described by a sparse vector of binary features. We then apply this measure to a large database of images, using it to determine which images are the most representative members of different sets. Comparing the resulting predictions to human judgments of representativeness provides a test of this measure with naturalistic stimuli, and illustrates how databases that are more commonly used in computer vision and machine learning can be used to evaluate psychological theories. 1

6 0.2885015 232 nips-2011-Ranking annotators for crowdsourced labeling tasks

7 0.28771883 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

8 0.27854729 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

9 0.278054 186 nips-2011-Noise Thresholds for Spectral Clustering

10 0.26027897 54 nips-2011-Co-regularized Multi-view Spectral Clustering

11 0.26010436 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments

12 0.25861162 95 nips-2011-Fast and Accurate k-means For Large Datasets

13 0.25696722 198 nips-2011-On U-processes and clustering performance

14 0.2503992 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

15 0.25038937 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

16 0.24595879 90 nips-2011-Evaluating the inverse decision-making approach to preference learning

17 0.24580489 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies

18 0.24563637 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

19 0.23659015 258 nips-2011-Sparse Bayesian Multi-Task Learning

20 0.2292631 141 nips-2011-Large-Scale Category Structure Aware Image Categorization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.021), (4, 0.05), (11, 0.209), (20, 0.039), (26, 0.026), (31, 0.14), (33, 0.064), (43, 0.054), (45, 0.081), (57, 0.046), (60, 0.021), (74, 0.048), (83, 0.036), (84, 0.012), (99, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83322692 66 nips-2011-Crowdclustering

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1

2 0.80502558 73 nips-2011-Divide-and-Conquer Matrix Factorization

Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-andconquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

3 0.77584171 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning

Author: Amir-massoud Farahmand

Abstract: Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an ˆ agent following the greedy policy π with respect to an action-value function Q, the ˆ ˆ ˆ performance loss E V ∗ (X) − V π (X) is upper bounded by O( Q − Q∗ 1+ζ ), ∞ in which ζ ≥ 0 is the parameter quantifying the action-gap regularity. For ζ > 0, our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms. 1

4 0.76765215 36 nips-2011-Analysis and Improvement of Policy Gradient Estimation

Author: Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama

Abstract: Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method under a mild assumption. We then derive the optimal baseline for PGPE, which contributes to further reducing the variance. We also theoretically show that PGPE with the optimal baseline is more preferable than REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments. 1

5 0.74341989 150 nips-2011-Learning a Distance Metric from a Network

Author: Blake Shaw, Bert Huang, Tony Jebara

Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1

6 0.67944932 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

7 0.67934012 180 nips-2011-Multiple Instance Filtering

8 0.67540687 75 nips-2011-Dynamical segmentation of single trials from population neural data

9 0.67488974 229 nips-2011-Query-Aware MCMC

10 0.66900033 241 nips-2011-Scalable Training of Mixture Models via Coresets

11 0.66593117 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

12 0.66390026 249 nips-2011-Sequence learning with hidden units in spiking neural networks

13 0.66236436 55 nips-2011-Collective Graphical Models

14 0.66154695 278 nips-2011-TD gamma: Re-evaluating Complex Backups in Temporal Difference Learning

15 0.66103011 227 nips-2011-Pylon Model for Semantic Segmentation

16 0.66098392 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search

17 0.6602273 221 nips-2011-Priors over Recurrent Continuous Time Processes

18 0.66002655 102 nips-2011-Generalised Coupled Tensor Factorisation

19 0.65945148 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

20 0.65941375 303 nips-2011-Video Annotation and Tracking with Active Learning