nips nips2010 nips2010-267 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. [sent-4, score-0.71]
2 Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. [sent-10, score-0.616]
3 This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. [sent-11, score-0.781]
4 As some annotators are unreliable, the common wisdom is to collect multiple labels per exemplar and rely on “majority voting” to determine the correct label. [sent-18, score-0.745]
5 We propose a model for the annotation process with the goal of obtaining more reliable labels with as few annotators as possible. [sent-19, score-0.857]
6 It has been observed that some annotators are more skilled and consistent in their labels than others. [sent-20, score-0.728]
7 We postulate that the ability of annotators is multidimensional; that is, an annotator may be good at some aspects of a task but worse at others. [sent-21, score-1.207]
8 We describe an inference algorithm to estimate the properties of the data being labeled and the annotators labeling them. [sent-29, score-0.71]
9 We show on synthetic and real data that the model can be used to estimate data difficulty and annotator parameters. [sent-30, score-0.726]
10 Figure 1: (a) Sample MTurk task where annotators were asked to click on images of Indigo Bunting (described in Section 5); (b) the image formation process (specimen, species, location, weather, viewpoint, camera) producing image Ii; the remaining plate-diagram symbols (zi, xi, yij, lij, Ji, wj, τj, σj, α, β, γ, θz) belong to panel (c). [sent-34, score-2.239]
11 The image is then transformed into a low-dimensional representation xi which captures the main attributes that are considered by annotators in labeling the image. [sent-38, score-0.908]
12 (c) Probabilistic graphical model of the entire annotation process where image formation is summarized by the nodes zi and xi . [sent-39, score-0.451]
13 The observed variables, indicated by shaded circles, are the index i of the image, the index j of the annotator, and the value lij of the label provided by annotator j for image i. [sent-40, score-0.812]
14 The annotation process is repeated for all i and for multiple j thus obtaining multiple labels per image with each annotator labeling multiple images (see Section 3). [sent-41, score-0.939]
15 In general, it has been found that many labels are of high quality [8], but a few sloppy annotators do low quality work [7, 12]; thus the need for efficient algorithms for integrating the labels from many annotators [5, 12]. [sent-45, score-1.477]
16 Methods for combining the labels from many different annotators have been studied before. [sent-47, score-0.728]
17 Dawid and Skene [1] presented a model for multi-valued annotations where the biases and skills of the annotators were modeled by a confusion matrix. [sent-48, score-0.746]
18 [4] considered annotator bias in the context of training binary classifiers with noisy labels. [sent-51, score-0.568]
19 Building on these works, our model goes a step further in modeling each annotator as a multidimensional classifier in an abstract feature space. [sent-52, score-0.586]
20 Our model is closest to that of Whitehill et al. [13], who modeled both annotator competence and image difficulty, but did not consider annotator bias. [sent-54, score-1.294]
21 Our model generalizes [13] by introducing a high-dimensional concept of image difficulty and combining it with a broader definition of annotator competence. [sent-55, score-0.641]
22 By modeling annotator competence and image difficulty as multidimensional quantities, our approach achieves better performance on real data than previous methods and provides a richer output space for separating groups of annotators and images. [sent-57, score-1.479]
23 Competent annotators provide accurate and precise labels, while unskilled annotators provide inconsistent labels. [sent-59, score-1.353]
24 There is also the possibility of adversarial annotators assigning labels that are opposite to those assigned by competent annotators. [sent-60, score-0.792]
25 For example, when asked to label images containing ducks, some annotators may be more aware of the distinction between ducks and geese, while others may be more aware of the distinction between ducks, grebes, and cormorants (visually similar bird species). [sent-62, score-1.349]
26 Furthermore, different annotators may weigh errors differently; one annotator may be intolerant of false positives, while another is more optimistic and accepts the cost of a few false positives in order to get a higher detection rate. [sent-63, score-1.29]
27 Another way to think about xi is that it is a vector of visual attributes (beak shape, plumage color, tail length, etc.) that the annotator will consider when deciding on a label. [sent-74, score-0.639]
28 There are M annotators in total, and the set of annotators that label image i is denoted by Ji . [sent-76, score-1.459]
29 An annotator j ∈ Ji , selected to label image Ii , does not have direct access to xi , but rather to yij = xi + nij , a version of the signal corrupted by annotator-specific and image-specific “noise” nij . [sent-77, score-1.02]
30 The statistics of this noise are different from annotator to annotator and are parametrized by σj . [sent-80, score-1.096]
31 Each annotator is parameterized by a unit vector wj , which models the annotator’s individual weighting on each of these components. [sent-83, score-0.679]
32 In this way, $\hat{w}_j$ encodes the training or expertise of the annotator in a multidimensional space. [sent-84, score-0.754]
33 If the signal is above the threshold, the annotator assigns the label lij = 1, and lij = 0 otherwise. [sent-86, score-0.878]
34 We assume a Bernoulli prior on $z_i$ with $p(z_i = 1) = \beta$, and that $x_i$ is normally distributed with variance $\theta_z^2$, $p(x_i \mid z_i) = \mathcal{N}(x_i;\, \mu_z, \theta_z^2)$ (2), where $\mu_z = -1$ if $z_i = 0$ and $\mu_z = 1$ if $z_i = 1$ (see Figure 2a). [sent-95, score-0.58]
35 The noisy version of the signal $x_i$ that annotator $j$ sees, denoted $y_{ij}$, is assumed to be generated by a Gaussian with variance $\sigma_j^2$ centered at $x_i$, that is, $p(y_{ij} \mid x_i, \sigma_j) = \mathcal{N}(y_{ij};\, x_i, \sigma_j^2)$ (see Figure 2b). [sent-98, score-1.004]
36 We assume that each annotator assigns the label lij according to a linear classifier. [sent-99, score-0.712]
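To make the generative process above concrete, here is a minimal Python sketch of it. All sizes, prior widths, and the Gamma noise parameters are illustrative assumptions of ours, not values from the paper; the class mean µz = ±1 is broadcast across dimensions for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes and hyperparameters (illustrative only).
N, M, D = 500, 20, 2          # images, annotators, dimensions of x_i
beta, theta_z = 0.5, 0.8      # class prior p(z_i = 1) and signal spread

z = rng.binomial(1, beta, size=N)                        # true class z_i
mu = np.where(z == 1, 1.0, -1.0)                         # mu_z = +/- 1
x = mu[:, None] + theta_z * rng.standard_normal((N, D))  # signal x_i

sigma = rng.gamma(1.5, 0.3, size=M)                   # annotator noise sigma_j
w = rng.normal(1.0, 3.0, size=(M, D))                 # weight vectors w_j
w_hat = w / np.linalg.norm(w, axis=1, keepdims=True)  # unit directions
tau = rng.normal(0.0, 0.5, size=M)                    # thresholds tau_j

# y_ij = x_i + n_ij with annotator-specific noise; each label is obtained by
# thresholding the projection <w_hat_j, y_ij> against tau_j.
noise = sigma[None, :, None] * rng.standard_normal((N, M, D))
y = x[:, None, :] + noise
labels = (np.einsum('imd,md->im', y, w_hat) > tau[None, :]).astype(int)
```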
37 Figure 2: (a) Labeling is modeled in a signal detection theory framework, where the signal yij that annotator j sees for image Ii is produced by one of two Gaussian distributions. [sent-108, score-0.852]
38 Depending on yij and annotator parameters wj and τj , the annotator labels 1 or 0. [sent-109, score-1.432]
39 Depending on the annotator j, noise nij is added to xi . [sent-115, score-0.645]
40 The three lower plots show the noise distributions for three different annotators (A, B, C), with increasing “incompetence” σj. [sent-116, score-0.68]
41 The biases τj of the annotators are shown with the red bars. [sent-117, score-0.694]
42 Integrating out $y_{ij}$ puts $l_{ij}$ in direct dependence on $x_i$: $p(l_{ij} = 1 \mid x_i, \sigma_j, \hat{w}_j, \hat{\tau}_j) = \Phi\!\left(\frac{\langle \hat{w}_j, x_i \rangle - \hat{\tau}_j}{\sigma_j}\right)$ (3), where $\Phi(\cdot)$ is the cumulative standardized normal distribution, a sigmoidal-shaped function. [sent-122, score-0.624]
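Eq. (3) is a one-line computation given a normal CDF; a sketch using SciPy (function and argument names are ours):

```python
import numpy as np
from scipy.stats import norm

def p_label_one(x_i, w_hat_j, tau_hat_j, sigma_j):
    """Eq. (3): P(l_ij = 1 | x_i) = Phi((<w_hat_j, x_i> - tau_hat_j) / sigma_j)."""
    return norm.cdf((np.dot(w_hat_j, x_i) - tau_hat_j) / sigma_j)
```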
43 For the prior on wj , we kept the center close to the origin to be initially pessimistic of the annotator competence, and to allow for adversarial annotators (mean 1, std 3). [sent-128, score-1.369]
44 Thus, to do inference, we need to optimize $m(x, w, \tau) = \sum_{i=1}^{N} \log p(x_i \mid \theta_z, \beta) + \sum_{j=1}^{M} \log p(w_j \mid \alpha) + \sum_{j=1}^{M} \log p(\tau_j \mid \gamma) + \sum_{i=1}^{N} \sum_{j \in J_i} \big[ l_{ij} \log \Phi(\langle w_j, x_i \rangle - \tau_j) + (1 - l_{ij}) \log\big(1 - \Phi(\langle w_j, x_i \rangle - \tau_j)\big) \big]$. [sent-133, score-0.548]
45 Then we fix (w, τ) and optimize for x using gradient ascent, alternating between the image parameters and the annotator parameters until convergence. [sent-136, score-0.641]
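The alternating MAP optimization might be sketched as follows. We substitute SciPy's generic optimizer (with numerical gradients) for hand-rolled gradient ascent; the initialization, prior widths, and the β = 0.5 mixture weight in the prior on x are placeholder assumptions, and the sketch assumes every annotator labeled every image.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def label_loglik(x, w, tau, labels):
    # Bernoulli likelihood term of m(x, w, tau), with Phi(<w_j, x_i> - tau_j)
    s = np.clip(norm.cdf(x @ w.T - tau[None, :]), 1e-9, 1 - 1e-9)
    return np.sum(labels * np.log(s) + (1 - labels) * np.log(1 - s))

def fit(labels, D=1, iters=10, theta_z=0.8, alpha=3.0, gamma=0.5, seed=0):
    N, M = labels.shape
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.standard_normal((N, D))
    w, tau = np.ones((M, D)), np.zeros(M)

    def neg_m_x(x_flat):              # objective in x, (w, tau) held fixed
        xi = x_flat.reshape(N, D)     # p(x_i): two-Gaussian mixture, beta = 0.5
        prior = np.sum(np.logaddexp(norm.logpdf(xi, -1, theta_z),
                                    norm.logpdf(xi, 1, theta_z)) + np.log(0.5))
        return -(label_loglik(xi, w, tau, labels) + prior)

    def neg_m_wt(wt):                 # objective in (w, tau), x held fixed
        wj, tj = wt[:M * D].reshape(M, D), wt[M * D:]
        prior = (np.sum(norm.logpdf(wj, 1.0, alpha))
                 + np.sum(norm.logpdf(tj, 0.0, gamma)))
        return -(label_loglik(x, wj, tj, labels) + prior)

    for _ in range(iters):            # alternate, as described above
        wt = minimize(neg_m_wt, np.concatenate([w.ravel(), tau])).x
        w, tau = wt[:M * D].reshape(M, D), wt[M * D:]
        x = minimize(neg_m_x, x.ravel()).x.reshape(N, D)
    return x, w, tau
```

Alternating keeps each subproblem lower-dimensional; in practice one would supply analytic gradients of m rather than rely on numerical differencing.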
46 Figure 3: (a) and (b) show the correlation between the ground truth and estimated parameters as the number of annotators increases on synthetic data for 1-d and 2-d xi and wj (x-axes: number of annotators). [sent-138, score-3.678]
47 In signal detection theory, the sensitivity index, conventionally denoted d′, is a measure of how well the annotator can discriminate the two values of zi [14]. [sent-146, score-0.709]
48 $d' = \frac{2}{\sqrt{\theta_z^2 + \sigma_j^2}}$ (7). Thus, the lower $\sigma_j$, the better the annotator can distinguish between the classes of $z_i$, and the more “competent” he is. [sent-148, score-0.67]
49 Similarly, the “threshold”, which is a measure of annotator bias, can be computed by $\lambda = -\frac{1}{2}\left(\Phi^{-1}(h) + \Phi^{-1}(f)\right)$. [sent-150, score-0.541]
50 A large positive λ means that the annotator attributes a high cost to false positives, while a large negative λ means the annotator avoids false negative mistakes. [sent-151, score-1.164]
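Given an annotator's empirical hit rate h and false-alarm rate f, both signal detection quantities are computed directly; a sketch (variable names are ours):

```python
from scipy.stats import norm

def sensitivity_and_bias(h, f):
    """d' = Phi^{-1}(h) - Phi^{-1}(f); lambda = -(Phi^{-1}(h) + Phi^{-1}(f)) / 2."""
    d_prime = norm.ppf(h) - norm.ppf(f)
    lam = -0.5 * (norm.ppf(h) + norm.ppf(f))
    return d_prime, lam
```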
51 Some annotators may be more aware of the distinction between ducks and geese and others may be more aware of the distinction between ducks, grebes and cormorants. [sent-156, score-1.094]
52 One dimension represents image attributes that are useful in the distinction between ducks and geese, and the other dimension models parameters that are useful in the distinction between ducks and grebes (see Figure 2c). [sent-158, score-0.58]
53 Presumably all annotators see the same attributes, signified by xi , but they use them differently. [sent-159, score-0.73]
54 The model can distinguish between annotators with preferences for different attributes, as shown in Section 5. [sent-160, score-0.666]
55 If there is a particular ground truth decision plane (w, τ), images Ii with xi close to the plane will be more difficult for annotators to label. [sent-163, score-0.97]
56 This is because the annotators see a noise-corrupted version, yij, of xi. [sent-164, score-0.894]
57 How well the annotators can label a particular image depends on both the closeness of xi to the ground truth decision plane and the annotator’s “noise” level, σj . [sent-165, score-1.034]
58 Of course, if the annotator bias τj is far from the ground truth decision plane, the labels for images near the ground truth decision plane will be consistent for that annotator, but not necessarily correct. [sent-166, score-1.008]
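Under this geometric view, a natural per-image difficulty score is the distance from xi to the ground-truth decision plane; a sketch under our naming, where w_star and tau_star are hypothetical ground-truth parameters:

```python
import numpy as np

def image_difficulty(x, w_star, tau_star):
    """Distance of each x_i to the plane <w*, x> = tau*; smaller = harder."""
    return np.abs(x @ w_star - tau_star) / np.linalg.norm(w_star)
```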
59 5.1 Synthetic Data: To explore whether the inference procedure estimates image and annotator parameters accurately, we tested our model on synthetic data generated according to the model’s assumptions. [sent-168, score-0.677]
60 Similar to the experimental setup in [13], we generated 500 synthetic image parameters and simulated between 4 and 20 annotators labeling each image. [sent-169, score-0.846]
61 We generated the annotator parameters by randomly sampling σj from a Gamma distribution (shape 1. [sent-171, score-0.557]
62 Figure 4 (b-d): The image difficulty parameters $x_i$, annotator competence $2/s$, and bias $\tau_j/s$ learned by our model are compared to the ground truth equivalents. [sent-176, score-0.955]
63 As can be seen from the figure, the model estimates the parameters accurately, with the accuracy increasing as the number of annotators labeling each image increases. [sent-190, score-0.81]
64 For comparison, we also tried three other methods on the same data: a simple majority voting rule for each image, the bias-competence model of [1], and the GLAD algorithm from [13], which models 1-d image difficulty and annotator competence, but not bias. [sent-194, score-0.692]
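Of these baselines, majority voting is the simplest; a sketch, assuming labels is an N × M float matrix with NaN wherever annotator j did not label image i:

```python
import numpy as np

def majority_vote(labels):
    """Predict 1 for an image iff more than half of its collected labels are 1."""
    return (np.nanmean(labels, axis=1) > 0.5).astype(int)
```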
65 In a separate experiment (not shown) we generated synthetic annotators with increasing bias parameters τj . [sent-196, score-0.729]
66 We found that GLAD performs worse than majority voting when the variance in the bias between different annotators is high (γ > 0.8). [sent-197, score-0.744]
67 This was expected, as GLAD does not model annotator bias. [sent-198, score-0.541]
68 The annotators were given a description and example photos of the two bird species. [sent-204, score-0.723]
69 Figure 3d shows how the performance varies as the number of annotators per image is increased. [sent-205, score-0.766]
70 We sampled a subset of the annotators for each image. [sent-206, score-0.666]
71 Figure 5: Estimated image parameters (symbols; images A–H plotted in the $(x_i^1, x_i^2)$ plane) and annotator decision planes (lines) for the greeble experiment. [sent-213, score-0.74]
72 Our model learns two image parameter dimensions $x_i^1$ and $x_i^2$, which roughly correspond to color and height, and identifies two clusters of annotator decision planes, which correctly correspond to annotators primed with color information (green lines) and height information (red lines). [sent-214, score-1.457]
73 Images C and F are easy for all annotators; A and H are difficult for annotators that prefer height but easy for annotators that prefer color; D and E are difficult for annotators that prefer color but easy for annotators that prefer height; B and G are difficult for all annotators. [sent-216, score-2.809]
74 Ellipse Dataset: We used a total of 180 ellipse images, with rotation angle varying from 1–180°, and collected labels from 20 MTurk annotators for each image. [sent-219, score-0.78]
75 In this dataset, the estimated image parameters xi and annotator parameters wj are 1-dimensional, where the magnitudes encode image difficulty and annotator competence respectively. [sent-220, score-1.611]
76 The results in Figure 4b-d show that annotator competence and bias vary among annotators. [sent-222, score-0.68]
77 Moreover, the figure shows that our model accurately estimates image difficulty, annotator competence, and annotator bias on data from real MTurk annotators. [sent-223, score-1.226]
78 Greeble Dataset: In the second experiment, annotators were shown pictures of “greebles” (see Figure 5) and were told that the greebles belonged to one of two classes. [sent-224, score-0.743]
79 Some annotators were told that the two greeble classes could be discriminated by height, while others were told they could be discriminated by color (yellowish vs. green). [sent-225, score-0.837]
80 This was done to explore the scenario in which annotators have different types of prior knowledge or abilities. [sent-227, score-0.666]
81 We used a total of 200 images with 20 annotators labeling each image. [sent-228, score-0.773]
82 The results in Figure 5 show that the model successfully learned two clusters of annotator decision surfaces, one (green) of which responds mostly to the first dimension of xi (color) and another (red) responding mostly to the second dimension of xi (height). [sent-231, score-0.696]
83 These two clusters coincide with the sets of annotators primed with the two different attributes. [sent-232, score-0.687]
84 Additionally, for the second attribute, we observed a few “adversarial” annotators whose labels tended to be inverted from their true values. [sent-233, score-0.742]
85 This was because the instructions to our color annotation task were ambiguously worded, so that some annotators had become confused and had inverted their labels. [sent-234, score-0.825]
86 Waterbird Dataset: The greeble experiment shows that our model is able to segregate annotators looking for different attributes in images. [sent-236, score-0.742]
87 For each image, we asked 40 annotators on MTurk if they could see a duck in the image (only Mallards and American Black Ducks are ducks). [sent-240, score-0.838]
88 Figure 6: Estimated image and annotator parameters on the Waterbirds dataset, plotted in the $(x_i^1, x_i^2)$ plane. [sent-241, score-0.641]
89 The annotators were asked to select images containing at least one “duck”. [sent-242, score-0.759]
90 The darkness of the lines is an indicator of wj: darker gray means the model estimated the annotator to be more competent. [sent-246, score-0.694]
91 The hypothesis was that some annotators would be able to discriminate ducks from the two other bird species, while others would confuse ducks with geese and/or grebes. [sent-249, score-1.127]
92 Interestingly, the first group of annotators was better at separating out Canada geese than Red-necked grebes. [sent-251, score-0.751]
93 There were also a few outlier annotators that did not provide answers consistent with any other annotators. [sent-253, score-0.666]
94 This is a common phenomenon on MTurk, where a small percentage of the annotators will provide bad quality labels in the hope of still getting paid [7]. [sent-254, score-0.728]
95 Given only binary labels of images from many different annotators, it is possible to infer not only the underlying class (or value) of the image, but also parameters such as image difficulty and annotator competence and bias. [sent-261, score-0.878]
96 Furthermore, the model represents both the images and the annotators as multidimensional entities, with different high-level attributes and strengths, respectively. [sent-262, score-0.808]
97 Experiments with images annotated by MTurk workers show that indeed different annotators have variable competence levels and widely different biases, and that the annotators’ classification criteria are best modeled in a multidimensional space. [sent-263, score-0.886]
98 Ultimately, our model can accurately estimate the ground truth labels by integrating the labels provided by several annotators with different skills, and it does so better than the current state-of-the-art methods. [sent-264, score-0.918]
99 Furthermore, our findings suggest that annotators fall into different groups depending on their expertise and on how they perceive the task. [sent-267, score-0.711]
100 This could be used to select annotators that are experts on certain tasks and to discover different schools of thought on how to carry out a given task. [sent-268, score-0.685]
wordName wordTfidf (topN-words)
[('annotators', 0.666), ('annotator', 0.541), ('ducks', 0.159), ('yij', 0.15), ('lij', 0.144), ('wj', 0.138), ('zi', 0.129), ('competence', 0.112), ('annotation', 0.112), ('image', 0.1), ('geese', 0.085), ('mturk', 0.085), ('xi', 0.064), ('glad', 0.064), ('grebes', 0.064), ('images', 0.063), ('labels', 0.062), ('ground', 0.057), ('truth', 0.054), ('crowdsourcing', 0.053), ('ellipses', 0.048), ('multidimensional', 0.045), ('labeling', 0.044), ('birds', 0.043), ('bunting', 0.042), ('duck', 0.042), ('greeble', 0.042), ('greebles', 0.042), ('indigo', 0.042), ('bird', 0.04), ('competent', 0.04), ('plane', 0.039), ('height', 0.036), ('species', 0.036), ('culty', 0.035), ('ellipse', 0.034), ('attributes', 0.034), ('voting', 0.034), ('color', 0.033), ('distinction', 0.032), ('planes', 0.03), ('expertise', 0.03), ('asked', 0.03), ('annotations', 0.03), ('formation', 0.029), ('biases', 0.028), ('decision', 0.027), ('bias', 0.027), ('label', 0.027), ('ji', 0.027), ('nij', 0.026), ('false', 0.024), ('adversarial', 0.024), ('signal', 0.022), ('skills', 0.022), ('dawid', 0.021), ('necks', 0.021), ('primed', 0.021), ('sloppy', 0.021), ('unskilled', 0.021), ('waterbirds', 0.021), ('welinder', 0.021), ('nuisance', 0.021), ('told', 0.02), ('synthetic', 0.02), ('everything', 0.02), ('circles', 0.019), ('centered', 0.019), ('prefer', 0.019), ('aware', 0.019), ('whitehill', 0.019), ('raykar', 0.019), ('schools', 0.019), ('ahn', 0.019), ('discriminated', 0.019), ('green', 0.019), ('positives', 0.018), ('angle', 0.018), ('dif', 0.018), ('others', 0.018), ('majority', 0.017), ('pietro', 0.017), ('wisdom', 0.017), ('photos', 0.017), ('alarm', 0.017), ('accurately', 0.017), ('detection', 0.017), ('process', 0.017), ('services', 0.016), ('generated', 0.016), ('groups', 0.015), ('hit', 0.015), ('amazon', 0.015), ('estimated', 0.015), ('pictures', 0.015), ('mixture', 0.014), ('noise', 0.014), ('muri', 0.014), ('inverted', 0.014), ('marked', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 267 nips-2010-The Multidimensional Wisdom of Crowds
Author: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories. 1
2 0.10890717 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
Author: Yang Wang, Greg Mori
Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.
3 0.099254534 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
4 0.077274948 149 nips-2010-Learning To Count Objects in Images
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically-attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amount of visual data. 1
5 0.072655953 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work are goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1
6 0.056245923 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
7 0.052316472 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
8 0.0522721 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
9 0.051599815 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
10 0.048418269 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
11 0.046869747 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
12 0.045785554 63 nips-2010-Distributed Dual Averaging In Networks
13 0.043569606 155 nips-2010-Learning the context of a category
14 0.041088771 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
15 0.040334586 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
16 0.040298961 103 nips-2010-Generating more realistic images using gated MRF's
17 0.038125541 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
18 0.037326634 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
19 0.036340583 133 nips-2010-Kernel Descriptors for Visual Recognition
20 0.034478549 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
topicId topicWeight
[(0, 0.106), (1, 0.052), (2, -0.04), (3, -0.083), (4, -0.022), (5, 0.001), (6, -0.045), (7, -0.001), (8, 0.022), (9, -0.064), (10, -0.01), (11, -0.002), (12, -0.01), (13, 0.011), (14, 0.029), (15, -0.005), (16, -0.003), (17, -0.024), (18, 0.088), (19, 0.008), (20, 0.015), (21, -0.044), (22, 0.054), (23, -0.003), (24, 0.005), (25, 0.08), (26, 0.017), (27, -0.096), (28, -0.075), (29, 0.026), (30, 0.024), (31, 0.068), (32, 0.036), (33, 0.022), (34, -0.005), (35, 0.015), (36, 0.011), (37, 0.019), (38, 0.035), (39, 0.106), (40, -0.038), (41, -0.037), (42, -0.09), (43, 0.043), (44, 0.049), (45, 0.043), (46, -0.014), (47, -0.021), (48, 0.097), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.91198021 267 nips-2010-The Multidimensional Wisdom of Crowds
Author: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories. 1
2 0.60512537 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
Author: Yang Wang, Greg Mori
Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.
3 0.58369398 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
4 0.52854854 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work are goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1
5 0.52393436 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
6 0.52026612 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
7 0.49292928 149 nips-2010-Learning To Count Objects in Images
8 0.49235228 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
9 0.47845104 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
10 0.45668277 177 nips-2010-Multitask Learning without Label Correspondences
11 0.4418419 256 nips-2010-Structural epitome: a way to summarize one’s visual experience
12 0.416278 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
13 0.40928033 224 nips-2010-Regularized estimation of image statistics by Score Matching
14 0.40871531 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
15 0.40810993 120 nips-2010-Improvements to the Sequence Memoizer
16 0.40718049 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
17 0.38254452 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake
18 0.38084289 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
19 0.37789333 17 nips-2010-A biologically plausible network for the computation of orientation dominance
20 0.3774628 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
topicId topicWeight
[(13, 0.029), (17, 0.028), (18, 0.331), (27, 0.086), (30, 0.035), (35, 0.028), (45, 0.184), (50, 0.054), (52, 0.023), (60, 0.024), (77, 0.026), (78, 0.023), (90, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.73817956 267 nips-2010-The Multidimensional Wisdom of Crowds
Author: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state of the art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories. 1
2 0.62755191 144 nips-2010-Learning Efficient Markov Networks
Author: Vibhav Gogate, William Webb, Pedro Domingos
Abstract: We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners. 1
3 0.62225121 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
Author: Ling Huang, Jinzhu Jia, Bin Yu, Byung-gon Chun, Petros Maniatis, Mayur Naik
Abstract: Predicting the execution time of computer programs is an important but challenging problem in the community of computer systems. Existing methods require experts to perform detailed analysis of program code in order to construct predictors or select important features. We recently developed a new system to automatically extract a large number of features from program execution on sample inputs, on which prediction models can be constructed without expert knowledge. In this paper we study the construction of predictive models for this problem. We propose the SPORE (Sparse POlynomial REgression) methodology to build accurate prediction models of program performance using feature data collected from program execution on sample inputs. Our two SPORE algorithms are able to build relationships between responses (e.g., the execution time of a computer program) and features, and select a few from hundreds of the retrieved features to construct an explicitly sparse and non-linear model to predict the response variable. The compact and explicitly polynomial form of the estimated model could reveal important insights into the computer program (e.g., features and their non-linear combinations that dominate the execution time), enabling a better understanding of the program’s behavior. Our evaluation on three widely used computer programs shows that SPORE methods can give accurate prediction with relative error less than 7% by using a moderate number of training data samples. In addition, we compare SPORE algorithms to state-of-the-art sparse regression algorithms, and show that SPORE methods, motivated by real applications, outperform the other methods in terms of both interpretability and prediction accuracy.
4 0.59454799 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Author: Fuxin Li, Cristian Sminchisescu
Abstract: We propose an approach to multiple-instance learning that reformulates the problem as a convex optimization on the likelihood ratio between the positive and the negative class for each training instance. This is casted as joint estimation of both a likelihood ratio predictor and the target (likelihood ratio variable) for instances. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio. It is shown that likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. The likelihood ratio estimates provide a ranking of instances within a bag and are used as input features to learn a linear classifier on bags of instances. Instance-level classification is achieved from the bag-level predictions and the individual likelihood ratios. Experiments on synthetic and real datasets demonstrate the competitiveness of the approach.
5 0.54702169 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
6 0.54695606 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
7 0.54512453 98 nips-2010-Functional form of motion priors in human motion perception
8 0.54415148 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
9 0.54358917 17 nips-2010-A biologically plausible network for the computation of orientation dominance
10 0.54245198 194 nips-2010-Online Learning for Latent Dirichlet Allocation
11 0.54203594 268 nips-2010-The Neural Costs of Optimal Control
12 0.54196197 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
13 0.54162449 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
14 0.54079312 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
15 0.5401383 20 nips-2010-A unified model of short-range and long-range motion perception
16 0.53944552 161 nips-2010-Linear readout from a neural population with partial correlation data
17 0.53926116 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
18 0.53919387 155 nips-2010-Learning the context of a category
19 0.53898311 103 nips-2010-Generating more realistic images using gated MRF's
20 0.53885812 158 nips-2010-Learning via Gaussian Herding