nips nips2010 nips2010-151 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jie Luo, Francesco Orabona
Abstract: In many real-world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large-margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and on a real-world images-and-captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. [sent-7, score-0.219]
2 However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. [sent-9, score-0.238]
3 Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. [sent-11, score-0.794]
4 Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. [sent-12, score-0.679]
5 The use of the labeling vectors provides a principled way not to exclude any information. [sent-13, score-0.395]
6 Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines. [sent-15, score-0.226]
7 Partial data, noise, missing labels, and other similar common issues can cause deviations from this ideal situation, moving the learning scenario from supervised learning to semi-supervised learning [7, 26]. [sent-18, score-0.182]
8 Moreover, in real problems samples are often gathered in groups, and the intrinsic nature of the problem could be used to constrain the possible labels for the samples from the same group. [sent-24, score-0.217]
9 For example, we might have that two labels cannot appear together in the same group, or that a label can appear only once in each group, as, for example, a specific face in an image. [sent-25, score-0.286]
10 Inspired by these scenarios, we focus on the general case where we have bags of instances, with each bag associated with a set of several possible labeling vectors, and among them only one is fully correct. [sent-26, score-0.782]
11 Each labeling vector consists of labels for each corresponding instance in the bag. [sent-27, score-0.551]
12 “sun in sky” and “car on street”), have been explored to prune down the labeling possibilities [14]. [sent-33, score-0.345]
13 Another similar task is to learn a face recognition system from images gathered from news websites or videos, using the associated text captions and video scripts [3, 8, 16, 13]. [sent-34, score-0.373]
14 These works use different approaches to integrate the constraints, such as the fact that two faces in one image cannot be associated with the same name [3], mouth motion and the gender of the person [8], or the joint modeling of names and action verbs [16]. [sent-35, score-0.36]
15 Another problem is the multiple annotators scenario, where each data point is associated with the labels given by independently hired annotators. [sent-36, score-0.211]
16 Our work is also closely related to the ambiguous labeling problem presented in [8, 15]. [sent-45, score-0.485]
17 Our framework generalizes them, to the cases where instances and possible labels come in the form of bags. [sent-46, score-0.334]
18 This particular generalization gives us a principled way for using different kinds of prior knowledge on instances and labels correlation, without hacking the learning algorithm. [sent-47, score-0.334]
19 More specifically, prior knowledge, such as pairwise constraints [21] and mutual exclusiveness of some labels, can be easily encoded in the labeling vectors. [sent-48, score-0.345]
20 Although several works have focused on integrating this weakly labeled information, which is complementary to the labeled or unlabeled training data, into existing algorithms, these approaches are usually computationally expensive. [sent-49, score-0.193]
21 In both setups, several instances are grouped into bags, and their labels are not individually given but assigned to the bags directly. [sent-52, score-0.545]
22 However, contrary to our framework, in MIML noisy labeling is not allowed. [sent-53, score-0.37]
23 In other words, all the labels being assigned to the bags are assumed to be true. [sent-54, score-0.321]
24 Moreover, current MIL and MIML algorithms usually rely on a ‘key’ instance in the bag [1] or transform each bag into a single-instance representation [25], while our algorithm makes an explicit effort to label every instance in a bag and to consider all of them during learning. [sent-55, score-1.095]
25 Hence, it has a clear advantage in problems where the bags are dense in labeled instances and instances in the same bag are independent, as opposed to the cases when several instances jointly represent a label. [sent-56, score-1.021]
26 Our algorithm is also related to Latent Structural SVMs [22], where the correct labels could be considered as latent variables. [sent-57, score-0.211]
27 In this section, we formalize the CLS setting, which is a generalization of the ambiguous labeling problem described in [17] from single instances to bags of instances. [sent-59, score-0.813]
28 In the CLS setting, the N training data are provided in the form {X_i, Z_i}_{i=1}^N, where X_i is a bag of M_i instances, X_i = {x_{i,m}}_{m=1}^{M_i}, with x_{i,m} ∈ R^d, ∀ i = 1, . . . , N. [sent-65, score-0.294]
29 The associated set of L_i candidate labeling vectors is Z_i = {z_{i,l}}_{l=1}^{L_i}, where z_{i,l} ∈ Y^{M_i}, and Y = {1, . . . , C}. [sent-72, score-0.55]
30 In other words, there are L_i different combinations of M_i labels for the M_i instances in the i-th bag. [sent-76, score-0.334]
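As an illustrative sketch of this data format (the class and field names are our own, not from the paper), a CLS training example can be represented as a bag of feature vectors plus its candidate labeling set:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CLSBag:
    """One training example: a bag of instances plus its candidate labeling set."""
    instances: List[List[float]]   # M_i feature vectors in R^d
    labelings: List[List[int]]     # L_i candidate labeling vectors, each of length M_i

# A toy bag with M_i = 2 instances (d = 3) and L_i = 2 candidate labeling vectors.
bag = CLSBag(
    instances=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    labelings=[[0, 1], [1, 0]],    # exactly one of these is assumed fully correct
)

# Every labeling vector must assign one label per instance in the bag.
assert all(len(z) == len(bag.instances) for z in bag.labelings)
```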
31 We assume that the correct labeling vector for X_i is present in Z_i, while the other labeling vectors may be partially correct or even completely wrong. [sent-77, score-0.804]
32 It is important to point out that this assumption is not equivalent to just associating Li candidate labels to each instance. [sent-78, score-0.271]
33 In fact, in this way we also encode explicitly the correlations between instances and their labels in a bag. [sent-79, score-0.334]
34 In the following we will assume that the labeling set Zi is given with the training set. [sent-81, score-0.374]
35 The problem would become standard multiclass supervised learning if there were only one labeling vector in every labeling set Z_i, i.e., L_i = 1 for all i. [sent-85, score-0.753]
36 On the other hand, given a set of C labels, without any prior knowledge, a bag of M_i instances could have at most C^{M_i} labeling vectors, in which case the problem becomes a clustering problem. [sent-88, score-0.829]
37 It is helpful to first define by X the generic bag of M instances {x_1, . . . , x_M}, [sent-92, score-0.452]
38 by Z = {z_1, . . . , z_L} the generic set of candidate labeling vectors, and by y = {y_1, . . . , y_M} the vector of the true labels. [sent-98, score-0.469]
39 We start by introducing the loss function that assumes the true label y_m of each instance x_m is known: ℓ_∆(z, y) = Σ_{m=1}^{M} ∆(z_m, y_m)  (1), where ∆(z_m, y_m) is a non-negative loss function measuring how much we pay for having predicted z_m instead of y_m. [sent-105, score-1.11]
40 For example, ∆(z_m, y_m) can be defined as 1(z_m ≠ y_m), where 1 is the indicator function. [sent-106, score-0.298]
41 Hence, if the vector z is the predicted label for the bag, ℓ∆ (z, y) simply counts the number of misclassified instances in the bag. [sent-107, score-0.258]
42 However, the true labels are unknown, and we only have access to the set Z, knowing that the true labeling vector is in Z. [sent-108, score-0.492]
43 So we use a proxy of this loss function, and propose the ambiguous version of this loss: ℓ^A_∆(z, Z) = min_{z′ ∈ Z} ℓ_∆(z, z′). [sent-109, score-0.205]
44 We also define, with a small abuse of notation, ℓ^A_∆(X, Z; f) = ℓ^A_∆(f(X), Z), where f(X) returns a labeling vector which consists of labels for each instance in the bag X. [sent-110, score-0.816]
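A minimal sketch of these two losses, using the 0/1 instance loss (function names are our own):

```python
def inst_loss(z_m, y_m):
    """Delta(z_m, y_m): 0/1 loss on a single instance."""
    return 0 if z_m == y_m else 1

def bag_loss(z, y):
    """l_Delta(z, y): number of misclassified instances in the bag."""
    return sum(inst_loss(zm, ym) for zm, ym in zip(z, y))

def ambiguous_loss(z, Z):
    """l^A_Delta(z, Z): distance from z to the closest candidate labeling vector."""
    return min(bag_loss(z, zp) for zp in Z)

Z = [[0, 1, 2], [0, 2, 2]]                 # candidate labeling set for a 3-instance bag
assert bag_loss([0, 1, 2], [0, 2, 2]) == 1
assert ambiguous_loss([0, 1, 2], Z) == 0   # exact match with the first candidate
assert ambiguous_loss([1, 1, 1], Z) == 2
```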
45 3] to the bag case, and prove that ℓ^A_∆/(1 − η) is an upper bound to ℓ_∆ in expectation, where η is a factor between 0 and 1, and its value depends on the hardness of the problem. [sent-114, score-0.265]
46 Like the definition in [8], η corresponds to the maximum probability of an extra label co-occurring with the true label over all labels and instances. [sent-115, score-0.289]
47 Hence, minimizing the ambiguous loss we are actually minimizing an upper bound of the true loss. [sent-116, score-0.205]
48 We can now define F(X, y; w) = Σ_{m=1}^{M} F(x_m, y_m; w), which intuitively gathers from each instance in X the confidence on the labels in y. [sent-122, score-0.355]
49 Hence the function F can be defined as the scalar product between w and a joint feature map between the bag X and the labeling vector y. [sent-124, score-0.61]
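A sketch of such a bag scoring function with a per-class linear model, one weight vector per class (a common instantiation of the scalar-product form; the paper's exact feature map may differ):

```python
import numpy as np

def instance_score(W, x, y):
    """F(x_m, y_m; w): confidence of assigning label y to instance x."""
    return float(W[y] @ x)

def bag_score(W, X, z):
    """F(X, z; w): sum of instance confidences under labeling vector z."""
    return sum(instance_score(W, x, y) for x, y in zip(X, z))

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))            # C = 3 classes, d = 4 features
X = [rng.standard_normal(4) for _ in range(2)]

s = bag_score(W, X, [0, 2])
assert np.isclose(s, instance_score(W, X[0], 0) + instance_score(W, X[1], 2))
```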
50 If the prior probabilities of every candidate labeling vector z_l ∈ Z are also available, they could be incorporated by slightly modifying the feature mapping scheme in (2). [sent-126, score-0.613]
51 We can now introduce the following loss function: ℓ_max(X, Z; w) = | max_{z̄ ∉ Z} [ ℓ^A_∆(z̄, Z) + F(X, z̄; w) ] − max_{z ∈ Z} F(X, z; w) |_+  (3), where |x|_+ = max(0, x). [sent-127, score-0.167]
52 On the other hand, in the CLS setting the correct labeling vector for X is unknown, but it is known to be a member of the candidate set Z. [sent-142, score-0.534]
53 Hence we could maximize the log-likelihood ratio between P(Z|X; θ) and the most likely incorrect labeling vector which is not a member of Z (denoted as z̄). [sent-143, score-0.41]
54 max_{z ∈ Z} P(z|X; θ) / max_{z̄ ∉ Z} P(z̄|X; θ)  (4). If we assume independence between the instances in the bag, (4) can be factorized as: log [ max_{z ∈ Z} Π_m P(z_m|x_m; θ) / max_{z̄ ∉ Z} Π_m P(z̄_m|x_m; θ) ] = max_{z ∈ Z} Σ_m log P(z_m|x_m; θ) − max_{z̄ ∉ Z} Σ_m log P(z̄_m|x_m; θ). [sent-148, score-0.238]
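Under the independence assumption, the ratio in (4) reduces to a difference of two maxima of per-instance log-likelihood sums. A tiny numeric sketch (the probability table is made up for illustration):

```python
import math
from itertools import product

# P(y | x_m) for a 2-instance bag over labels {0, 1} (hypothetical numbers).
P = [{0: 0.7, 1: 0.3},
     {0: 0.2, 1: 0.8}]

def log_lik(z):
    """Log-likelihood of labeling vector z under instance independence."""
    return sum(math.log(P[m][y]) for m, y in enumerate(z))

Z = [(0, 1), (1, 0)]                       # candidate labeling set
all_z = set(product([0, 1], repeat=2))
# Log-likelihood ratio: best candidate in Z vs. best labeling outside Z.
ratio = max(log_lik(z) for z in Z) - max(log_lik(z) for z in all_z - set(Z))

assert math.isclose(ratio, log_lik((0, 1)) - log_lik((1, 1)))
```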
55 To convexify this problem, one could approximate the second max(·) in (3) with the average over all the labeling vectors in Zi . [sent-151, score-0.427]
56 However, the approximation could be very loose if the number of labeling vectors is large. [sent-153, score-0.427]
57 CCCP replaces max_{z ∈ Z_i} F(X_i, z; w) in the loss function by its first-order approximation max_{z ∈ Z_i} F(X_i, z; w^(r)) + (w − w^(r)) · ∂_w [ max_{z ∈ Z_i} F(X_i, z; w) ]_{w = w^(r)}. [sent-162, score-0.167]
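A hedged sketch of the resulting CCCP outer loop (a generic instantiation with the inner convex solver left abstract; names are our own). Since F is linear in w, a subgradient of the concave part at w^(r) is the joint feature vector of the best candidate labeling under w^(r):

```python
import numpy as np

def phi(X, z, C):
    """Joint feature map: stacks each instance's features into its label's slot."""
    d = len(X[0])
    v = np.zeros(C * d)
    for x, y in zip(X, z):
        v[y * d:(y + 1) * d] += x
    return v

def cccp(bags, C, d, inner_solver, rounds=5):
    """Outer CCCP loop: fix the best candidate labeling per bag, solve the convex surrogate."""
    w = np.zeros(C * d)
    for _ in range(rounds):
        # Concave part linearized at w^(r): pick, per bag, the best candidate labeling.
        anchors = [max(Zi, key=lambda z: w @ phi(Xi, z, C)) for Xi, Zi in bags]
        w = inner_solver(bags, anchors, w)   # e.g. a Pegasos-style stochastic solver
    return w
```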
58 2 Solving the convex optimization problem using the Pegasos framework. In order to solve the relaxed convex optimization problem (8) efficiently at each round of the CCCP, we designed a stochastic subgradient descent algorithm, using the Pegasos framework developed in [18]. [sent-170, score-0.184]
59 An upper bound on the radius of the ball in which the optimal hyperplane lives can be calculated by considering that (λ/2)‖w*‖² ≤ min_w [ (λ/2)‖w‖² + (1/N) Σ_{i=1}^N ℓ^(r)_cccp(X_i, Z_i; w) ] ≤ B, where w* is the optimal solution of (8), and B = max_i (ℓ^(r)_cccp(X_i, Z_i; 0)). [sent-173, score-0.218]
60 If we use ∆(z_m, y_m) = 1(z_m ≠ y_m) in (7), B equals the maximum number of instances in the bags. [sent-174, score-0.361]
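A hedged sketch of a single Pegasos-style update for such an objective (a generic hinge-type loss stands in for the paper's; the exact subgradient and projection details may differ). The projection radius sqrt(2B/λ) follows from the bound above, since (λ/2)‖w*‖² ≤ B:

```python
import numpy as np

def pegasos_step(w, subgrad_loss, lam, t, B):
    """One stochastic subgradient step with rate 1/(lam*t) and ball projection."""
    eta = 1.0 / (lam * t)
    w = (1.0 - eta * lam) * w - eta * subgrad_loss(w)
    radius = np.sqrt(2.0 * B / lam)        # the optimal w lives inside this ball
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

# Toy loss with subgradient g(w) = w - target; with lam = 1 the regularized
# optimum is target / 2, which the iterates reach quickly.
target = np.array([1.0, -1.0])
w = np.zeros(2)
for t in range(1, 200):
    w = pegasos_step(w, lambda w: w - target, lam=1.0, t=t, B=10.0)
assert np.linalg.norm(w - target / 2.0) < 0.1
```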
61 Greedily searching for the most violating labeling vector ẑ_k in line 4 of Algorithm 2 can be computationally expensive. [sent-180, score-0.446]
62 In the general situation, the worst-case complexity of searching for the maximum over z̄ ∉ Z_i is O(Π_{m=1}^{M_i} C_{i,m}), where C_{i,m} is the number of unique possible labels for x_{i,m} in Z_i (usually C_{i,m} ≪ L_i). [sent-183, score-0.177]
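A brute-force sketch of that search, restricted to each instance's unique candidate labels (our own helper names; real implementations exploit any extra structure in the labeling set):

```python
from itertools import product

def most_violating(score, Z):
    """Best-scoring labeling built from per-instance candidate labels, excluding Z."""
    M = len(Z[0])
    # C_{i,m}: the unique labels seen for instance m across the candidate set.
    per_instance = [sorted({z[m] for z in Z}) for m in range(M)]
    Zset = {tuple(z) for z in Z}
    candidates = [z for z in product(*per_instance) if z not in Zset]
    return max(candidates, key=score) if candidates else None

Z = [(0, 1), (1, 0), (0, 0)]
score = lambda z: z[0] + 2 * z[1]
assert most_violating(score, Z) == (1, 1)
```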
63 This complexity can be greatly reduced when there are special structures such as graphs and trees in the labeling set. [sent-184, score-0.345]
64 Finally, we test our algorithm on one of the examples motivating our study — learning a face recognition system from news images weakly annotated by their associated captions. [sent-193, score-0.247]
65 We benchmark MMS against the following baselines: • SVM: we train a fully-supervised SVM classifier using the ground-truth labels by considering every instance separately while ignoring the other candidate labels. [sent-194, score-0.33]
66 • CL-SVM: the Candidate Labeling SVM (CL-SVM) is a naive approach which transforms the ambiguously labeled data into a standard supervised representation by treating all possible labels of each instance as true labels. [sent-197, score-0.435]
67 Then it learns 1-vs-All SVM classifiers from the resulting dataset, where the negative examples are instances which do not have the corresponding label in their candidate labeling set. [sent-198, score-0.727]
68 We train the MIML algorithms by treating the labels in Z_i as bag-level labels. [sent-201, score-0.218]
69 During the test phase, we consider each instance separately and predict its label as ŷ = arg max_{y ∈ Y} F_miml(x, y), where F_miml is the obtained classifier, and F_miml(x, y) can be interpreted as the confidence of the classifier in assigning the instance x to the class y. [sent-202, score-0.265]
70 We would like to underline that, although some of the experimental setups may favor our algorithm, we include the comparison between MMS and MIML algorithms because, to the best of our knowledge, MIML is the only existing principled framework for modeling instance bags with multiple labels. [sent-203, score-0.227]
71 MIML algorithms may still have their own advantage in scenarios when no prior knowledge is available about the instances within a bag. [sent-204, score-0.187]
72 The artificial training sets are created as follows: we first select random pairs of classes as “correlated classes” and as “ambiguous classes”, where the ambiguous classes can be different from the correlated classes. [sent-225, score-0.265]
73 Following that, instances are grouped randomly into bags of fixed size B with probability at least Pc that two instances from correlated classes will appear in the same bag. [sent-226, score-0.616]
74 Then L ambiguous labeling vectors are created for each bag, by modifying a few elements of the correct labeling vector. [sent-227, score-0.941]
75 , B}, and the new labels are chosen among a predefined ambiguous set. [sent-231, score-0.287]
76 The ambiguous set is composed of the other correct labels from the same bag (except the true one) and a subset of the ambiguous pairs of all the correct labels from the bag. [sent-232, score-0.903]
77 The probability that the ambiguous pair of a label is present equals P_a. [sent-233, score-0.236]
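A hedged sketch of the candidate-set corruption step just described (our own simplification; the paper's exact sampling protocol has more detail than shown here):

```python
import random

def make_candidate_set(y_true, L, ambiguous_pairs, Pa, n_flip=1, seed=0):
    """Return L candidate labeling vectors: y_true plus L-1 corrupted copies."""
    rng = random.Random(seed)
    Z = [tuple(y_true)]
    while len(Z) < L:
        z = list(y_true)
        for _ in range(n_flip):
            m = rng.randrange(len(z))
            # Ambiguous set: the other correct labels from the bag, plus (with
            # probability Pa) the predefined ambiguous partner of each correct label.
            pool = [y for y in y_true if y != z[m]]
            pool += [ambiguous_pairs[y] for y in y_true
                     if y in ambiguous_pairs and rng.random() < Pa]
            if pool:
                z[m] = rng.choice(pool)
        Z.append(tuple(z))
    return Z

Z = make_candidate_set([0, 1, 2], L=4, ambiguous_pairs={0: 5}, Pa=0.5)
assert Z[0] == (0, 1, 2) and len(Z) == 4
```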
78 For example, when Pa > 0, noisy labels are likely to be present in the labeling set. [sent-236, score-0.517]
79 If Pc is large, instances from two correlated classes are likely to be grouped into the same bag, thus it becomes more difficult to distinguish between these two classes. [sent-238, score-0.288]
80 Second, the change in performance of MMS is small when the size of the candidate labeling set grows. [sent-250, score-0.469]
81 Moreover, when correlated instances and extra noisy labels are present in the dataset, the baseline methods' performance drops by several percentage points, while MMS is less affected. [sent-251, score-0.391]
82 This behavior also shows that approximating the second max(·) function in the loss function (3) with the average over all the possible labeling vectors can lead to poor performance. [sent-253, score-0.46]
83 2 Applications to learning from images & captions. A huge number of images with accompanying text captions are available on the web. [sent-255, score-0.382]
84 Thanks to the recent developments in the computer vision and natural language processing fields, faces in the images can be detected by a face detector and names in the captions can be identified using a language parser. [sent-259, score-0.465]
85 Z = {z1 = (na, nb), z2 = (na, ◦), z3 = (◦, nb), z4 = (◦, na), z5 = (nb, ◦), z6 = (nb, na)}, where each vector lists the labels of (facea, faceb). Figure 2: (Left): An example image and its associated caption. [sent-263, score-0.633]
86 There are two detected faces facea and faceb in the image, and two names, Barack Obama (na) and Michelle Obama (nb), in the caption. [sent-264, score-0.292]
87 (Right): The candidate labeling set for this image-caption pair. [sent-265, score-0.469]
88 The labeling vectors are generated using the following constraints: [sent-266, score-0.395]
89 i) a face in the image can either be assigned a name from its caption, or it may correspond to none of them (a NULL class, denoted as ◦); ii) a face can be assigned at most one name; iii) a name can be assigned to at most one face. [sent-267, score-0.351]
90 Unlike previous methods, we do not allow the labeling vector with all the faces assigned to the NULL class, because it would lead to the trivial solution of classifying every instance as NULL with 0 loss. [sent-268, score-0.58]
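These constraints can be enumerated directly. A small sketch (our own helper, not the paper's code) that reproduces the six vectors of Figure 2:

```python
from itertools import product

NULL = "o"

def candidate_labelings(faces, names):
    """All face-to-name assignments satisfying constraints i)-iii),
    excluding the all-NULL labeling vector."""
    Z = []
    for z in product(names + [NULL], repeat=len(faces)):
        assigned = [y for y in z if y != NULL]
        if len(assigned) != len(set(assigned)):   # iii) a name used at most once
            continue
        if all(y == NULL for y in z):             # forbid the all-NULL vector
            continue
        Z.append(z)
    return Z

Z = candidate_labelings(faces=["face_a", "face_b"], names=["na", "nb"])
assert len(Z) == 6                                # matches Figure 2
assert ("na", "nb") in Z and ("o", "o") not in Z
```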
91 This task is difficult due to the so-called “correspondence ambiguity” problem: there could be more than one face and name appearing in the image-caption pairs, not all the names in the caption appear in the image, and vice versa. [sent-281, score-0.335]
92 Since the names of the key persons in the image typically appear in the captions, combined with other common assumptions [3, 13], we can easily generate the candidate labeling sets (see Figure 2 for a practical example). [sent-283, score-0.597]
93 The dataset is fully annotated for the association of faces in the image with names in the caption; precomputed facial features are also available with the dataset. [sent-286, score-0.269]
94 The experiments are performed over 5 different permutations, sampling 80% of the images and captions as the training set and using the rest for testing. [sent-290, score-0.22]
95 For all algorithms, NULL names are considered as an additional class, except for MIML algorithms where unknown faces can be automatically considered as negative instances. [sent-292, score-0.206]
96 5 Conclusion In this paper, we introduce the “Candidate Labeling Set” problem where training samples contain multiple instances and a set of possible labeling vectors. [sent-298, score-0.561]
97 In particular, our framework provides a principled way to encode prior knowledge about relationships between instances and labels, and these constraints are explicitly taken into account in the loss function by the algorithm. [sent-301, score-0.252]
98 It would also be possible to group separate instances into bags and solve the learning problem using MMS, when there are labeling constraints between these instances (e. [sent-303, score-0.892]
99 Multiple instance metric learning from automatically labeled bags of faces. [sent-412, score-0.254]
100 Who’s doing what: Joint modeling of names and verbs for simultaneous face and pose annotation. [sent-430, score-0.229]
wordName wordTfidf (topN-words)
[('labeling', 0.345), ('miml', 0.28), ('bag', 0.265), ('mms', 0.264), ('zi', 0.231), ('cccp', 0.218), ('maxz', 0.216), ('cls', 0.189), ('instances', 0.187), ('zm', 0.158), ('ym', 0.149), ('labels', 0.147), ('captions', 0.146), ('bags', 0.141), ('ambiguous', 0.14), ('names', 0.128), ('candidate', 0.124), ('xm', 0.096), ('pegasos', 0.094), ('mi', 0.087), ('covtype', 0.086), ('obama', 0.086), ('ci', 0.084), ('faces', 0.078), ('nb', 0.077), ('label', 0.071), ('zk', 0.071), ('mil', 0.069), ('face', 0.068), ('loss', 0.065), ('fmiml', 0.065), ('mimlsvm', 0.065), ('pc', 0.064), ('wt', 0.063), ('instance', 0.059), ('name', 0.058), ('xi', 0.057), ('svm', 0.054), ('labeled', 0.054), ('round', 0.054), ('na', 0.052), ('yahoo', 0.051), ('max', 0.051), ('vectors', 0.05), ('subgradient', 0.049), ('caption', 0.049), ('ambiguously', 0.049), ('margin', 0.046), ('images', 0.045), ('news', 0.045), ('barack', 0.043), ('facea', 0.043), ('faceb', 0.043), ('letter', 0.04), ('gathered', 0.038), ('li', 0.038), ('guillaumin', 0.038), ('liblinear', 0.038), ('michelle', 0.038), ('grouped', 0.037), ('dataset', 0.035), ('lj', 0.035), ('orabona', 0.035), ('supervised', 0.035), ('jmlr', 0.034), ('assigned', 0.033), ('member', 0.033), ('xk', 0.033), ('verbs', 0.033), ('annotators', 0.033), ('zl', 0.033), ('sponsored', 0.033), ('classes', 0.032), ('correlated', 0.032), ('could', 0.032), ('correct', 0.032), ('null', 0.031), ('pa', 0.031), ('jie', 0.031), ('associated', 0.031), ('searching', 0.03), ('weakly', 0.03), ('jin', 0.029), ('classi', 0.029), ('training', 0.029), ('modifying', 0.029), ('annotated', 0.028), ('tags', 0.028), ('multiclass', 0.028), ('convex', 0.027), ('setups', 0.027), ('libsvm', 0.027), ('relaxed', 0.027), ('effort', 0.026), ('formulation', 0.026), ('usps', 0.026), ('usually', 0.026), ('noisy', 0.025), ('equals', 0.025), ('coming', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 151 nips-2010-Learning from Candidate Labeling Sets
2 0.25613976 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Author: Fuxin Li, Cristian Sminchisescu
Abstract: We propose an approach to multiple-instance learning that reformulates the problem as a convex optimization on the likelihood ratio between the positive and the negative class for each training instance. This is casted as joint estimation of both a likelihood ratio predictor and the target (likelihood ratio variable) for instances. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio. It is shown that likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. The likelihood ratio estimates provide a ranking of instances within a bag and are used as input features to learn a linear classifier on bags of instances. Instance-level classification is achieved from the bag-level predictions and the individual likelihood ratios. Experiments on synthetic and real datasets demonstrate the competitiveness of the approach.
3 0.21720135 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focus on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while place positive instances and negative instances at opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulted problem. Empirical results demonstrate that our approach offers improved generalization performance.
4 0.1826652 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
Author: Umar Syed, Ben Taskar
Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines. 1
5 0.17201683 235 nips-2010-Self-Paced Learning for Latent Variable Models
Author: M. P. Kumar, Benjamin Packer, Daphne Koller
Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state of the art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition. 1
6 0.12766232 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
7 0.11756103 23 nips-2010-Active Instance Sampling via Matrix Partition
8 0.1168144 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
9 0.10794546 177 nips-2010-Multitask Learning without Label Correspondences
10 0.099306293 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
11 0.099254534 267 nips-2010-The Multidimensional Wisdom of Crowds
12 0.098679736 63 nips-2010-Distributed Dual Averaging In Networks
13 0.096812785 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
14 0.095981173 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
15 0.095036253 22 nips-2010-Active Estimation of F-Measures
16 0.094760142 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates
17 0.087659776 228 nips-2010-Reverse Multi-Label Learning
18 0.08452376 25 nips-2010-Active Learning by Querying Informative and Representative Examples
19 0.08320266 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
20 0.082113191 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
topicId topicWeight
[(0, 0.228), (1, 0.084), (2, 0.081), (3, -0.164), (4, 0.011), (5, 0.087), (6, -0.116), (7, -0.103), (8, 0.032), (9, -0.194), (10, -0.012), (11, 0.012), (12, 0.069), (13, 0.088), (14, 0.015), (15, -0.042), (16, -0.169), (17, 0.006), (18, 0.097), (19, -0.056), (20, 0.044), (21, -0.1), (22, 0.046), (23, 0.07), (24, 0.077), (25, 0.162), (26, 0.099), (27, -0.168), (28, -0.145), (29, 0.045), (30, 0.066), (31, 0.134), (32, 0.097), (33, 0.072), (34, -0.086), (35, 0.108), (36, -0.009), (37, 0.064), (38, -0.116), (39, -0.028), (40, 0.054), (41, 0.01), (42, 0.034), (43, 0.017), (44, -0.011), (45, -0.012), (46, 0.053), (47, 0.007), (48, 0.016), (49, 0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.9407863 151 nips-2010-Learning from Candidate Labeling Sets
2 0.87123984 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
3 0.84957457 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
4 0.61179674 267 nips-2010-The Multidimensional Wisdom of Crowds
Author: Peter Welinder, Steve Branson, Pietro Perona, Serge J. Belongie
Abstract: Distributing labeling tasks among hundreds or thousands of annotators is an increasingly important method for annotating large datasets. We present a method for estimating the underlying value (e.g. the class) of each image from (noisy) annotations provided by multiple annotators. Our method is based on a model of the image formation and annotation process. Each image has different characteristics that are represented in an abstract Euclidean space. Each annotator is modeled as a multidimensional entity with variables representing competence, expertise and bias. This allows the model to discover and represent groups of annotators that have different sets of skills and knowledge, as well as groups of images that differ qualitatively. We find that our model predicts ground truth labels on both synthetic and real data more accurately than state-of-the-art methods. Experiments also show that our model, starting from a set of binary labels, may discover rich information, such as different “schools of thought” amongst the annotators, and can group together images belonging to separate categories.
5 0.60828328 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
Author: Umar Syed, Ben Taskar
Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines.
6 0.5334695 235 nips-2010-Self-Paced Learning for Latent Variable Models
7 0.52687901 177 nips-2010-Multitask Learning without Label Correspondences
8 0.50318271 228 nips-2010-Reverse Multi-Label Learning
9 0.46419039 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
10 0.46224394 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
11 0.45313945 22 nips-2010-Active Estimation of F-Measures
12 0.45279497 23 nips-2010-Active Instance Sampling via Matrix Partition
13 0.41670236 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
14 0.41323027 25 nips-2010-Active Learning by Querying Informative and Representative Examples
15 0.4018231 158 nips-2010-Learning via Gaussian Herding
16 0.40098944 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
17 0.39973247 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.39658684 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification
19 0.39576754 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
20 0.38445652 155 nips-2010-Learning the context of a category
topicId topicWeight
[(13, 0.037), (17, 0.02), (27, 0.049), (30, 0.058), (35, 0.021), (45, 0.194), (50, 0.03), (52, 0.027), (60, 0.029), (77, 0.047), (78, 0.384), (90, 0.037)]
simIndex simValue paperId paperTitle
1 0.83336508 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola
Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions.
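To make the quantities in this abstract concrete, the sketch below computes an order-2 U-statistic with a bounded symmetric kernel and an empirical-Bernstein-shaped deviation term built from the empirical variance of the kernel over pairs. The constants and the exact form of the bound are generic assumptions following the usual Bernstein shape, not the inequalities derived in the paper:

```python
import itertools
import numpy as np

def u_statistic(sample, q):
    # order-2 U-statistic: average of q over all unordered pairs,
    # plus the empirical variance of q over those pairs
    pairs = itertools.combinations(range(len(sample)), 2)
    vals = np.array([q(sample[i], sample[j]) for i, j in pairs])
    return vals.mean(), vals.var()

rng = np.random.default_rng(1)
x = rng.uniform(size=50)
q = lambda a, b: 0.5 * (a - b) ** 2   # symmetric kernel, bounded by 0.5 on [0, 1]
u, v = u_statistic(x, q)

# generic empirical-Bernstein-shaped deviation term (b = kernel bound)
n, delta, b = len(x), 0.05, 0.5
eps = np.sqrt(2 * v * np.log(2 / delta) / n) + 3 * b * np.log(2 / delta) / n
print(0.0 <= u <= b and eps > 0)  # prints True
```

This is the quantity an empirical Bernstein bound lets you exploit: when the empirical variance `v` is small, the confidence width `eps` shrinks faster than a worst-case (Hoeffding-style) bound would.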
same-paper 2 0.79028225 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real-world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large-margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and on a real-world dataset of images and captions show that our approach achieves performance comparable to that of an SVM trained with the ground-truth labels, and outperforms other baselines.
3 0.76372737 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
Author: Alessandro Chiuso, Gianluigi Pillonetto
Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization.
4 0.73315471 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and the hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points.
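A minimal sketch of the two-bit hashing idea, under one reading of the abstract: database points hash with two random projections, while the hyperplane query negates one bit, so that points nearly perpendicular to the hyperplane normal (i.e., close to the hyperplane) collide more often. The bit convention, bucket handling, and all names here are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
u, v = rng.normal(size=d), rng.normal(size=d)  # two shared random projections

def hash_point(a):
    # two-bit key for a database point
    return (np.sign(u @ a), np.sign(v @ a))

def hash_hyperplane(w):
    # key for a hyperplane query with normal w: one bit is negated, so
    # collisions are most likely when a is nearly orthogonal to w
    return (np.sign(u @ w), -np.sign(v @ w))

points = rng.normal(size=(1000, d))
w = rng.normal(size=d)
key = hash_hyperplane(w)
candidates = [a for a in points if hash_point(a) == key]
print(f"{len(candidates)} candidate points retrieved from the matching bucket")
```

In practice one would concatenate many such two-bit keys into longer hash codes and use several hash tables; the single pair of projections above only illustrates the collision mechanism.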
5 0.67690587 222 nips-2010-Random Walk Approach to Regret Minimization
Author: Hariharan Narayanan, Alexander Rakhlin
Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies.
6 0.641922 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
7 0.62354356 25 nips-2010-Active Learning by Querying Informative and Representative Examples
8 0.60974282 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
9 0.59277302 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
10 0.58952051 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
11 0.58950245 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
12 0.5883317 23 nips-2010-Active Instance Sampling via Matrix Partition
13 0.57161576 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
14 0.56260854 282 nips-2010-Variable margin losses for classifier design
15 0.56118792 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
16 0.56102651 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
17 0.55944675 243 nips-2010-Smoothness, Low Noise and Fast Rates
18 0.55765551 22 nips-2010-Active Estimation of F-Measures
19 0.55598825 1 nips-2010-(RF)^2 -- Random Forest Random Field
20 0.55242616 27 nips-2010-Agnostic Active Learning Without Constraints