
236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information


Source: pdf

Author: Umar Syed, Ben Taskar

Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. [sent-4, score-0.62]

2 We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. [sent-5, score-0.602]

3 We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines. [sent-7, score-0.468]

4 Semi-supervised learning algorithms use both labeled and unlabeled examples. [sent-8, score-0.312]

5 Most theoretical analyses of semi-supervised learning assume that m + n labeled examples are drawn i.i.d. [sent-9, score-0.391]

6 The workers who label the data are often poorly motivated, and may deliberately skip examples that are difficult to correctly label. [sent-17, score-0.458]

7 In such a setting, a learning algorithm should not assume that the examples were labeled at random. [sent-18, score-0.388]

8 Additionally, in many semi-supervised learning settings, the partial label information is not provided on a per-example basis. [sent-19, score-0.355]

9 For example, in multiple instance learning [2], examples are presented to a learning algorithm in sets, with either zero or one positive example per set. [sent-20, score-0.37]

10 In graph-based regularization [3], a learning algorithm is given information about which examples are likely to have the same label, but not necessarily the identity of that label. [sent-21, score-0.329]

11 Recently, there has been much interest in algorithms that learn from labeled features [4]; in this setting, the learning algorithm is given information about the expected value of several features with respect to the true distribution on labeled examples. [sent-22, score-0.465]

12 To summarize, in a typical semi-supervised learning problem, label information is often missing in an arbitrary fashion, and even when present, does not always have a simple form, like one label per example. [sent-23, score-0.76]

13 We derive our learning algorithm within a framework that is expressive enough to permit a very general notion of label information, allowing us to make minimal assumptions about which examples in a data set have been labeled, how they have been labeled, and why. [sent-25, score-0.59]

14 We also provide experimental results on several standard data sets, which show that our algorithm is effective and robust when the label information has been provided by “lazy” or “unhelpful” labelers. [sent-27, score-0.349]

15 Learning with this type of label noise is known to be quite difficult, and positive results often make quite restrictive assumptions about the underlying data distribution [6, 7]. [sent-29, score-0.391]

16 By contrast, our results apply far more generally, at the expense of assuming a more benign (but possibly more realistic) model of label noise, where the labeler can adversarially erase labels, but not change them. [sent-30, score-0.982]

17 In other words, we assume that the labeler equivocates, but does not lie. [sent-31, score-0.606]

18 The difference in these assumptions shows up quite clearly in our analysis: As we point out in Section 3, our bounds become vacuous if the labeler is allowed to mislabel data. [sent-32, score-0.769]

19 In Section 2 we describe how our framework encodes label information in a label regularization function, which closely resembles the idea of a compatibility function introduced by Balcan & Blum [8]. [sent-33, score-0.801]

20 Let (x̂, ŷ) ∼ D^m be the m labeled training examples. [sent-51, score-0.271]

21 We make a much weaker assumption about what label information is available. [sent-54, score-0.375]

22 We assume that, after the labeled training set (x̂, ŷ) has been drawn, the learning algorithm is only given access to the examples x̂ and to a label regularization function R. [sent-55, score-0.921]

23 The function R encodes some information about the labels ŷ of x̂, and is selected by a potentially adversarial labeler from a family R(x̂, ŷ). [sent-56, score-0.828]

24 A label regularization function R maps each possible soft labeling q of the training examples x̂ to a real number R(q) (a soft labeling is a natural generalization of a labeling that we will define formally in a moment). [sent-57, score-1.536]

25 Except for knowing that R belongs to R(x̂, ŷ), the learner can make no assumptions about how the labeler selects R. [sent-58, score-0.717]

26 We give examples of label regularization functions in Section 2.1. [sent-59, score-0.614]

27 A soft labeling q ∈ ∆^m of the training examples x̂ is a doubly-indexed vector, where q(i, y) is interpreted as the probability that example x̂_i has label y ∈ Y. [sent-62, score-0.872]

28 The correct soft labeling has q(i, y) = 1{y = ŷ_i}, where the indicator function 1{·} is 1 when its argument is true and 0 otherwise; we overload notation and write ŷ to denote the correct soft labeling. [sent-63, score-0.622]

29 Although the labeler is possibly adversarial, the family R(x̂, ŷ) of label regularization functions restricts the choices the labeler can make. [sent-64, score-1.727]

30 We are interested in designing learning algorithms that work well when each R ∈ R(x̂, ŷ) assigns a low value to the correct labeling ŷ. [sent-65, score-0.272]

31 This is the sense in which label information is “missing” — it is difficult for any learning algorithm to distinguish among these minima. [sent-68, score-0.383]

32 We are interested in learning a parameterized model that predicts a label y given an example x. [sent-70, score-0.355]

33 Given training examples x̂, a label regularization function R, and a loss function L, the goal of a learning algorithm is to find a parameter θ that minimizes the expected loss E_D[L(θ, x, y)], where E_D[·] denotes expectation with respect to (x, y) ∼ D. [sent-73, score-0.874]

34 Let E_{x̂,q}[f(x, y)] denote the expected value of f(x, y) when example x is chosen uniformly at random from the training examples x̂ and — supposing that this is example x̂_i — label y is chosen from the distribution q(i, ·). [sent-74, score-0.59]

35 Accordingly, E_{x̂,ŷ}[f(x, y)] denotes the expected value of f(x, y) when a labeled example (x, y) is chosen uniformly at random from the labeled training examples (x̂, ŷ). [sent-75, score-0.622]
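
A minimal sketch of this notation in code may help; everything below (array shapes, names) is our own illustration, not the authors' implementation. A soft labeling is an m × k row-stochastic matrix, the correct labeling ŷ is its one-hot special case, and E_{x̂,q}[f] averages f over a uniformly chosen example and a label drawn from that example's row of q.

import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 3                                # examples, labels

y_hat = rng.integers(0, k, size=m)         # true labels
q_correct = np.eye(k)[y_hat]               # one-hot: q(i, y) = 1{y = y_hat[i]}

q = rng.random((m, k))                     # a generic soft labeling
q /= q.sum(axis=1, keepdims=True)          # rows live on the simplex

def expected_value(f_vals, q):
    # E_{x,q}[f]: f_vals[i, y] holds f(x_i, y); average over i, weight by q.
    return (q * f_vals).mean(axis=0).sum()

f_vals = rng.random((m, k))                # placeholder values of f(x, y)
print(expected_value(f_vals, q), expected_value(f_vals, q_correct))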

36 To make the concept of a label regularization function more clear, we describe several well-known learning settings in which the information provided to the learning algorithm is less than the fully labeled training set. [sent-77, score-0.857]

37 In the supervised learning setting, the label of every example in the training set is revealed to the learner. [sent-80, score-0.437]

38 In this setting, the label regularization function family R(x̂, ŷ) contains a single function R_ŷ such that R_ŷ(q) = 0 if q = ŷ, and R_ŷ(q) = ∞ otherwise. [sent-81, score-0.489]

39 In the semi-supervised learning setting, the labels of only some of the training examples are revealed. [sent-82, score-0.354]

40 In other words, R_I(q) is zero if and only if the soft labeling q agrees with ŷ on the examples in I. [sent-84, score-0.469]

41 This implies that R_I(q) is independent of how q labels examples not in I — these are the examples whose labels are missing. [sent-85, score-0.476]

42 In the ambiguous learning setting [9, 10], which is a generalization of semi-supervised learning, the labeler reveals a label set Y_i ⊆ Y for each training example x̂_i such that ŷ_i ∈ Y_i. [sent-86, score-1.176]

43 That is, for each training example, the learning algorithm is given a set of possible labels the example can have (semi-supervised learning is the special case where each label set has size 1 or k). [sent-87, score-0.6]

44 Letting Ŷ = (Y_1, . . . , Y_m) be all the label sets revealed to the learner, there is a function R_Ŷ ∈ R(x̂, ŷ) for each possible Ŷ such that R_Ŷ(q) = 0 if supp(q_i) ⊆ Y_i for all i ∈ [m] and R_Ŷ(q) = ∞ otherwise. [sent-91, score-0.356]

45 Here q_i ≜ q(i, ·) and supp(q_i) is the support of the label distribution q_i. [sent-92, score-0.433]

46 In other words, R_Ŷ(q) is zero if and only if the soft labeling q is supported on the sets Y_1, . . . , Y_m. [sent-93, score-0.367]
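
As a concrete sketch, the three local families above can all be written as 0/∞ indicator functions of the soft labeling. The encodings below are our own and only illustrate the definitions; they are not the authors' code.

import numpy as np

INF = np.inf

def R_supervised(q, y_hat):
    # R_yhat(q) = 0 iff q equals the correct one-hot labeling, else infinity.
    return 0.0 if np.array_equal(q, np.eye(q.shape[1])[y_hat]) else INF

def R_semi_supervised(q, y_hat, I):
    # R_I(q) = 0 iff q agrees with y_hat on the revealed index set I.
    ok = all(np.isclose(q[i, y_hat[i]], 1.0) for i in I)
    return 0.0 if ok else INF

def R_ambiguous(q, label_sets):
    # R_Yhat(q) = 0 iff supp(q_i) is contained in the label set Y_i for all i.
    ok = all(set(np.flatnonzero(q[i] > 0)) <= Y_i
             for i, Y_i in enumerate(label_sets))
    return 0.0 if ok else INF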

47 The label regularization functions described above essentially give only local information; they specify, for each example in the training set, which labels are possible for that example. [sent-97, score-0.66]

48 In some cases, we may want to allow the labeler to provide more global information about the correct labeling. [sent-98, score-0.642]

49 One example of providing global information is Laplacian regularization, a kind of graph-based regularization [3] that encodes information about which examples are likely to have the same labels. [sent-99, score-0.296]

50 For any soft labeling q, let q[y] be the m-length vector whose ith component is q(i, y). [sent-100, score-0.332]

51 The Laplacian regularizer is defined to be R_L(q) = Σ_{y∈Y} q[y]^T L(x̂) q[y], where L(x̂) is an m × m positive semi-definite matrix defined so that R_L(q) is large whenever examples in x̂ that are believed to have the same label are assigned different label distributions by q. [sent-101, score-0.816]

52 As noted by several authors [4, 11, 12], it is often convenient for a labeler to provide information about the expected value of f(x, y) with respect to the true distribution. [sent-104, score-0.631]

53 This term penalizes soft labelings q which cause the expected value of f on the training set to deviate from b. [sent-106, score-0.316]
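
The two global regularizers just described might be sketched as follows; the graph Laplacian L and the penalty form (a squared deviation from the target b) are our illustrative choices, since the extract does not pin down the exact penalty.

import numpy as np

def R_laplacian(q, L):
    # R_L(q) = sum over labels y of q[:, y]^T L q[:, y], with L an m x m
    # PSD matrix (e.g., a graph Laplacian built from example similarities).
    return sum(q[:, y] @ L @ q[:, y] for y in range(q.shape[1]))

def R_feature_expectations(q, f_vals, b):
    # Penalize soft labelings whose expected feature values on the training
    # set deviate from the labeler-supplied targets b.
    # f_vals[i, y] is the feature vector f(x_i, y), so f_vals has shape (m, k, d).
    expected_f = np.einsum('iy,iyd->d', q, f_vals) / q.shape[0]
    return np.sum((expected_f - b) ** 2)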

54 So, for instance, ambiguous learning can be combined with a Laplacian, and in this case the learner is given a label regularization function of the form R_Ŷ(q) + R_L(q). [sent-108, score-0.599]

55 Note that, in all the examples described above, while the correct labeling ŷ is at or close to the minimum of each function R ∈ R(x̂, ŷ), there may be many labelings meeting this condition. [sent-110, score-0.511]

56 Again, this is the sense in which label information is “missing”. [sent-111, score-0.321]

57 It is also important to note that we have only specified what information the labeler can reveal to the learner (some function from the set R(x̂, ŷ)), but we do not specify how that information is chosen by the labeler (which function R ∈ R(x̂, ŷ)?). [sent-112, score-1.278]

58 Using the notation defined above, most analyses of semi-supervised learning assume that R_I is chosen by selecting a random subset I of the training examples [13, 14]. [sent-116, score-0.309]

59 If (x̂, ŷ) ∼ D^m, then with probability at least 1 − δ, for all parameters θ ∈ Θ and label regularization functions R ∈ R(x̂, ŷ): E_D[L(θ, x, y)] ≤ max_{q∈∆^m} (E_{x̂,q}[L(θ, x, y)] − R(q)) + R(ŷ) + ε(δ, m). [sent-128, score-0.477]
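
For readability, here is the same upper bound as display math. This is our reconstruction from the garbled extract; in particular, the max being taken over q ∈ ∆^m is inferred from the matching objective in Section 4.

\[
\mathbb{E}_D\bigl[L(\theta, x, y)\bigr]
  \;\le\; \max_{q \in \Delta^m}\Bigl(\mathbb{E}_{\hat{x},q}\bigl[L(\theta, x, y)\bigr] - R(q)\Bigr)
  \;+\; R(\hat{y}) \;+\; \epsilon(\delta, m).
\]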

60 As we will see, the existence of a matching lower bound depends strongly on the structure of the label regularization function family R. [sent-130, score-0.561]

61 Note that, given a labeled training set (x, y), the set R(x, y) essentially constrains what information the labeler can reveal to the learning algorithm, thereby encoding our assumptions about how the labeler will behave. [sent-131, score-1.587]

62 Recall that all the label regularization functions described in Section 2.1. [sent-137, score-0.477]

63 It is easy to verify that all the examples of label regularization function families given in Section 2.1 satisfy Assumption 1. [sent-142, score-0.624]

64 Also note that Assumption 1 allows the finite part of R (denoted by F) to depend on the entire soft labeling q in a basically arbitrary manner. [sent-144, score-0.332]

65 We write h to denote a labeling function that maps examples X to labels Y. [sent-146, score-0.469]

66 Also, for any labeling function h and unlabeled training set x ∈ X^m, we let h(x) ∈ Y^m denote the vector of labels whose ith component is h(x_i). [sent-147, score-0.474]

67 Let p_x be an N-length vector that represents the unlabeled training set x as a distribution on X, whose ith component is p_x(i) = |{j : x_j = x̃_i}| / m (here x̃_1, . . . , x̃_N enumerate the elements of X). [sent-148, score-0.323]

68 For any labeling function h∗ and unlabeled training sets x, x′ such that ‖p_x − p_{x′}‖_∞ ≤ γ, the following holds: for all R ∈ R(x, h∗(x)) there exists R′ ∈ R(x′, h∗(x′)) such that R(h(x)) < ∞ if and only if R′(h(x′)) < ∞, for all labeling functions h. [sent-151, score-0.712]
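
A tiny sketch of the quantities in this assumption (all names are ours): build the empirical distribution p_x over a finite example space and compare two training sets in the infinity norm.

import numpy as np

def empirical_dist(x, domain):
    # p_x(i) = |{j : x[j] == domain[i]}| / len(x)
    x = np.asarray(x)
    return np.array([(x == xi).mean() for xi in domain])

def gamma_close(x, x_prime, domain, gamma):
    # True when ||p_x - p_{x'}||_inf <= gamma
    diff = empirical_dist(x, domain) - empirical_dist(x_prime, domain)
    return np.max(np.abs(diff)) <= gamma

domain = [0, 1, 2, 3]                          # the finite example space X
print(gamma_close([0, 0, 1, 2], [0, 1, 1, 2], domain, gamma=0.25))  # True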

69 For all labeled training sets (x, y) and R ∈ R(x, y), if R(y′ ) < ∞ then R ∈ R(x, y′ ). [sent-154, score-0.306]

70 We argue it is necessary for two reasons: firstly, all the examples of label regularization function families given in Section 2.1 satisfy it. [sent-156, score-0.624]

71 Let A be a (possibly randomized) learning algorithm that takes a set of unlabeled training examples x̂ and a label regularization function R as input, and outputs an estimated parameter θ̂. [sent-159, score-0.821]

72 Also, if under distribution D each example x ∈ X is associated with exactly one label h∗ (x) ∈ Y, then we write D = DX · h∗ , where the data distribution DX is the marginal distribution of D on X . [sent-160, score-0.35]

73 Theorem 2 proves the existence of a true labeling function h∗ such that a nearly tight lower bound holds for all learning algorithms A and all data distributions DX whenever the training set is drawn from DX · h∗ . [sent-161, score-0.445]

74 Suppose Assumptions 1, 2 and 3 hold for label regularization function family R, the loss function L is 0-1 loss, and the set of all possible examples X is finite. [sent-165, score-0.697]

75 Obviously, Assumptions 1, 2 and 3 restrict the kinds of label regularization function families to which Theorem 2 can be applied. [sent-167, score-0.487]

76 Therefore, these bounds are asymptotically matching if the labeler always chooses a label regularization function R such that R(ŷ) = min_q R(q). [sent-175, score-1.161]

77 If R(ŷ) = ∞ (i.e., the correct labeling of the training set is not possible under R), our upper bound is vacuous. [sent-182, score-0.404]

78 In this sense, our framework is best suited to settings in which the information provided by the labeler is equivocal, but not actually untruthful, as it is in the malicious label noise setting [6, 7]. [sent-183, score-1.075]

79 Given the unlabeled training examples x̂ and label regularization function R, the bounds in Section 3 suggest an obvious learning algorithm: find a parameter θ∗ that realizes the minimum min_θ max_{q∈∆^m} (E_{x̂,q}[L(θ, x, y)] − R(q)) + α‖θ‖² (1). [sent-185, score-0.909]

80 In order to estimate θ∗, throughout this section we make the following assumption about the loss function L and label regularization function R. [sent-188, score-0.576]

81 The loss function L is convex in θ, and the label regularization function R is convex in q. [sent-190, score-0.522]

82 It is easy to verify that all of the loss functions and label regularization functions we gave as examples in Sections 2 and 2.1 satisfy this assumption. [sent-191, score-0.711]

83 Instead of finding θ∗ directly, our approach will be to “swap” the min and max in (1), find the soft labeling q∗ that realizes the maximum, and then use q∗ to compute θ∗. [sent-193, score-0.371]

84 In the first step of Algorithm 1, we modify the objective (1) by swapping the min and max, and then find a soft labeling q̃ that approximately maximizes this modified objective. [sent-198, score-0.332]

85 In the next step, we find a parameter θ̃ that approximately minimizes the original objective with respect to the fixed soft labeling q̃. [sent-199, score-0.332]
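
The extract does not include Algorithm 1 itself, so the following is only a plausible sketch of the two-step swap-then-solve strategy under our own choices: a multiclass logistic loss, a differentiable R with gradient R_grad (mapping an (m, k) soft labeling to an (m, k) gradient), exponentiated-gradient ascent on q (the extracted keyword list suggests exponentiated updates play a role, but we have not verified the paper's exact scheme), and gradient descent on theta. Step sizes and iteration counts are arbitrary.

import numpy as np

def losses(theta, X):
    # (m, k) matrix of per-example, per-label logistic losses -log p(y | x).
    s = X @ theta.T
    return np.logaddexp.reduce(s, axis=1)[:, None] - s

def obj_grad_theta(theta, X, q, alpha):
    # Gradient of E_{x,q}[L(theta, x, y)] + alpha * ||theta||^2 w.r.t. theta.
    s = X @ theta.T
    p = np.exp(s - np.logaddexp.reduce(s, axis=1)[:, None])   # model probs
    return ((p - q).T @ X) / X.shape[0] + 2 * alpha * theta

def game_sketch(X, k, R_grad, alpha=0.1, outer=200, inner=5, eta=0.5):
    m, d = X.shape
    q = np.full((m, k), 1.0 / k)              # start from the uniform labeling
    theta = np.zeros((k, d))
    for _ in range(outer):
        # Step 1 (swapped objective): for the current q, roughly minimize
        # over theta, then take an exponentiated-gradient ascent step in q.
        for _ in range(inner):
            theta -= eta * obj_grad_theta(theta, X, q, alpha)
        g = losses(theta, X) / m - R_grad(q)  # gradient of the objective in q
        q *= np.exp(eta * g)
        q /= q.sum(axis=1, keepdims=True)     # stay on the simplex
    # Step 2: fix the soft labeling q and re-fit theta against it.
    for _ in range(outer):
        theta -= eta * obj_grad_theta(theta, X, q, alpha)
    return theta, q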

86 In all of our experiments, we labeled a fraction of the training examples in a non-random manner that was designed to simulate various types of difficult — even adversarial — labelers. [sent-217, score-0.572]

87 For several values of p ∈ [0, 1] and for each training set, we labeled only the p-fraction of examples with the highest outlier score. [sent-221, score-0.434]

88 In this way, we simulated an “unhelpful” labeler who only labels examples that are exceptions to the general rule, thinking (perhaps sincerely, but erroneously) that this is the most effective use of her effort. [sent-222, score-0.844]
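
To make this protocol concrete, here is a sketch of such an unhelpful labeler; the outlier score (distance to the data mean) is our stand-in, since the extract does not say which score the experiments used.

import numpy as np

def unhelpful_reveal(X, p):
    # Reveal labels only for the p-fraction of examples with the highest
    # outlier score; here the score is distance to the data mean (a toy choice).
    score = np.linalg.norm(X - X.mean(axis=0), axis=1)
    n_reveal = int(p * len(X))
    return np.argsort(score)[::-1][:n_reveal]      # most outlying first

X = np.random.default_rng(1).normal(size=(100, 5))
I = unhelpful_reveal(X, p=0.2)    # index set I for the R_I regularizer above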

89 (Figure residue; recoverable content: accuracy plotted against the fraction of the training set labeled, with legends Transductive SVM / Laplacian SVM / Game and Uniform / EM / Game.) [sent-243, score-0.271]

90 (Caption fragment:) . . . fraction of partially labeled data for Faces in the Wild data set. [sent-256, score-0.294]

91 We sorted the examples with true label y with respect to their distance, in feature space, from the centroid of the cluster of examples with true label y′. [sent-258, score-0.804]

92 For several values of p ∈ [0, 1], we added the label y ′ to the top p-fraction of this list. [sent-259, score-0.321]

93 The net effect of this procedure is that examples on the “border” of the two clusters are given both labels y and y ′ in the training set. [sent-260, score-0.32]

94 The idea behind this labeling procedure is to mimic a (realistic, in our view) situation where a “lazy” labeler declines to commit to one label for those examples that are especially difficult to distinguish. [sent-261, score-1.266]
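
A sketch of this lazy-labeler simulation, with our own bookkeeping for details the extract leaves out; in particular, we take "top of the list" to mean the examples closest to the other cluster's centroid, which matches the stated net effect.

import numpy as np

def lazy_label_sets(X, labels, y, y2, p):
    # Start fully labeled, then make the borderline y-examples ambiguous.
    labels = np.asarray(labels)
    sets = [{l} for l in labels]
    centroid = X[labels == y2].mean(axis=0)        # centroid of cluster y2
    idx = np.flatnonzero(labels == y)
    order = idx[np.argsort(np.linalg.norm(X[idx] - centroid, axis=1))]
    for i in order[:int(p * len(order))]:          # nearest to the border
        sets[i].add(y2)                            # now ambiguous: {y, y2}
    return sets                                    # feeds R_ambiguous above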

95 The results of our experiments indicate that, for certain labeling styles, as the fraction of fully labeled examples decreases, the GAME algorithm’s pessimistic approach is substantially more effective. [sent-267, score-0.633]

96 Importantly, Figures 1(a)-(c) show that the GAME algorithm’s performance advantage is most significant when the number of labeled examples is very small. [sent-268, score-0.326]

97 One can show that both steps of the GAME algorithm can be implemented efficiently even when the number of labels is combinatorial, provided that both the loss function and label regularization function decompose appropriately over the structure. [sent-273, score-0.651]

98 Another possibility is to interactively poll the labeler for label information, resulting in a sequence of successively more informative label regularization functions, with the aim of extracting the most useful label information from the labeler with a minimum of labeling effort. [sent-274, score-2.539]

99 Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. [sent-289, score-0.312]

100 A PAC-style model for learning from labeled and unlabeled data. [sent-311, score-0.312]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('labeler', 0.606), ('label', 0.321), ('labeling', 0.202), ('labeled', 0.189), ('game', 0.178), ('ry', 0.17), ('examples', 0.137), ('soft', 0.13), ('regularization', 0.13), ('labelings', 0.104), ('labels', 0.101), ('reciprocity', 0.097), ('dm', 0.091), ('unlabeled', 0.089), ('missing', 0.084), ('laplacian', 0.084), ('malicious', 0.083), ('ben', 0.082), ('training', 0.082), ('px', 0.076), ('fraction', 0.075), ('ambiguous', 0.073), ('llike', 0.073), ('dx', 0.072), ('loss', 0.071), ('assumptions', 0.07), ('unhelpful', 0.064), ('ri', 0.06), ('wild', 0.059), ('minq', 0.059), ('theorem', 0.058), ('lazy', 0.058), ('qi', 0.056), ('faces', 0.055), ('adversarially', 0.055), ('adversarial', 0.054), ('transductive', 0.054), ('assumption', 0.054), ('mislabel', 0.048), ('bounds', 0.045), ('upper', 0.044), ('supp', 0.043), ('styles', 0.043), ('bci', 0.043), ('klivans', 0.043), ('limm', 0.043), ('rocco', 0.043), ('columbia', 0.042), ('learner', 0.041), ('bound', 0.04), ('umar', 0.039), ('realizes', 0.039), ('maxq', 0.039), ('cssg', 0.039), ('settings', 0.039), ('family', 0.038), ('adam', 0.037), ('svm', 0.037), ('regularizer', 0.037), ('coil', 0.037), ('turk', 0.037), ('find', 0.036), ('families', 0.036), ('correct', 0.036), ('sets', 0.035), ('skewed', 0.035), ('amazon', 0.035), ('learning', 0.034), ('yi', 0.034), ('ym', 0.034), ('pictures', 0.033), ('syed', 0.033), ('minimum', 0.032), ('meet', 0.032), ('balcan', 0.032), ('lower', 0.032), ('analyses', 0.031), ('exponentiated', 0.031), ('mechanical', 0.031), ('partially', 0.03), ('nearly', 0.03), ('rl', 0.03), ('annual', 0.03), ('pessimistic', 0.03), ('write', 0.029), ('encodes', 0.029), ('taskar', 0.029), ('algorithm', 0.028), ('contained', 0.028), ('darpa', 0.027), ('michael', 0.027), ('bernhard', 0.027), ('setting', 0.026), ('outlier', 0.026), ('functions', 0.026), ('true', 0.025), ('face', 0.025), ('simplex', 0.025), ('ready', 0.025), ('chosen', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information

Author: Umar Syed, Ben Taskar

Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines.

2 0.1826652 151 nips-2010-Learning from Candidate Labeling Sets

Author: Jie Luo, Francesco Orabona

Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.

3 0.17261559 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

Author: Samy Bengio, Jason Weston, David Grangier

Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a tree-structure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster.

4 0.13031918 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone

Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu

Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.

5 0.11753461 27 nips-2010-Agnostic Active Learning Without Constraints

Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu

Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

6 0.11730785 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

7 0.093620487 94 nips-2010-Feature Set Embedding for Incomplete Data

8 0.093566962 177 nips-2010-Multitask Learning without Label Correspondences

9 0.092116363 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations

10 0.090782396 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning

11 0.088384204 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

12 0.086660311 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

13 0.083108127 290 nips-2010-t-logistic regression

14 0.080918916 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification

15 0.079694234 39 nips-2010-Bayesian Action-Graph Games

16 0.074568212 22 nips-2010-Active Estimation of F-Measures

17 0.073500521 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

18 0.071574777 23 nips-2010-Active Instance Sampling via Matrix Partition

19 0.070421338 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

20 0.070317134 228 nips-2010-Reverse Multi-Label Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.224), (1, 0.059), (2, 0.12), (3, -0.087), (4, 0.008), (5, 0.067), (6, -0.116), (7, -0.087), (8, -0.006), (9, -0.09), (10, 0.021), (11, -0.016), (12, 0.035), (13, 0.036), (14, 0.038), (15, -0.04), (16, -0.039), (17, -0.053), (18, 0.002), (19, -0.077), (20, 0.015), (21, -0.031), (22, 0.079), (23, -0.077), (24, 0.068), (25, 0.085), (26, 0.011), (27, 0.027), (28, -0.079), (29, 0.074), (30, 0.15), (31, 0.083), (32, -0.02), (33, 0.054), (34, 0.045), (35, -0.047), (36, -0.059), (37, 0.109), (38, 0.052), (39, 0.124), (40, 0.081), (41, -0.178), (42, -0.005), (43, -0.008), (44, 0.014), (45, 0.017), (46, 0.056), (47, 0.05), (48, 0.061), (49, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94290465 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information

Author: Umar Syed, Ben Taskar

Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines.

2 0.68914557 151 nips-2010-Learning from Candidate Labeling Sets

Author: Jie Luo, Francesco Orabona

Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.

3 0.66810095 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

Author: Samy Bengio, Jason Weston, David Grangier

Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a tree-structure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster.

4 0.6565246 177 nips-2010-Multitask Learning without Label Correspondences

Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S.v.n. Vishwanathan

Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories.

5 0.59912628 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone

Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu

Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.

6 0.58631533 267 nips-2010-The Multidimensional Wisdom of Crowds

7 0.57700747 27 nips-2010-Agnostic Active Learning Without Constraints

8 0.55977637 15 nips-2010-A Theory of Multiclass Boosting

9 0.5354346 142 nips-2010-Learning Bounds for Importance Weighting

10 0.53056043 94 nips-2010-Feature Set Embedding for Incomplete Data

11 0.52615952 226 nips-2010-Repeated Games against Budgeted Adversaries

12 0.50316387 228 nips-2010-Reverse Multi-Label Learning

13 0.49865779 114 nips-2010-Humans Learn Using Manifolds, Reluctantly

14 0.49484119 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case

15 0.49196309 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning

16 0.48602194 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

17 0.47802606 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

18 0.47379953 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

19 0.47084445 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

20 0.47015083 191 nips-2010-On the Theory of Learnining with Privileged Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.059), (17, 0.016), (27, 0.044), (29, 0.193), (30, 0.111), (35, 0.022), (45, 0.212), (50, 0.04), (52, 0.034), (60, 0.04), (77, 0.05), (78, 0.064), (90, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9071539 107 nips-2010-Global seismic monitoring as probabilistic inference

Author: Nimar Arora, Stuart Russell, Paul Kidwell, Erik B. Sudderth

Abstract: The International Monitoring System (IMS) is a global network of sensors whose purpose is to identify potential violations of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), primarily through detection and localization of seismic events. We report on the first stage of a project to improve on the current automated software system with a Bayesian inference system that computes the most likely global event history given the record of local sensor data. The new system, VISA (Vertically Integrated Seismological Analysis), is based on empirically calibrated, generative models of event occurrence, signal propagation, and signal detection. VISA exhibits significantly improved precision and recall compared to the current operational system and is able to detect events that are missed even by the human analysts who post-process the IMS output.

same-paper 2 0.82583892 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information

Author: Umar Syed, Ben Taskar

Abstract: We address the problem of semi-supervised learning in an adversarial setting. Instead of assuming that labels are missing at random, we analyze a less favorable scenario where the label information can be missing partially and arbitrarily, which is motivated by several practical examples. We present nearly matching upper and lower generalization bounds for learning in this setting under reasonable assumptions about available label information. Motivated by the analysis, we formulate a convex optimization problem for parameter estimation, derive an efficient algorithm, and analyze its convergence. We provide experimental results on several standard data sets showing the robustness of our algorithm to the pattern of missing label information, outperforming several strong baselines.

3 0.81338072 93 nips-2010-Feature Construction for Inverse Reinforcement Learning

Author: Sergey Levine, Zoran Popovic, Vladlen Koltun

Abstract: The goal of inverse reinforcement learning is to find a reward function for a Markov decision process, given example traces from its optimal policy. Current IRL techniques generally rely on user-supplied features that form a concise basis for the reward. We present an algorithm that instead constructs reward features from a large collection of component features, by building logical conjunctions of those component features that are relevant to the example policy. Given example traces, the algorithm returns a reward function as well as the constructed features. The reward function can be used to recover a full, deterministic, stationary policy, and the features can be used to transplant the reward function into any novel environment on which the component features are well defined.

4 0.81116635 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies.

5 0.78526056 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

Author: Sivan Sabato, Nathan Srebro, Naftali Tishby

Abstract: We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the γ-adapted-dimension, which is a simple function of the spectrum of a distribution's covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the γ-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions.

6 0.78276902 243 nips-2010-Smoothness, Low Noise and Fast Rates

7 0.77895778 27 nips-2010-Agnostic Active Learning Without Constraints

8 0.77834213 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

9 0.77771157 63 nips-2010-Distributed Dual Averaging In Networks

10 0.77714229 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

11 0.77616549 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

12 0.77613091 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

13 0.77375424 265 nips-2010-The LASSO risk: asymptotic results and real world examples

14 0.77343667 155 nips-2010-Learning the context of a category

15 0.77298188 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

16 0.77143198 148 nips-2010-Learning Networks of Stochastic Differential Equations

17 0.77125669 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

18 0.77039778 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

19 0.76981139 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

20 0.76976424 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models