nips nips2013 nips2013-60 knowledge-graph by maker-knowledge-mining

60 nips-2013-Buy-in-Bulk Active Learning


Source: pdf

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. [sent-5, score-0.436]

2 This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. [sent-6, score-0.353]

3 In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. [sent-7, score-0.769]

4 We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. [sent-8, score-0.308]

5 In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i. [sent-9, score-0.505]

6 , buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. [sent-11, score-0.235]

7 1 Introduction In many practical applications of active learning, the cost to acquire a large batch of labels at once is significantly less than the cost of the same number of sequential rounds of individual label requests. [sent-12, score-0.792]

8 This is true for both practical reasons (overhead time for start-up, reserving equipment in discrete time-blocks, multiple labelers working in parallel, etc. [sent-13, score-0.049]

9 Consider making one vs multiple hematological diagnostic tests on an out-patient. [sent-17, score-0.063]

10 There is a fixed cost in setting up and running the microarray, but virtually no incremental cost as to the number of samples, just a constraint on the max allowed. [sent-21, score-0.197]

11 There is a fixed cost in forming the group (determine membership, contract, travel, etc. [sent-27, score-0.087]

12 For instance in a medical diagnosis case, the most cost-effective way to minimize diagnostic tests is purely sequential active learning, where each test may rule out a set of hypotheses (diagnoses) and informs the next test to perform. [sent-31, score-0.285]

13 But a patient suffering from a serious disease may worsen while sequential tests are being conducted. [sent-32, score-0.085]

14 Hence batch testing makes sense if the batch can be tested in parallel. [sent-33, score-0.313]

15 In general one can convert delay into a second cost factor and optimize for batch size that minimizes a combination of total delay and the sum of the costs for the individual tests. [sent-34, score-0.352]
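
To make this delay-versus-cost optimization concrete, here is a small illustrative sketch (not from the paper; the cost values, the delay per round, and the redundancy model are all hypothetical placeholders) that searches for the batch size minimizing combined delay plus total test cost.

```python
import math

def combined_cost(batch_size, tests_needed_seq=30, cost_per_test=5.0,
                  delay_per_round=20.0, redundancy=0.2):
    """Hypothetical objective: total delay plus total test cost.

    `redundancy` crudely models the extra tests incurred because tests run in
    parallel cannot use earlier outcomes to rule out later tests.
    """
    if batch_size > 1:
        tests_needed = math.ceil(tests_needed_seq * (1.0 + redundancy * math.log2(batch_size)))
    else:
        tests_needed = tests_needed_seq
    rounds = math.ceil(tests_needed / batch_size)
    return rounds * delay_per_round + tests_needed * cost_per_test

best_batch = min(range(1, 31), key=combined_cost)
print("batch size minimizing delay + cost:", best_batch)
```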

16 Parallelizing means more tests would be needed, since we lack the benefit of earlier tests to rule out future ones. [sent-35, score-0.078]

17 In order to perform this 1 batch-size optimization we also need to estimate the number of redundant tests incurred by turning a sequence into a shorter sequence of batches. [sent-36, score-0.062]

18 For the reasons cited above, it can be very useful in practice to generalize active learning to active-batch learning, with buy-in-bulk discounts. [sent-37, score-0.218]

19 This paper develops a theoretical framework exploring the bounds and sample complexity of active buy-in-bulk machine learning, and analyzing the tradeoff that can be achieved between the number of batches and the total number of queries required for accurate learning. [sent-38, score-0.419]

20 In scenarios such as those mentioned above, a batch mode active learning strategy is desirable, rather than a method that selects instances to be labeled one-at-a-time. [sent-40, score-0.409]

21 There have recently been several attempts to construct heuristic approaches to the batch mode active learning problem (e. [sent-41, score-0.387]

22 In contrast, there has recently been significant progress in understanding the advantages of fully-sequential active learning (e. [sent-45, score-0.2]

23 In the present work, we are interested in extending the techniques used for the fully-sequential active learning model, studying natural analogues of them for the batch-model active learning model. [sent-48, score-0.426]

24 Formally, we are interested in two quantities: the sample complexity and the total cost. [sent-49, score-0.112]

25 The sample complexity refers to the number of label requests used by the algorithm. [sent-50, score-0.259]

26 We expect batch-mode active learning methods to use more label requests than their fully-sequential cousins. [sent-51, score-0.426]

27 On the other hand, if the cost to obtain a batch of labels is sublinear in the size of the batch, then we may sometimes expect the total cost used by a batch-mode learning method to be significantly less than the analogous fully-sequential algorithms, which request labels individually. [sent-52, score-0.788]

28 In the active learning protocol, the algorithm has direct access to the Xt sequence, but must request to observe each label Yt , sequentially. [sent-70, score-0.484]

29 The algorithm asks up to a specified number of label requests n (the budget), and then halts and returns a classifier. [sent-71, score-0.209]

30 We are particularly interested in determining, for a given algorithm, how large this number of label requests needs to be in order to guarantee small error rate with high probability, a value known as the label complexity. [sent-72, score-0.359]

31 In the present work, we are also interested in the cost expended by the algorithm. [sent-73, score-0.148]

32 Specifically, in this context, there is a cost function c : N → (0, ∞), and to request the labels {Y_{i_1}, Y_{i_2}, . . . , Y_{i_m}} of a batch of examples {X_{i_1}, . . . , X_{i_m}} at once requires the algorithm to pay c(m). [sent-74, score-0.323]

33 We are then interested in the sum of these costs, over all batches of label requests made by the algorithm. [sent-80, score-0.361]
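
For concreteness, here is a minimal sketch (mine, not from the paper) of one possible sublinear batch-cost function and the total cost summed over batches; the particular form c(m) = 1 + √m is only an illustrative choice.

```python
import math

def c(m: int) -> float:
    """Example sublinear batch cost: a fixed setup cost plus a sqrt term.
    Any nondecreasing sublinear function could play this role."""
    return 1.0 + math.sqrt(m)

def total_cost(batch_sizes):
    """Total cost paid by an algorithm that requests labels in the given batches."""
    return sum(c(m) for m in batch_sizes)

# Requesting 100 labels one at a time vs. in 4 batches of 25:
print(total_cost([1] * 100))   # 100 batches of size 1 -> 200.0
print(total_cost([25] * 4))    # 4 batches of size 25  -> 24.0
```

Even though both schedules request the same 100 labels, the bulk schedule is roughly eight times cheaper under this cost function, which is the effect the rest of the paper quantifies.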

34 Depending on the form of the cost function, minimizing the cost of learning may actually require the algorithm to request labels in batches, which we expect would actually increase the total number of label requests. [sent-81, score-0.587]

35 To help quantify the label complexity and cost complexity, we make use of the following definition, due to [6, 7]. [sent-82, score-0.261]

36 [6, 7] Define the disagreement coefficient of h∗ as θ(ǫ) = sup_{r>ǫ} DX(DIS(B(h∗, r))) / r. [sent-85, score-0.084]

37 3 Buy-in-Bulk Active Learning in the Realizable Case: k-batch CAL. We begin our analysis with the simplest case: namely, the realizable case, with a fixed prespecified number of batches. [sent-86, score-0.099]
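
As an illustrative example (mine, not from the paper): for threshold classifiers under the uniform distribution on [0, 1], in the realizable case B(h∗, r) contains the thresholds within distance r of the true one, DIS(B(h∗, r)) is an interval of mass at most 2r, and hence θ(ǫ) ≤ 2. The Monte Carlo sketch below checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
t_star = 0.4                         # true threshold: h*(x) = 1[x >= t_star]
X = rng.uniform(0, 1, 100_000)       # unlabeled sample from DX

def theta(eps, radii=np.logspace(-3, 0, 50)):
    """Monte Carlo estimate of sup_{r > eps} DX(DIS(B(h*, r))) / r for thresholds."""
    ratios = []
    for r in radii[radii > eps]:
        # Thresholds t with DX(h_t != h*) <= r satisfy |t - t_star| <= r, and they
        # disagree with h* somewhere on the interval (t_star - r, t_star + r).
        dis_mass = np.mean((X > t_star - r) & (X < t_star + r))
        ratios.append(dis_mass / r)
    return max(ratios)

print(theta(0.01))                   # close to 2 (smaller for large r due to the boundary)
```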

38 We are then interested in quantifying the label complexity for such a scenario. [sent-87, score-0.2]

39 We first review a well-known method for active learning in the realizable case, referred to as CAL after its discoverers Cohn, Atlas, and Ladner [8]. [sent-90, score-0.334]

40 Return ĥ = argmin_{h∈C} er(h; Q). The label complexity of CAL is known to be O(θ(ǫ)(d log(θ(ǫ)) + log(log(1/ǫ)/δ)) log(1/ǫ)) [7]. [sent-97, score-0.174]

41 One particularly simple way to modify this algorithm to make it batch-based is to simply divide up the budget into equal batch sizes. [sent-99, score-0.196]
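
A minimal sketch of this k-batch idea (my reconstruction under simplifying assumptions — a finite hypothesis list and an explicit label oracle — rather than the authors' code) is given below: the budget n is split into k equal batches, each batch requests the labels of the next M points that fall in the current region of disagreement, and the version space is pruned after each batch.

```python
def k_batch_cal(stream, label_oracle, hypotheses, n, k):
    """Sketch of k-batch CAL: split the label budget n into k equal-sized batches.

    `stream` yields unlabeled points, `label_oracle(x)` returns the true label,
    and `hypotheses` is a finite list of classifiers (a stand-in for a general
    hypothesis class C in the realizable case).
    """
    M = n // k
    V = list(hypotheses)                              # current version space
    for _ in range(k):
        batch = []
        for x in stream:
            if len({h(x) for h in V}) > 1:            # x lies in DIS(V)
                batch.append(x)
                if len(batch) == M:
                    break
        labels = [label_oracle(x) for x in batch]     # one bulk label request
        V = [h for h in V if all(h(x) == y for x, y in zip(batch, labels))]
    return V[0] if V else None
```

With k = 1 this reduces to (disagreement-filtered) passive learning on M points, and with k = n every batch contains a single query, recovering the fully-sequential behavior of CAL.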

42 If b > k, return any h ∈ V. We expect the label complexity of k-batch CAL to somehow interpolate between passive learning (at k = 1) and the label complexity of CAL (at k = n). [sent-117, score-0.448]

43 Indeed, the following theorem bounds the label complexity of k-batch CAL by a function that exhibits this interpolation behavior with respect to the known upper bounds for these two cases. [sent-118, score-0.174]

44 In the realizable case, for some λ(ǫ, δ) = O( k ǫ^{−1/k} θ(ǫ)^{1−1/k} (d log(1/ǫ) + log(1/δ)) ), for any n ≥ λ(ǫ, δ), with probability at least 1 − δ, running k-batch CAL with budget n produces a classifier ĥ with er(ĥ) ≤ ǫ. [sent-121, score-0.164]

45 These correspond to the version space at the conclusion of batch b in the k-batch CAL algorithm. [sent-129, score-0.148]

46 Noting that P(DIS(V0)) ≤ 1 implies V1 ⊆ B(h∗, c (d log(M/d) + log(k/δ))/M), by induction we have max_{h∈Vk} er(h) ≤ max{ ǫ, (c (d log(M/d) + log(k/δ))/M)^k θ(ǫ)^{k−1} }; for some constant c′ > 0, this is at most ǫ for any M ≥ c′ θ(ǫ)^{(k−1)/k} ǫ^{−1/k} (d log(1/ǫ) + log(k/δ)). [sent-137, score-0.126]

47 Since M = ⌊n/k⌋, it suffices to have n ≥ k (1 + c′ θ(ǫ)^{(k−1)/k} ǫ^{−1/k} (d log(1/ǫ) + log(k/δ))). [sent-139, score-0.083]

48 Theorem 3.1 has the property that, when the disagreement coefficient is small, the stated bound on the total number of label requests sufficient for learning is a decreasing function of k. [sent-141, score-0.297]

49 This makes sense, since θ(ǫ) small would imply that fully-sequential active learning is much better than passive learning. [sent-142, score-0.283]

50 Small values of k correspond to more passive-like behavior, while larger values of k take fuller advantage of the sequential nature of active learning. [sent-143, score-0.222]

51 In particular, when k = 1, we recover a well-known label complexity bound for passive learning by empirical risk minimization [10]. [sent-144, score-0.301]

52 Note that even k = 2 can sometimes provide significant reductions in label complexity over passive learning: for instance, by a factor proportional to 1/√ǫ in the case that θ(ǫ) is bounded by a finite constant. [sent-146, score-0.257]
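
The interpolation can also be seen numerically; the sketch below (illustrative constants only, treating θ(ǫ) as a fixed constant and dropping the δ term and leading constants) evaluates the dominant term k ǫ^{−1/k} θ(ǫ)^{1−1/k} d log(1/ǫ) of the bound for several values of k.

```python
import math

def kbatch_label_bound(k, eps=1e-6, theta=2.0, d=10):
    """Dominant term of the k-batch CAL label complexity bound (constants dropped)."""
    return k * eps ** (-1.0 / k) * theta ** (1.0 - 1.0 / k) * d * math.log(1.0 / eps)

for k in (1, 2, 3, 5, 10):
    # Decreases from the passive-like 1/eps scale toward the fully-sequential rate.
    print(k, round(kbatch_label_bound(k)))
```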

53 4 Batch Mode Active Learning with Tsybakov Noise. The above analysis was for the realizable case. [sent-147, score-0.122]

54 To move beyond the realizable case, we need to allow the labels to be noisy, so that er(h∗ ) > 0. [sent-149, score-0.175]

55 based on a standard generalization bound for passive learning [12]. [sent-155, score-0.111]

56 Specifically, [12] have shown that, for any V ⊆ C, with probability at least 1 − δ/(4km²), sup_{h,g∈V} |(er(h) − er(g)) − (er_m(h) − er_m(g))| < E_m. [sent-156, score-0.12]

57 These correspond to the set V at the conclusion of batch b in the k-batch Robust CAL algorithm. [sent-189, score-0.148]
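
The following sketch (my reconstruction; the exact deviation bound E_m of [12] is replaced by a generic VC-style placeholder) illustrates the kind of batch-level pruning such a k-batch Robust CAL variant performs: after each batch it discards every classifier whose empirical error on that batch exceeds the best empirical error by more than E_m, which by the deviation bound above keeps h∗ in V with high probability.

```python
import math

def robust_prune(V, batch, labels, d, k, delta):
    """One batch-level pruning step in the spirit of k-batch Robust CAL (sketch only).

    E_m is a generic VC-type stand-in for the deviation bound of [12] that holds
    with probability 1 - delta/(4*k*m^2); d plays the role of the VC dimension.
    """
    m = len(batch)
    E_m = math.sqrt((d * math.log(max(2.0 * m / d, 2.0))
                     + math.log(4.0 * k * m ** 2 / delta)) / m)

    def er_m(h):
        return sum(h(x) != y for x, y in zip(batch, labels)) / m

    best = min(er_m(h) for h in V)
    return [h for h in V if er_m(h) <= best + E_m]
```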

58 For each b ∈ {1, . . . , k}, (1) (applied under the conditional distribution given V_{b−1}, combined with the law of total probability) implies that ∀m > 0, letting Z_{b,m} = {(X_{i_{(b−1)M+1}}, Y_{i_{(b−1)M+1}}), . . . [sent-193, score-0.08]

59 Combining these two results yields sup_{h∈Vk} er(h) − er(h∗) = O( ( (c2 θ(c2 ǫ^α))^{β−β̄^{k−1}} / M^{β̄^{k−1}} )^{1/(2−α)} (d log(M) + log(kM/δ))^{(1+β̄(β−β̄^{k−1}))/(2−α)} ). [sent-220, score-0.06]

60 Setting this to ǫ and solving for n, we find that it suffices to have M ≥ c4 (1/ǫ)^{(2−α)/β̄} (c2 θ(c2 ǫ^α))^{(1−β̄^{k−1})/β̄} (d log(d/ǫ) + log(kd/(δǫ)))^{(1+β̄(β−β̄^k))/β̄}, for some constant c4 ∈ [1, ∞), which then implies the stated result. [sent-221, score-0.206]

61 However, we can replace E_m with a data-dependent local Rademacher complexity bound Ê_m, as in [7], which also satisfies (1), and satisfies Ê_m ≤ c′ E_m, for some constant c′ ∈ [1, ∞) (see [13]). [sent-225, score-0.078]

62 A similar result can also be obtained for batch-based variants of other noise-robust disagreement-based active learning algorithms from the literature (e. [sent-227, score-0.2]

63 Theorem 4.2 matches the best results for passive learning (up to log factors), which are known to be minimax optimal (again, up to log factors). [sent-231, score-0.249]

64 If we let k become large (while still considered as a constant), our result converges to the known results for one-at-a-time active learning with RobustCAL (again, up to log factors) [7, 14]. [sent-232, score-0.283]

65 Although those results are not always minimax optimal, they do represent the state-of-the-art in the general analysis of active learning, and they are really the best we could hope for from basing our algorithm on RobustCAL. [sent-233, score-0.2]

66 5 Buy-in-Bulk Solutions to Cost-Adaptive Active Learning. The above sections discussed scenarios in which we have a fixed number k of batches, and we simply bounded the label complexity achievable within that constraint by considering a variant of CAL that uses k equal-sized batches. [sent-234, score-0.174]

67 In this section, we take a slightly different approach to the problem, by going back to one of the motivations for using batch-based active learning in the first place: namely, sublinear costs for answering batches of queries at a time. [sent-235, score-0.515]

68 If the cost of answering m queries at once is sublinear in m, then batch-based algorithms arise naturally from the problem of optimizing the total cost required for learning. [sent-236, score-0.376]

69 To understand the total cost required for learning in this model, we consider the following cost-adaptive modification of the CAL algorithm. [sent-239, score-0.123]

70 Request the labels of the next q′ − q examples in DIS(V). [sent-247, score-0.098]

71 Update V by removing those classifiers inconsistent with these labels. [sent-248, score-0.076]

72 R ← DIS(V). Note that the total cost expended by this method never exceeds the budget argument C. [sent-251, score-0.206]
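
The fragments above come from the cost-adaptive algorithm's listing; a rough sketch of the idea (my reconstruction under simplifying assumptions, not the authors' exact pseudocode) is the following: in each outer round, an inner loop keeps doubling the number of labels requested from DIS(V), paying c(q′ − q) per request, until the disagreement region has shrunk by a constant factor or the budget C would be exceeded.

```python
import math

def cost_adaptive_cal(stream, label_oracle, hypotheses, c, C, eps, dis_mass):
    """Sketch of a cost-adaptive CAL-style loop under a batch cost function c(m).

    `dis_mass(V)` is assumed to estimate P(DIS(V)); the doubling schedule and the
    halving stopping rule are a reconstruction of the idea described in the text.
    """
    V = list(hypotheses)
    spent = 0.0
    for _ in range(math.ceil(math.log2(1.0 / eps))):      # outer rounds
        R = dis_mass(V)
        q, q_next = 0, 1
        while dis_mass(V) > R / 2:                        # inner doubling loop
            m = q_next - q
            batch = []
            for x in stream:
                if len({h(x) for h in V}) > 1:            # x lies in DIS(V)
                    batch.append(x)
                    if len(batch) == m:
                        break
            if not batch or spent + c(len(batch)) > C:    # never exceed the budget C
                return V[0]
            labels = [label_oracle(x) for x in batch]     # one bulk request, cost c(|batch|)
            spent += c(len(batch))
            V = [h for h in V if all(h(x) == y for x, y in zip(batch, labels))]
            q, q_next = q_next, 2 * q_next
    return V[0]
```

Because the batch sizes q′ − q grow geometrically, the costs in one outer round sum to roughly the cost of the final (largest) batch, which is how the large sample-complexity factor ends up inside the argument of c(·) in the bound discussed below.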

73 In the realizable case, for some λ(ǫ, δ) = O( c( θ(ǫ) (d log(θ(ǫ)) + log(log(1/ǫ)/δ)) ) log(1/ǫ) ), for any C ≥ λ(ǫ, δ), with probability at least 1 − δ, Cost-Adaptive CAL(C) returns a classifier ĥ with er(ĥ) ≤ ǫ. [sent-255, score-0.116]

74 Supposing an unlimited budget (C = ∞), let us determine how much cost the algorithm incurs prior to having sup_{h∈V} er(h) ≤ ǫ; this cost would then be a sufficient size for C to guarantee this occurs. [sent-257, score-0.244]

75 Supposing sup_{h∈V′} er(h) > ǫ, we know (by the definition of θ(ǫ)) that P(R) ≤ P(DIS(B(h∗, sup_{h∈V′} er(h)))) ≤ θ(ǫ) sup_{h∈V′} er(h). [sent-261, score-0.12]

76 So for each round of the outer loop while sup_{h∈V} er(h) > ǫ, by summing the geometric series of cost values c(q′ − q) in the inner loop, we find the total cost incurred is at most O(c(θ(ǫ)(d log(θ(ǫ)) + log(log(1/ǫ)/δ)))). [sent-267, score-0.306]

77 Again, there are at most ⌈log2(1/ǫ)⌉ rounds of the outer loop while sup_{h∈V} er(h) > ǫ, so that the total cost incurred before we have sup_{h∈V} er(h) ≤ ǫ is at most O(c(θ(ǫ)(d log(θ(ǫ)) + log(log(1/ǫ)/δ))) log(1/ǫ)). [sent-268, score-0.248]

78 Comparing this result to the known label complexity of CAL, which is (from [7]) O(θ(ǫ)(d log(θ(ǫ)) + log(log(1/ǫ)/δ)) log(1/ǫ)), we see that the major factor, namely the O(θ(ǫ)(d log(θ(ǫ)) + log(log(1/ǫ)/δ))) factor, is now inside the argument to the cost function c(·). [sent-269, score-0.321]

79 In particular, when this cost function is sublinear, we expect this bound to be significantly smaller than the cost required by the original fully-sequential CAL algorithm, which uses batches of size 1, so that there is a significant advantage to using this batch-mode active learning algorithm. [sent-270, score-0.527]
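
A quick numeric illustration of that difference (arbitrary numbers, reusing the illustrative sublinear cost c(m) = 1 + √m from the earlier sketch): compare paying c(N) once per round against paying N · c(1) per round.

```python
import math

c = lambda m: 1.0 + math.sqrt(m)            # illustrative sublinear batch cost
N = 5000                                    # stand-in for the theta(eps)*(d log(...)) factor
rounds = math.ceil(math.log2(1 / 0.01))     # ~ log(1/eps) outer rounds

print("factor inside c (bulk batches):", round(c(N) * rounds))          # ~ 500
print("fully sequential (batches of 1):", round(N * c(1) * rounds))     # = 70000
```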

80 Again, this result is formulated for the realizable case for simplicity, but can easily be extended to the Tsybakov noise model as in the previous section. [sent-271, score-0.122]

81 In particular, by reasoning quite similar to that above, a cost-adaptive variant of the Robust CAL algorithm of [14] achieves error rate er(ĥ) − er(h∗) ≤ ǫ with probability at least 1 − δ using a total cost O( c( θ(c2 ǫ^α) c2 ǫ^{2α−2} d polylog(1/(ǫδ)) ) log(1/ǫ) ). [sent-272, score-0.223]

82 Since the number of label requests among m samples in the inner loop is roughly Õ(m P(R)) ≤ Õ(m θ c2 (sup_{h∈V′} er(h) − er(h∗))^α), the batch size needed to make sup_{h∈V} P(h(X) ≠ h∗(X)) ≤ P(R)/(2θ) is at most Õ( θ c2 2^{2/α} (sup_{h∈V′} er(h) − er(h∗))^{2α−2} d ). [sent-277, score-0.31]

83 6 Conclusions. We have seen that the analysis of active learning can be adapted to the setting in which labels are requested in batches. [sent-281, score-0.319]

84 In the first case, we supposed the number k of batches is specified, and we analyzed the number of label requests used by an algorithm that requested labels in k equal-sized batches. [sent-283, score-0.436]

85 As a function of k, this label complexity became closer to that of the analogous results for fully-sequential active learning for larger values of k, and closer to the label complexity of passive learning for smaller values of k, as one would expect. [sent-284, score-0.658]

86 Our second model was based on a notion of the cost to request the labels of a batch of a given size. [sent-285, score-0.471]

87 We studied an active learning algorithm designed for this setting, and found that the total cost used by this algorithm may often be significantly smaller than that used by the analogous fully-sequential active learning methods, particularly when the cost function is sublinear. [sent-286, score-0.637]

88 There are many active learning algorithms in the literature that can be described (or analyzed) in terms of batches of label requests. [sent-287, score-0.432]

89 For instance, this is the case for the margin-based active learning strategy explored by [15]. [sent-288, score-0.2]

90 This could potentially be a fruitful future direction for the study of batch mode active learning. [sent-291, score-0.387]

91 The tradeoff examined in this paper, between the total number of queries and the number of rounds, is natural to study. [sent-292, score-0.159]

92 In any two-party communication task, there are three measures of complexity that are typically used: communication complexity (the total number of bits exchanged), round complexity (the number of rounds of communication), and time complexity. [sent-294, score-0.333]

93 The classic work [16] considered the problem of the tradeoffs between communication complexity and rounds of communication. [sent-295, score-0.186]

94 [17] studies the tradeoffs among all three of communication complexity, round complexity, and time complexity. [sent-296, score-0.107]

95 Interested readers may wish to go beyond the present work and study the tradeoffs among all three measures of complexity for batch mode active learning. [sent-297, score-0.485]

96 Feature value acquisition in testing: a sequential batch test algorithm. [sent-303, score-0.17]

97 An optimization based framework for dynamic batch mode active learning. [sent-309, score-0.387]

98 A bound on the label complexity of agnostic active learning. [sent-331, score-0.419]

99 Local rademacher complexities and oracle inequalities in risk minimization. [sent-366, score-0.053]

100 Activized learning: Transforming passive to active with improved label complexity. [sent-370, score-0.407]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('suph', 0.455), ('er', 0.39), ('vb', 0.347), ('cal', 0.329), ('dis', 0.321), ('active', 0.2), ('request', 0.16), ('batch', 0.148), ('label', 0.124), ('batches', 0.108), ('dxy', 0.108), ('realizable', 0.099), ('cost', 0.087), ('qb', 0.085), ('tsybakov', 0.085), ('requests', 0.085), ('passive', 0.083), ('log', 0.083), ('ibm', 0.081), ('labels', 0.076), ('sublinear', 0.074), ('xm', 0.071), ('em', 0.067), ('sup', 0.06), ('queries', 0.057), ('supposing', 0.057), ('complexity', 0.05), ('budget', 0.048), ('tradeoffs', 0.048), ('rounds', 0.048), ('maxh', 0.043), ('requested', 0.043), ('erm', 0.043), ('communication', 0.04), ('ming', 0.039), ('tests', 0.039), ('mode', 0.039), ('loop', 0.038), ('ces', 0.037), ('total', 0.036), ('mb', 0.035), ('tb', 0.035), ('expended', 0.035), ('refered', 0.035), ('xibj', 0.035), ('xibm', 0.035), ('answering', 0.035), ('labelers', 0.031), ('delay', 0.028), ('bound', 0.028), ('analogous', 0.027), ('robust', 0.026), ('satis', 0.026), ('letting', 0.026), ('interested', 0.026), ('cohn', 0.025), ('microarray', 0.025), ('induction', 0.025), ('costs', 0.025), ('disagreement', 0.024), ('diagnostic', 0.024), ('patient', 0.024), ('oracles', 0.024), ('vk', 0.024), ('incurred', 0.023), ('virtually', 0.023), ('noise', 0.023), ('classi', 0.023), ('union', 0.023), ('sequential', 0.022), ('balcan', 0.022), ('atlas', 0.022), ('unlimited', 0.022), ('kd', 0.022), ('examples', 0.022), ('dx', 0.022), ('labeled', 0.022), ('modi', 0.02), ('ym', 0.019), ('pac', 0.019), ('round', 0.019), ('yt', 0.019), ('rademacher', 0.019), ('suf', 0.019), ('tradeoff', 0.018), ('complexities', 0.018), ('pay', 0.018), ('implies', 0.018), ('reasons', 0.018), ('agnostic', 0.017), ('expect', 0.017), ('testing', 0.017), ('least', 0.017), ('questions', 0.016), ('condition', 0.016), ('back', 0.016), ('ever', 0.016), ('outer', 0.016), ('annals', 0.016), ('risk', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

2 0.24921578 143 nips-2013-Integrated Non-Factorized Variational Inference

Author: Shaobo Han, Xuejun Liao, Lawrence Carin

Abstract: We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the integrated nested Laplace approximation (INLA) under the variational framework. The proposed method is applicable in more challenging scenarios than typically assumed by INLA, such as Bayesian Lasso, which is characterized by the non-differentiability of the 1 norm arising from independent Laplace priors. We derive an upper bound for the Kullback-Leibler divergence, which yields a fast closed-form solution via decoupled optimization. Our method is a reliable analytic alternative to Markov chain Monte Carlo (MCMC), and it results in a tighter evidence lower bound than that of mean-field variational Bayes (VB) method. 1

3 0.20932163 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

4 0.16401012 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

5 0.12903444 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu

Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1

6 0.11821534 133 nips-2013-Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering

7 0.099821508 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

8 0.098166674 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

9 0.094978385 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

10 0.08477886 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

11 0.083484739 171 nips-2013-Learning with Noisy Labels

12 0.079842433 317 nips-2013-Streaming Variational Bayes

13 0.072030753 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

14 0.066635132 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

15 0.064471744 230 nips-2013-Online Learning with Costly Features and Labels

16 0.060556751 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

17 0.058345892 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

18 0.058187257 188 nips-2013-Memory Limited, Streaming PCA

19 0.05590269 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting

20 0.053903844 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.138), (1, 0.024), (2, 0.001), (3, -0.029), (4, 0.083), (5, 0.079), (6, -0.005), (7, -0.006), (8, -0.004), (9, 0.094), (10, -0.021), (11, 0.016), (12, -0.089), (13, 0.088), (14, -0.01), (15, -0.301), (16, -0.157), (17, 0.094), (18, 0.144), (19, -0.111), (20, -0.121), (21, 0.012), (22, 0.072), (23, -0.01), (24, -0.087), (25, 0.013), (26, -0.048), (27, 0.079), (28, 0.057), (29, -0.063), (30, 0.156), (31, -0.076), (32, -0.065), (33, 0.035), (34, -0.086), (35, 0.105), (36, 0.005), (37, 0.12), (38, -0.056), (39, 0.043), (40, -0.025), (41, 0.118), (42, 0.026), (43, 0.029), (44, -0.109), (45, -0.003), (46, -0.064), (47, -0.098), (48, -0.066), (49, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94484228 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

2 0.7120809 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation are applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

3 0.69971412 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

4 0.60557568 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

5 0.58341944 143 nips-2013-Integrated Non-Factorized Variational Inference

Author: Shaobo Han, Xuejun Liao, Lawrence Carin

Abstract: We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the integrated nested Laplace approximation (INLA) under the variational framework. The proposed method is applicable in more challenging scenarios than typically assumed by INLA, such as Bayesian Lasso, which is characterized by the non-differentiability of the 1 norm arising from independent Laplace priors. We derive an upper bound for the Kullback-Leibler divergence, which yields a fast closed-form solution via decoupled optimization. Our method is a reliable analytic alternative to Markov chain Monte Carlo (MCMC), and it results in a tighter evidence lower bound than that of mean-field variational Bayes (VB) method. 1

6 0.55237985 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

7 0.54435539 133 nips-2013-Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering

8 0.47553715 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

9 0.43128777 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

10 0.4251847 317 nips-2013-Streaming Variational Bayes

11 0.4009586 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

12 0.37117177 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

13 0.36176851 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

14 0.36058289 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

15 0.35909775 171 nips-2013-Learning with Noisy Labels

16 0.33262184 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

17 0.32789287 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

18 0.32697952 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

19 0.3269715 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning

20 0.31668293 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.014), (16, 0.011), (25, 0.265), (33, 0.126), (34, 0.117), (36, 0.027), (41, 0.017), (49, 0.016), (56, 0.127), (70, 0.017), (85, 0.04), (89, 0.022), (93, 0.045), (95, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.7813651 136 nips-2013-Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream

Author: Daniel L. Yamins, Ha Hong, Charles Cadieu, James J. DiCarlo

Abstract: Humans recognize visually-presented objects rapidly and accurately. To understand this ability, we seek to construct models of the ventral stream, the series of cortical areas thought to subserve object recognition. One tool to assess the quality of a model of the ventral stream is the Representational Dissimilarity Matrix (RDM), which uses a set of visual stimuli and measures the distances produced in either the brain (i.e. fMRI voxel responses, neural firing rates) or in models (features). Previous work has shown that all known models of the ventral stream fail to capture the RDM pattern observed in either IT cortex, the highest ventral area, or in the human ventral stream. In this work, we construct models of the ventral stream using a novel optimization procedure for category-level object recognition problems, and produce RDMs resembling both macaque IT and human ventral stream. The model, while novel in the optimization procedure, further develops a long-standing functional hypothesis that the ventral visual stream is a hierarchically arranged series of processing stages optimized for visual object recognition. 1

same-paper 2 0.76174802 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

3 0.6394105 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

Author: Ziteng Wang, Kai Fan, Jiaqi Zhang, Liwei Wang

Abstract: We study differentially private mechanisms for answering smooth queries on databases consisting of data points in Rd . A K-smooth query is specified by a function whose partial derivatives up to order K are all bounded. We develop an -differentially private mechanism which for the class of K-smooth queries has K accuracy O(n− 2d+K / ). The mechanism first outputs a summary of the database. To obtain an answer of a query, the user runs a public evaluation algorithm which contains no information of the database. Outputting the summary runs in time d O(n1+ 2d+K ), and the evaluation algorithm for answering a query runs in time d+2+ 2d K ˜ O(n 2d+K ). Our mechanism is based on L∞ -approximation of (transformed) smooth functions by low degree even trigonometric polynomials with small and efficiently computable coefficients. 1

4 0.63811237 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu

Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1

5 0.63604105 184 nips-2013-Marginals-to-Models Reducibility

Author: Tim Roughgarden, Michael Kearns

Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1

6 0.63492149 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods

7 0.63470703 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

8 0.6344986 249 nips-2013-Polar Operators for Structured Sparse Estimation

9 0.63448298 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

10 0.63318592 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

11 0.63279998 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

12 0.63231194 318 nips-2013-Structured Learning via Logistic Regression

13 0.63204062 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

14 0.63183689 268 nips-2013-Reflection methods for user-friendly submodular optimization

15 0.63063508 149 nips-2013-Latent Structured Active Learning

16 0.63027459 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

17 0.63015878 75 nips-2013-Convex Two-Layer Modeling

18 0.63005328 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding

19 0.63001418 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models

20 0.62965721 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent