nips nips2013 nips2013-42 knowledge-graph by maker-knowledge-mining

42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs


Source: pdf

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. [sent-9, score-1.096]

2 We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. [sent-10, score-2.147]

3 1 Introduction. Active learning algorithms seek to mitigate the cost of learning by using unlabeled data and sequentially selecting examples to query for their label, so as to minimize the total number of queries. [sent-12, score-0.256]

4 In some cases, however, the actual cost of each query depends on the true label of the example and is thus not known before the label is requested. [sent-13, score-0.313]

5 We term this setting auditing, and the cost incurred by the algorithm its auditing complexity. [sent-17, score-0.877]

6 For example, we may wish the algorithm to maximize the number of positive labels it finds for a fixed “budget” of negative labels, or to minimize the number of negative labels while finding a certain number or fraction of positive labels. [sent-19, score-0.197]

7 Similar to active learning, we assume we are given a large set of unlabeled examples, and aim to learn with minimal labeling cost. [sent-21, score-0.278]

8 But unlike active learning, we only incur a cost when requesting the label of an example that turns out to be negative. [sent-22, score-0.334]

9 The close relationship between auditing and active learning raises natural questions. [sent-23, score-1.026]

10 Can the auditing complexity be significantly better than the label complexity in active learning? [sent-24, score-1.266]

11 If so, should algorithms be optimized for auditing, or do optimal active learning algorithms also have low auditing complexity? [sent-25, score-1.026]

12 To answer these questions, and demonstrate the differences between active learning and auditing, we study the simple hypothesis classes of thresholds and of axis-aligned rectangles in R^d, in both the realizable and the agnostic settings. [sent-26, score-0.664]

13 Existing work on active learning with costs (Margineantu, 2007; Kapoor et al. [sent-29, score-0.249]

14 The active label complexity of an algorithm is the total number of label queries the algorithm makes. [sent-54, score-0.529]

15 Its auditing complexity is the number of queries the algorithm makes on points with negative labels. [sent-55, score-1.03]

16 Auditing: Summary of Results The main point of this paper is that the auditing complexity can be quite different from the active label complexity, and that algorithms tuned to minimizing the audit label complexity give improvements over standard active learning algorithms. [sent-67, score-1.585]

17 Before presenting these differences, we note that in some regimes, neither active learning nor auditing can improve significantly over the passive sample complexity. [sent-68, score-1.111]

18 If an algorithm always finds a hypothesis ĥ with err(D, ĥ) ≤ err(D, H) + ε for ε > 0, then for any η ∈ (0, 1) there is a distribution D with η = err(D, H) such that the auditing complexity of this algorithm for D is Ω(dη²/ε²). [sent-74, score-1.014]

19 That is, when η is fixed while ε → 0, the auditing complexity scales as Ω(d/ε²), similar to the passive sample complexity. [sent-75, score-0.984]

20 Here it is sufficient to consider the case where a fixed pool S of m points is given and the algorithm must return a hypothesis ĥ such that err(S, ĥ) = 0 with probability 1. [sent-79, score-0.282]

21 A pool labeling algorithm can be used to learn a hypothesis which is good for a distribution by drawing and labeling a large enough pool. [sent-80, score-0.299]

22 We define auditing complexity for an unlabeled pool as the minimal number of negative labels needed to perfectly classify it. [sent-81, score-1.161]

23 It is easy to see that there are pools with an auditing complexity of at least the VC dimension of the hypothesis class. [sent-82, score-1.006]

24 An auditing complexity of Ω(d/α²) is unavoidable, but we can hope to improve over the passive sample complexity lower bound of Ω(d/(ηα²)) (Devroye and Lugosi, 1995) by avoiding the dependence on η. [sent-85, score-1.094]

25 Our main results are summarized in Table 1, which shows the auditing and active learning complexities in the two regimes, for thresholds on [0, 1] and axis-aligned rectangles in R^d, where we assume that the hypotheses label the points in the rectangle as negative and points outside as positive. [sent-86, score-1.508]

26 Table 1 entries: realizable thresholds, Active Θ(ln m), Auditing 1; realizable rectangles, Active m, Auditing 2d; agnostic thresholds, Active Ω(ln(1/η) + 1/α²), Auditing O((1/α²) ln(1/α)); agnostic rectangles, Active Ω(d(1/η + 1/α²)), Auditing O(d² ln²(1/η) · 1/α²). Table 1: Auditing complexity upper bounds vs. [sent-87, score-0.213]

27 active label complexity lower bounds for realizable (pool size m) and agnostic (err(D, H) = η) cases. [sent-88, score-0.542]

28 In the realizable case, for thresholds, the optimal active learning algorithm performs binary search, resulting in Ω(ln m) labels in the worst case. [sent-90, score-0.354]

29 This is a significant improvement over the passive label complexity of m. [sent-91, score-0.245]

30 However, a simple auditing procedure that scans from right to left queries only a single negative point, achieving an auditing complexity of 1. [sent-92, score-1.81]
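
A minimal Python sketch of this right-to-left scan (illustrative only): it assumes the convention that points at or above the hidden threshold are labeled +1 and points below it −1, and the pool representation and the label oracle query_label are hypothetical names introduced here.

    def audit_threshold_realizable(points, query_label):
        # Scan from the highest point down; stop at the first negative answer,
        # which is the only negative label ever paid for.
        last_positive = None
        for x in sorted(points, reverse=True):
            if query_label(x) == -1:  # assumption: oracle returns +1 or -1
                break
            last_positive = x
        # In the realizable case, thresholding at the lowest positive point seen
        # classifies the whole pool correctly; with no positives, predict all negative.
        return last_positive if last_positive is not None else float("inf")

Many labels may be requested in total, but at most one of them is negative, which is what the auditing complexity counts.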

31 For rectangles, we present a simple coordinate-wise scanning procedure with auditing complexity of at most 2d, demonstrating a huge gap versus active learning, where the labels of all m points might be required. [sent-93, score-1.184]

32 Not all classes enjoy reduced auditing complexity: we also show that for rectangles with positive points on the inside, there exist pools of size m with an auditing complexity of m. [sent-94, score-1.91]

33 For active learning, it has been shown that in some cases, the Ω(d/η) passive sample complexity can be replaced by an exponentially smaller O(d ln(1/η)) active label complexity (Hanneke, 2011), albeit sometimes with a larger polynomial dependence on d. [sent-96, score-0.742]

34 In other cases, an Ω(1/η) dependence exists also for active learning. [sent-97, score-0.221]

35 Our main question is whether the dependence on η in the active label complexity can be further reduced for auditing. [sent-98, score-0.392]

36 For thresholds, active learning requires Ω(ln(1/η)) labels (Kulkarni et al. [sent-99, score-0.263]

37 For rectangles, we show that the active label complexity is at least Ω(d/η). [sent-103, score-0.367]

38 In contrast, we propose an algorithm with an auditing complexity of O(d² ln²(1/η)), reducing the linear dependence on 1/η to a logarithmic dependence. [sent-104, score-0.935]

39 3 Auditing for Thresholds on the Line. The first question to ask is whether the audit label complexity can ever be significantly smaller than the active or passive label complexities, and whether a different algorithm is required to achieve this improvement. [sent-108, score-0.575]

40 The optimal active label complexity of Θ(log₂ m) can be achieved by a binary search on the pool. [sent-114, score-0.367]

41 The auditing complexity of this algorithm can also be as large as Θ(log₂ m). [sent-115, score-0.91]

42 This case exemplifies an interesting contrast between auditing and active learning. [sent-117, score-1.026]

43 Due to information-theoretic considerations, any algorithm which learns an unlabeled pool S has an active label complexity of at least log₂ |H|S| (Kulkarni et al. [sent-118, score-0.552]

44 We showed that for the realizable case, the auditing label complexity for H is a constant. [sent-122, score-1.098]

45 The intuition behind our approach is that to get the optimal threshold in a pool with at most k errors, we can query from highest to lowest until observing k + 1 negative points and then find the minimal error threshold on the labeled points. [sent-124, score-0.347]

46 Then the procedure above finds ĥ such that err(S, ĥ) = err(S, H) with an auditing complexity of k + 1. [sent-128, score-0.899]
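
A sketch of this k-errors scan, under the same conventions as the threshold sketch above (labels +1 at or above the hidden threshold, hypothetical oracle query_label); the candidate set of thresholds used here is a simplification of the procedure referenced in the text.

    def audit_threshold_with_errors(points, query_label, k):
        # Query labels from highest to lowest until k + 1 negative labels are
        # observed (auditing cost at most k + 1), then return the threshold
        # minimizing the empirical error on the queried prefix.
        labeled, negatives = [], 0
        for x in sorted(points, reverse=True):
            y = query_label(x)
            labeled.append((x, y))
            if y == -1:
                negatives += 1
                if negatives == k + 1:
                    break
        if not labeled:
            return float("inf")
        # Candidate thresholds: each queried point plus the two trivial extremes.
        xs = [x for x, _ in labeled]
        candidates = xs + [min(xs) - 1.0, max(xs) + 1.0]
        def prefix_error(t):
            return sum(1 for x, y in labeled if (1 if x >= t else -1) != y)
        return min(candidates, key=prefix_error)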

47 To avoid this dependence, the auditing algorithm we propose uses Alg. [sent-138, score-0.841]

48 The algorithm for auditing thresholds on the line in the agnostic case is listed in Alg. [sent-147, score-0.991]

49 2 with input ηmax, δ, α has an auditing complexity of O(ln(1/δ)/α²), and returns ĥ such that with probability 1 − δ, err(D, ĥ) ≤ (1 + α)ηmax. [sent-194, score-0.899]

50 It immediately follows that if η = err(D, H) is known, (α, δ)-learning is achievable with an auditing complexity that does not depend on η. [sent-195, score-0.899]

51 2 with inputs ηmax = η, α, δ (α, δ)-learns D with respect to H with an auditing complexity of O(ln(1/δ)/α²). [sent-201, score-0.899]

52 The following lower bound shows that in this case, the best auditing complexity for thresholds is similar to the best active label complexity. [sent-204, score-0.593]

53 For any δ ∈ (0, 1), if an auditing algorithm (α, δ)-learns any distribution D such that err(D, H) ≥ ηmin, then the algorithm's auditing complexity is Ω(ln((1 − δ)/δ) · ln(1/ηmin)). [sent-208, score-1.74]

54 In the next section we show that there are classes with a significant gap between active and auditing complexities even without an upper bound on the error. [sent-209, score-1.096]

55 (1989), has been studied extensively in different regimes (Kearns, 1998; Long and Tan, 1998), including active learning (Hanneke, 2007b). [sent-212, score-0.211]

56 Because auditing costs are asymmetric, we consider two possibilities for label assignment. [sent-217, score-0.968]

57 For a = (a[1], . . . , a[d]) ∈ R^d, define the hypotheses h_a and h_a^− by h_a(x) = 2I[∃i ∈ [d], x[i] ≥ a[i]] − 1, and h_a^−(x) = −h_a(x). [sent-221, score-0.206]
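
A small Python sketch of these two hypotheses as read from the (reconstructed) definition above; the negative-inside convention and the function names are assumptions of this sketch.

    def h_a(a, x):
        # h_a(x) = 2*I[exists i with x[i] >= a[i]] - 1:
        # negative strictly inside the orthant defined by a, positive outside.
        return 1 if any(xi >= ai for xi, ai in zip(x, a)) else -1

    def h_a_minus(a, x):
        # The complementary hypothesis with all labels flipped (positive inside).
        return -h_a(a, x)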

58 All of our results for these classes can be easily extended to the corresponding classes of general axis-aligned rectangles on R^d, with at most a factor of two penalty on the auditing complexity. [sent-225, score-0.985]

59 4.1 The Realizable Case. We first consider the pool setting for the realizable case, and show a sharp contrast between the auditing complexity and the active label complexity for H2 and H2^−. [sent-230, score-1.486]

60 While the active learning complexity for H2 and H2^− can be as large as m, the auditing complexities for the two classes are quite different. [sent-232, score-1.149]

61 For H2^−, the auditing complexity can be as large as m, but for H2 it is at most d. [sent-233, score-0.899]

62 We start by showing the upper bound for auditing of H2 . [sent-234, score-0.846]

63 The auditing complexity of any unlabeled pool Su of size m with respect to H2 is at most d. [sent-237, score-1.056]

64 The method is a generalization of the approach to auditing for thresholds. [sent-239, score-0.83]
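
A hedged Python sketch of such a coordinate-wise generalization, assuming the negative-inside convention of H2 as defined above and a hypothetical label oracle query_label; the algorithm in the paper may differ in details.

    def audit_rectangle_realizable(points, query_label):
        # points: list of d-dimensional tuples; query_label: true-label oracle.
        d = len(points[0])
        labels = {}  # index of point -> observed label
        for i in range(d):
            # Visit not-yet-queried points in decreasing order of coordinate i,
            # stopping at the first negative label (at most one paid query here,
            # hence at most d paid queries overall).
            order = sorted(range(len(points)), key=lambda idx: points[idx][i], reverse=True)
            for idx in order:
                if idx in labels:
                    continue
                labels[idx] = query_label(points[idx])
                if labels[idx] == -1:
                    break
        # Any point never queried lies, in every coordinate, below some queried
        # negative point, so under the negative-inside convention it is negative.
        return [labels.get(idx, -1) for idx in range(len(points))]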

65 It is easy to see that a similar approach yields an auditing complexity of 2d for full axis-aligned rectangles. [sent-246, score-0.899]

66 We now provide a lower bound for the auditing complexity of H2^− that immediately implies the same lower bound for the active label complexity of H2 and H2^−. [sent-247, score-1.298]

67 For any m and any d ≥ 2, there is a pool Su ⊆ R^d of size m such that its auditing complexity with respect to H2^− is m. [sent-250, score-1.022]

68 The construction is a simple adaptation of a construction due to Dasgupta (2005), originally showing an active learning lower bound for the class of hyperplanes. [sent-252, score-0.237]

69 Any labeling which labels all the points in Su negative except any one point is realizable for H2^−, and so is the all-negative labeling. [sent-254, score-0.265]

70 3 (Realizable active label complexity of H2 and H2^−). [sent-257, score-0.367]

71 For H2 and H2^−, there is a pool of size m such that its active label complexity is m. [sent-258, score-0.49]

72 However, for a general distribution, active label complexity cannot be significantly better than passive label complexity. [sent-262, score-0.543]

73 Any learning algorithm that (α, δ)-learns all distributions such that err(D, H) = η for η > 0 with respect to H2 has an active label complexity of Ω(d/η). [sent-267, score-0.378]

74 In contrast, the auditing complexity of H2 can be much smaller, as we show for Alg. [sent-268, score-0.899]

75 For ηmin, α, δ ∈ (0, 1), there is an algorithm that (α, δ)-learns all distributions with η ≥ ηmin with respect to H2 with an auditing complexity of O(d² ln(1/αδ) ln²(1/ηmin)/α²). [sent-272, score-0.91]

76 If ηmin is polynomially close to the true η, we get an auditing complexity of O(d² ln²(1/η)), compared to the active label complexity of Ω(d/η), an exponential improvement in η. [sent-273, score-1.266]

77 To bound the number of negative labels, the algorithm iteratively refines lower bounds on the locations of the best thresholds, and an upper bound on the negative error, defined as the probability that a point from D with negative label is classified as positive by a minimal-error classifier. [sent-278, score-0.274]

78 The idea of iteratively refining a set of possible hypotheses has been used in a long line of active learning works (Cohn et al. [sent-280, score-0.251]

79 for i = 1 to d do: j ← 0; while j ≤ (1 + ν)ηt|St| + 1 do: if unqueried points exist, query the unqueried point with the highest i'th coordinate; if the query returned −1, set j ← j + 1. [sent-296, score-0.26]

80 Its auditing complexity is bounded since it queries a bounded number of negative points at each round. [sent-309, score-1.019]
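
A sketch of just the inner querying loop from the pseudocode fragment above (one round); how St, ηt, and the set of queried points are maintained across rounds is not shown, and the names S_t, eta_t, nu, query_label follow that fragment rather than a full specification of the algorithm.

    def query_round(S_t, eta_t, nu, query_label, queried):
        # Per coordinate, query the unqueried point with the highest value in
        # that coordinate until about (1 + nu) * eta_t * |S_t| + 1 negative
        # labels have been seen, capping the paid queries for this round.
        budget = (1 + nu) * eta_t * len(S_t) + 1
        labels = {}
        d = len(S_t[0]) if S_t else 0
        for i in range(d):
            negatives = 0
            while negatives <= budget:
                remaining = [x for x in S_t if x not in queried]
                if not remaining:
                    break
                x = max(remaining, key=lambda p: p[i])
                queried.add(x)
                labels[x] = query_label(x)
                if labels[x] == -1:
                    negatives += 1
        return labels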

81 5 Outcome-dependent Costs for a General Hypothesis Class. In this section we return to the realizable pool setting and consider finite hypothesis classes H. [sent-310, score-0.356]

82 Let S ⊆ X be an unlabeled pool, and let cost : S × H → R+ denote the cost of a query: For x ∈ S and h ∈ H, cost(x, h) is the cost of querying the label of x given that h is the true (unknown) hypothesis. [sent-312, score-0.255]

83 In the auditing setting, Y = {−1, +1} and cost(x, h) = I[h(x) = −1]. [sent-313, score-0.83]

84 In the active learning setting, where cost ≡ 1, it is NP-hard to obtain OPTcost (S) for general H and S. [sent-317, score-0.232]

85 A simple adaptation of the reduction for the auditing complexity, which we defer to the full version of this work, shows that it is also NP-hard to obtain OPTcost (S) in the auditing setting. [sent-319, score-1.672]

86 For active learning, and for query costs that do not depend on the true hypothesis (that is cost(x, h) ≡ cost(x)), Golovin and Krause (2011) showed an efficient greedy strategy that achieves a cost of O(OPTcost (S) · ln(|H|)) for any S. [sent-320, score-0.449]

87 For any cost function cost, hypothesis class H, pool S, and true hypothesis h ∈ H, the cost of the proposed algorithm is at most (ln(|H|S| − 1) + 1) · OPT. [sent-341, score-0.405]

88 If cost is the auditing cost, the proposed algorithm corresponds to the following intuitive strategy: At every round, select a query such that, if its result is a negative label, then the number of hypotheses removed from the version space is the largest. [sent-342, score-1.031]
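
A minimal Python sketch of this intuitive greedy rule for a finite hypothesis class given as callables and a realizable pool; pool, hypotheses, and query_label are hypothetical names, and this illustrates only the selection rule, not the analysis behind the approximation guarantee.

    def greedy_audit(pool, hypotheses, query_label):
        # Repeatedly query the point whose negative outcome would eliminate the
        # most hypotheses from the current version space.
        version_space = list(hypotheses)
        observed = {}
        while True:
            undecided = [x for x in pool if x not in observed
                         and len({h(x) for h in version_space}) > 1]
            if not undecided:
                break  # surviving hypotheses agree on every remaining point
            x = max(undecided,
                    key=lambda p: sum(1 for h in version_space if h(p) == 1))
            y = query_label(x)          # pay only if y == -1 (auditing cost)
            observed[x] = y
            version_space = [h for h in version_space if h(x) == y]
        return version_space, observed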

89 In the auditing setting, it is always preferable to query x before querying x′. [sent-344, score-0.914]

90 Therefore, for any realizable auditing problem, there exists an optimal algorithm that adheres to this principle. [sent-345, score-0.938]

91 An O(ln(|H|S|)) approximation factor for auditing is less appealing than the same factor for active learning. [sent-347, score-1.026]

92 By information-theoretic arguments, active label complexity is at least log₂(|H|S|) (and hence the approximation at most squares the cost), but this does not hold for auditing. [sent-348, score-0.367]

93 Nonetheless, hardness of approximation results for set cover (Feige, 1998), in conjunction with the reduction to set cover of Hyafil and Rivest (1976) mentioned above, imply that such an approximation factor cannot be avoided for a general auditing algorithm. [sent-349, score-0.841]

94 6 Conclusion and Future Directions. As summarized in Section 2, we show that in the auditing setting, suitable algorithms can achieve improved costs in the settings of thresholds on the line and axis-aligned rectangles. [sent-350, score-0.938]

95 First, it is known that for some hypothesis classes, active learning cannot improve over passive learning for certain distributions (Dasgupta, 2005), and the same is true for auditing. [sent-352, score-0.363]

96 However, exponential speedups are possible for active learning on certain classes of distributions (Balcan et al. [sent-353, score-0.24]

97 It is an open question whether a similar property of the distribution can guarantee an improvement with auditing over active or passive learning. [sent-356, score-1.1]

98 An interesting generalization of the auditing problem is a multiclass setting with a different cost for each label. [sent-358, score-0.866]

99 These measures are different from those studied in active learning, and may lead to new algorithmic insights. [sent-360, score-0.196]

100 A bound on the label complexity of agnostic active learning. [sent-445, score-0.461]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('auditing', 0.83), ('err', 0.335), ('active', 0.196), ('pool', 0.123), ('label', 0.102), ('rectangles', 0.101), ('realizable', 0.097), ('hypothesis', 0.093), ('ha', 0.084), ('agnostic', 0.078), ('passive', 0.074), ('query', 0.073), ('thresholds', 0.072), ('ln', 0.072), ('complexity', 0.069), ('sq', 0.065), ('sabato', 0.055), ('optcost', 0.052), ('sbt', 0.052), ('labels', 0.05), ('negative', 0.043), ('hanneke', 0.042), ('errneg', 0.041), ('mag', 0.041), ('points', 0.039), ('queries', 0.038), ('hypotheses', 0.038), ('argminh', 0.037), ('labeling', 0.036), ('cost', 0.036), ('costs', 0.036), ('unlabeled', 0.034), ('st', 0.032), ('unqueried', 0.031), ('rd', 0.03), ('dasgupta', 0.03), ('complexities', 0.027), ('classes', 0.027), ('max', 0.026), ('hya', 0.025), ('dependence', 0.025), ('golovin', 0.024), ('kapoor', 0.024), ('vc', 0.023), ('kulkarni', 0.023), ('beygelzimer', 0.023), ('su', 0.022), ('rectangle', 0.021), ('audit', 0.021), ('blocki', 0.021), ('disjunction', 0.021), ('disjunctions', 0.021), ('fraud', 0.021), ('wasteful', 0.021), ('balcan', 0.019), ('queried', 0.019), ('multiset', 0.018), ('draw', 0.017), ('et', 0.017), ('blumer', 0.017), ('gonen', 0.017), ('honest', 0.017), ('orthant', 0.017), ('rivest', 0.017), ('copies', 0.017), ('return', 0.016), ('sarwate', 0.016), ('error', 0.016), ('bound', 0.016), ('cohn', 0.015), ('vapnik', 0.015), ('regimes', 0.015), ('greedy', 0.015), ('minh', 0.014), ('threshold', 0.014), ('xm', 0.014), ('devroye', 0.014), ('transaction', 0.014), ('pays', 0.014), ('credit', 0.014), ('jacm', 0.014), ('pools', 0.014), ('unavoidable', 0.014), ('class', 0.013), ('settles', 0.013), ('vt', 0.013), ('labeled', 0.013), ('returned', 0.013), ('minimal', 0.012), ('bt', 0.012), ('hidden', 0.012), ('adaptation', 0.012), ('querying', 0.011), ('considerations', 0.011), ('avoided', 0.011), ('algorithm', 0.011), ('sequentially', 0.011), ('sample', 0.011), ('lemma', 0.011), ('krause', 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

2 0.22852167 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.15927994 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

4 0.098166674 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.094314046 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Author: Po-Ling Loh, Martin J. Wainwright

Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1

6 0.082151949 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

7 0.06560982 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

8 0.059010405 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

9 0.056905467 242 nips-2013-PAC-Bayes-Empirical-Bernstein Inequality

10 0.052144106 230 nips-2013-Online Learning with Costly Features and Labels

11 0.049695715 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

12 0.049002841 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

13 0.048955046 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test

14 0.046249542 171 nips-2013-Learning with Noisy Labels

15 0.039278023 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

16 0.03730211 69 nips-2013-Context-sensitive active sensing in humans

17 0.037030667 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

18 0.034825992 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

19 0.030157199 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model

20 0.029669801 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.091), (1, 0.013), (2, 0.014), (3, -0.025), (4, 0.059), (5, 0.044), (6, -0.039), (7, -0.029), (8, -0.023), (9, 0.113), (10, -0.003), (11, -0.062), (12, -0.059), (13, 0.027), (14, 0.026), (15, -0.18), (16, -0.141), (17, 0.112), (18, 0.098), (19, -0.036), (20, -0.09), (21, 0.002), (22, 0.043), (23, -0.027), (24, 0.027), (25, -0.012), (26, -0.09), (27, 0.056), (28, -0.101), (29, 0.017), (30, 0.079), (31, -0.013), (32, 0.003), (33, -0.002), (34, -0.07), (35, -0.004), (36, 0.044), (37, 0.126), (38, 0.051), (39, 0.006), (40, 0.055), (41, -0.066), (42, 0.014), (43, 0.004), (44, -0.028), (45, -0.027), (46, -0.018), (47, -0.078), (48, 0.032), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95681214 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

2 0.91441935 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.79583937 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

4 0.77536768 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.60505342 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

6 0.50532246 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

7 0.47916764 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

8 0.42285129 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

9 0.41517782 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

10 0.39854762 69 nips-2013-Context-sensitive active sensing in humans

11 0.36271688 171 nips-2013-Learning with Noisy Labels

12 0.33899441 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

13 0.32776424 230 nips-2013-Online Learning with Costly Features and Labels

14 0.3251662 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses

15 0.31903824 252 nips-2013-Predictive PAC Learning and Process Decompositions

16 0.29236805 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

17 0.28796896 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

18 0.26174119 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

19 0.26088235 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions

20 0.25862214 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.016), (16, 0.01), (24, 0.232), (33, 0.11), (34, 0.067), (41, 0.011), (49, 0.018), (56, 0.162), (70, 0.022), (85, 0.025), (89, 0.019), (93, 0.04), (95, 0.146)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77425545 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation is applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

2 0.72937512 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.70965719 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

4 0.7055003 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Author: Po-Ling Loh, Martin J. Wainwright

Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1

5 0.69645894 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference

Author: Guy van den Broeck, Adnan Darwiche

Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1

6 0.68824232 185 nips-2013-Matrix Completion From any Given Set of Observations

7 0.67866445 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

8 0.663885 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

9 0.65016925 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

10 0.64638722 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

11 0.64377171 184 nips-2013-Marginals-to-Models Reducibility

12 0.64215392 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

13 0.63833976 149 nips-2013-Latent Structured Active Learning

14 0.6336627 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization

15 0.63299638 60 nips-2013-Buy-in-Bulk Active Learning

16 0.63223064 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation

17 0.63144159 249 nips-2013-Polar Operators for Structured Sparse Estimation

18 0.63004237 224 nips-2013-On the Sample Complexity of Subspace Learning

19 0.629843 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods

20 0.62964851 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields