nips nips2013 nips2013-309 knowledge-graph by maker-knowledge-mining

309 nips-2013-Statistical Active Learning Algorithms


Source: pdf

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. [sent-5, score-0.787]

2 The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. [sent-6, score-0.664]

3 We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. [sent-8, score-1.485]

4 In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. [sent-11, score-0.667]

5 This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. [sent-12, score-0.98]

6 An extensively used and studied technique is active learning, where the algorithm is presented with a large pool of unlabeled examples and can interactively ask for the labels of examples of its own choosing from the pool, with the goal of drastically reducing labeling effort. [sent-16, score-0.899]

7 In particular, several general characterizations have been developed for describing when active learning can in principle have an advantage over the classic passive supervised learning paradigm, and by how much. [sent-18, score-0.853]

8 The situation is even worse in the presence of noise where active learning appears to be particularly hard. [sent-20, score-0.639]

9 In particular, prior to this work, there were no known efficient active algorithms for concept classes of super-constant VC-dimension that are provably robust to random and independent noise while giving improvements over the passive case. [sent-21, score-0.977]

10 Our Results: We propose a framework for designing efficient (polynomial time) active learning algorithms which is based on restricting the way in which examples (both labeled and unlabeled) are accessed by the algorithm. [sent-22, score-0.75]

11 These restricted algorithms can be easily simulated using active sampling and, in addition, possess a number of other useful properties. [sent-23, score-0.649]

12 The main property we will consider is tolerance to random classification noise of rate η (each label is flipped randomly and independently with probability η [1]). [sent-24, score-0.529]

13 For τ and τ0 chosen by the algorithm the estimate is provided to within tolerance τ as long as E(x,y)∼P [x ∈ χS ] ≥ τ0 (nothing is guaranteed otherwise). [sent-30, score-0.379]

14 Such a query is referred to as an active statistical query (SQ) and algorithms using active SQs are referred to as active statistical algorithms. [sent-32, score-2.284]

15 Further, statistical algorithms enjoy additional properties: they can be simulated in a differentially-private way [11], automatically parallelized on multi-core architectures [17] and have known information-theoretic characterizations of query complexity [13, 26]. [sent-37, score-0.434]

16 As we show, our framework inherits the strengths of the SQ model while, as we will argue, capturing the power of active learning. [sent-38, score-0.563]

17 At first glance, being active and statistical appear to be incompatible requirements on an algorithm. [sent-39, score-0.631]

18 Active algorithms typically make label query decisions on the basis of examining individual samples (for example as in binary search for learning a threshold or the algorithms in [27, 21, 22]). [sent-40, score-0.379]

19 But there also exist a number of active learning algorithms that can be seen as applying passive learning techniques to batches of examples that are obtained from querying labels of samples that satisfy the same filter. [sent-42, score-0.927]

20 We start by presenting a general reduction showing that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. [sent-45, score-1.485]

21 We then demonstrate the generality of our framework by showing that the most commonly studied concept classes including thresholds, balanced rectangles, and homogeneous linear separators can be efficiently actively learned via active statistical algorithms. [sent-46, score-0.821]

22 For these concept classes, we design efficient active learning algorithms that are statistical and provide the same exponential improvements in the dependence on the error over passive learning as their non-statistical counterparts. [sent-47, score-0.999]

23 The primary problem we consider is active learning of homogeneous halfspaces, a problem that has attracted a lot of interest in the theory of active learning [27, 18, 3, 9, 22, 16, 23, 8, 28]. [sent-48, score-1.228]

24 Our algorithm for this setting proceeds in rounds; in round t we build a better approximation wt to the target function by using a passive SQ learning algorithm (e. [sent-51, score-0.349]

25 To perform passive statistical queries relative to Dt we use active SQs with a corresponding real-valued filter. [sent-54, score-1.037]
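
To make the round structure concrete, the following Python sketch spells out the conversion just described: every passive statistical query asked by the round-t learner is answered by a single active SQ whose filter selects the region that defines Dt. The hard indicator filter, the band-width schedule, the initial hypothesis, and the oracle/learner signatures are illustrative assumptions, not the real-valued filter or the constants from the paper's analysis.

import numpy as np

def active_learn_halfspace_outline(active_sq_oracle, learn_hs_passive, d, eps):
    # Hedged outline: in round t a passive SQ learner is run relative to D_t
    # (D restricted to a band around the current hypothesis), and each of its
    # statistical queries is answered through one active SQ with that band as filter.
    n_rounds = int(np.ceil(np.log2(1.0 / eps)))
    w = np.zeros(d)
    w[0] = 1.0                                          # arbitrary initial hypothesis w_0
    for t in range(1, n_rounds + 1):
        w_prev, b_t = w, 2.0 ** (-t)                    # placeholder band-width schedule
        def chi(x, w_prev=w_prev, b_t=b_t):             # filter: margin band around w_prev
            return 1.0 if abs(float(np.dot(w_prev, x))) <= b_t else 0.0
        def passive_sq(phi, tau, chi=chi, b_t=b_t):     # passive SQ relative to D_t, answered by one
            return active_sq_oracle(chi, phi, b_t, tau) # active SQ (b_t as a stand-in filter tolerance)
        w = learn_hs_passive(passive_sq)                # improved approximation w_t
    return w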

26 This algorithm is computationally efficient and uses only poly(d, log(1/ε)) active statistical queries of tolerance inverse-polynomial in the dimension d and log(1/ε). [sent-55, score-1.181]

27 For the special case of the uniform distribution over the unit ball we give a new, simpler and substantially more efficient active statistical learning algorithm. [sent-57, score-0.697]

28 The algorithm is computationally efficient and uses d log(1/ε) active SQs with tolerance of Ω(1/√d) and filter tolerance of Ω(ε). [sent-61, score-1.295]

29 However, compared to passive learning in the presence of RCN, our algorithms have exponentially better dependence on ε and essentially the same dependence on d and 1/(1 − 2η). [sent-65, score-0.326]

30 The standard approach to dealing with this issue does not always work in the active setting, and for our log-concave and uniform distribution algorithms we give a specialized argument that preserves the exponential improvement in the dependence on ε. [sent-67, score-0.716]

31 Differentially-private active learning: In many applications of machine learning such as medical and financial record analysis, data is both sensitive and expensive to label. [sent-68, score-0.587]

32 We address the problem by defining a natural model of differentially-private active learning. [sent-70, score-0.563]

33 As usual, the goal is to minimize the number of label requests (such setup is referred to as pool-based active learning [33]). [sent-73, score-0.689]

34 Using a similar approach, we show that active SQ learning algorithms can be automatically transformed into differentially-private active learning algorithms. [sent-77, score-1.192]

35 Using our active statistical algorithms for halfspaces we obtain the first algorithms that are both differentially-private and give exponential improvements in the dependence of label complexity on the accuracy parameter ε. [sent-78, score-1.092]

36 Additional related work: As we have mentioned, most prior theoretical work on active learning focuses on either sample complexity bounds (without regard for efficiency) or the noiseless case. [sent-79, score-0.626]

37 In addition, several works give active learning algorithms with empirical evidence of robustness to certain types of noise [9, 28]. In [16, 23], online learning algorithms in the selective sampling framework are presented, where labels must be actively queried before they are revealed. [sent-81, score-0.938]

38 Under the assumption that the label conditional distribution is a linear function determined by a fixed target vector, they provide bounds on the regret of the algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. [sent-82, score-0.367]

39 In this setting they obtain exponential improvement in label complexity over passive learning. [sent-86, score-0.395]

40 In other words, the addition of linear noise defined by the target makes the problem easier for active sampling. [sent-90, score-0.673]

41 When learning with respect to a distribution P = (D, ψ), an active statistical learner has access to active statistical queries. [sent-96, score-1.308]

42 A query of this type is a pair of functions (χ, φ), where χ : X → [0, 1] is the filter function which, for a point x, specifies the probability with which the label of x should be queried. [sent-97, score-0.313]

43 In addition, a query has two tolerance parameters: filter tolerance τ0 and query tolerance τ . [sent-102, score-1.485]

44 An active statistical learning algorithm can also ask target-independent queries with tolerance τ which are just queries over unlabeled samples. [sent-104, score-1.506]
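
To make these definitions concrete, here is a minimal Python sketch of the interaction model: an active statistical learner only ever talks to an oracle through the two calls below. The class and method names, and the use of typing.Protocol, are our own illustrative choices, not part of the paper.

from typing import Callable, Protocol

class ActiveSQOracle(Protocol):
    # Hedged sketch of the interaction model: the learner never examines
    # labeled examples directly, only the estimates returned by these calls.

    def active_sq(self,
                  chi: Callable[[object], float],        # filter chi : X -> [0, 1]
                  phi: Callable[[object, int], float],   # query phi : X x {-1, +1} -> [-1, 1]
                  tau0: float,                           # filter tolerance
                  tau: float) -> float:
        """Return a value within tau of E_{P|chi}[phi(x, l)],
        provided E_D[chi(x)] >= tau0 (otherwise nothing is guaranteed)."""
        ...

    def target_independent_sq(self,
                              phi_u: Callable[[object], float],  # query over unlabeled x only
                              tau_u: float) -> float:
        """Return a value within tau_u of E_D[phi_u(x)]; no labels are used."""
        ...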

45 Our definition generalizes the statistical query framework of Kearns [30], which does not include a filtering function; in other words, a query is just a function φ : X × {−1, 1} → [−1, 1] and has a single tolerance parameter τ. [sent-108, score-0.847]

46 By definition, an active SQ (χ, φ) with tolerance τ relative to P is the same as a passive statistical query φ with tolerance τ relative to the distribution P|χ . [sent-109, score-1.785]

47 In particular, a (passive) SQ is equivalent to an active SQ with filter χ ≡ 1 and filter tolerance 1. [sent-110, score-0.916]

48 We note that from the definition of active SQ we can see that EP|χ [φ(x, ℓ)] = EP [φ(x, ℓ) · χ(x)]/EP [χ(x)]. [sent-111, score-0.563]

49 This implies that an active statistical query can be estimated using two passive statistical queries. [sent-112, score-1.147]

50 However, to estimate EP|χ [φ(x, ℓ)] with tolerance τ one needs to estimate EP [φ(x, ℓ) · χ(x)] with tolerance τ · EP [χ(x)], which can be much lower than τ. [sent-113, score-0.706]
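
A back-of-the-envelope calculation (not the paper's argument) shows where the factor EP[χ(x)] comes from. Write a = EP[φ(x, ℓ) · χ(x)] and b = EP[χ(x)], so the target quantity is a/b and |a| ≤ b because |φ| ≤ 1 and χ ≥ 0. If the estimates â and b̂ are each accurate to within τ′ ≤ b/4, then |â/b̂ − a/b| = |âb − ab̂|/(b · b̂) ≤ (|â − a| · b + |a| · |b̂ − b|)/(b · b̂) ≤ 2τ′/b̂ ≤ (8/3) · τ′/b. Choosing τ′ of order τ · b therefore suffices to answer the active SQ within τ, which is exactly a passive tolerance of order τ · EP[χ(x)].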

51 The tolerance of an SQ directly corresponds to the number of examples needed to evaluate it, and therefore simulating active SQs passively might require many more labeled examples. [sent-114, score-0.737]

52 Simulating Active Statistical Queries: We first note that a valid response to a target-independent query with tolerance τ can be obtained, with probability at least 1 − δ, using O(τ^{−2} log(1/δ)) unlabeled samples. [sent-116, score-0.796]

53 A natural way of simulating an active SQ is by filtering points drawn randomly from D: draw a random point x, let B be drawn from a Bernoulli distribution with probability of success χ(x); ask for the label of x when B = 1. [sent-117, score-0.745]

54 There exists an active sampling algorithm that, given functions χ : X → [0, 1], φ : X × {−1, 1} → [−1, 1], values τ0 > 0, τ > 0, δ > 0, and access to samples from P, with probability at least 1 − δ, outputs a valid response to the active statistical query (χ, φ) with tolerance parameters (τ0, τ). [sent-124, score-1.951]

55 The algorithm uses O(τ^{−2} log(1/δ)) labeled examples from P and O(τ0^{−1} τ^{−2} log(1/δ)) unlabeled samples from D. [sent-125, score-0.38]
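
A minimal Python sketch of this Bernoulli-filtering simulation is given below; the constants inside the two sample-size expressions are illustrative rather than the ones from the theorem, and draw_labeled, chi and phi are assumed to be user-supplied callables.

import numpy as np

def simulate_active_sq(draw_labeled, chi, phi, tau0, tau, delta, seed=0):
    # Answer the active SQ (chi, phi): draw x ~ D, accept it with probability chi(x),
    # and only query its label when it is accepted.  With illustrative constants, the
    # returned mean is within tau of E_{P|chi}[phi(x, l)] with probability >= 1 - delta
    # whenever E_D[chi(x)] >= tau0.
    rng = np.random.default_rng(seed)
    n_labels = int(np.ceil(2.0 * np.log(2.0 / delta) / tau ** 2))   # labeled examples needed
    unlabeled_budget = int(np.ceil(4.0 * n_labels / tau0))          # unlabeled examples allowed
    vals = []
    for _ in range(unlabeled_budget):
        x, label = draw_labeled()          # x acts as an unlabeled draw until accepted
        if rng.random() < chi(x):          # Bernoulli coin with success probability chi(x)
            vals.append(phi(x, label))     # the label is queried only at this point
            if len(vals) == n_labels:
                break
    return float(np.mean(vals)) if vals else 0.0

Simulating a q-query active SQ algorithm directly then amounts to calling this once per query with delta replaced by delta/q, which is the union-bound argument described next.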

56 A direct way to simulate all the queries of an active SQ algorithm is to estimate the response to each query using fresh samples and use the union bound to ensure that, with probability at least 1 − δ, all queries are answered correctly. [sent-126, score-1.212]

57 Such direct simulation of an algorithm that uses at most q queries can be done using O(q τ^{−2} log(q/δ)) labeled examples and O(q τ0^{−1} τ^{−2} log(q/δ)) unlabeled samples. [sent-127, score-0.533]

58 Noise tolerance: An important property of the simulation described in the theorem above is its tolerance to noise in the labels. [sent-135, score-0.389]

59 We now show that, as in the SQ model [30], active statistical queries can be simulated given examples from P η . [sent-139, score-0.889]

60 The algorithm uses O(τ^{−2} (1 − 2η)^{−2} log(1/δ)) labeled examples from P^η and O(τ0^{−1} τ^{−2} (1 − 2η)^{−2} log(1/δ)) unlabeled samples from D. [sent-144, score-0.38]

61 Note that the sample complexity of the resulting active sampling algorithm has information-theoretically optimal quadratic dependence on 1/(1 − 2η), where η is the noise rate. [sent-145, score-0.755]

62 It is easy to see from the proof that any value η′ such that (1 − 2η′)/(1 − 2η) ∈ [1 − τ/4, 1 + τ/4] can be used in place of η (with the tolerance of estimating EP^η|χ [(φ(x, 1) − φ(x, −1)) · ℓ/2] set to (1 − 2η)τ/4). [sent-149, score-0.353]
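
A minimal Python sketch of this noise correction, assuming the noise rate η is known and that φ accepts a batch of points (names and vectorization are our own illustrative choices):

import numpy as np

def rcn_corrected_estimate(xs, noisy_labels, phi, eta):
    # Estimate E_P[phi(x, l)] from examples whose labels were flipped
    # independently with probability eta < 1/2, via the standard SQ decomposition
    #   phi(x, l) = g(x) + l * h(x),
    #   g(x) = (phi(x, +1) + phi(x, -1)) / 2   (label-independent part)
    #   h(x) = (phi(x, +1) - phi(x, -1)) / 2   (label-dependent part).
    # The marginal over x is unaffected by the noise, so E[g(x)] is estimated directly;
    # since E[noisy_l | x] = (1 - 2*eta) * E[l | x], the label-dependent part is
    # estimated from the noisy labels and rescaled by 1 / (1 - 2*eta).
    g = 0.5 * (phi(xs, +1) + phi(xs, -1))
    h = 0.5 * (phi(xs, +1) - phi(xs, -1))
    return float(np.mean(g) + np.mean(noisy_labels * h) / (1.0 - 2.0 * eta))

The rescaling is also where the quadratic dependence on 1/(1 − 2η) noted above comes from: the label-dependent quantity shrinks by a factor (1 − 2η), so it must be estimated to a correspondingly smaller tolerance.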

63 Passive hypothesis testing requires Ω(1/ε) labeled examples and might be too expensive to be used with active learning algorithms. [sent-160, score-0.728]

64 It is unclear if there exists a general approach for dealing with unknown η in the active learning setting that does not substantially increase the labeled example complexity. [sent-161, score-0.563]

65 However, as we will demonstrate, in the context of specific active learning algorithms, variants of this approach can be used to solve the problem. [sent-162, score-0.596]

66 Intuitively, Λ is “uncorrelated” with a query if the way that Λ deviates from its rate is almost orthogonal to the query on the target distribution. [sent-168, score-0.46]

67 There exists an active sampling algorithm that, given functions χ and φ, values η, τ0 > 0, τ > 0, δ > 0, and access to samples from P^Λ, with probability at least 1 − δ, outputs a valid response to the active statistical query (χ, φ) with tolerance parameters (τ0, τ). [sent-179, score-1.951]

68 The algorithm uses O(τ^{−2} (1 − 2η)^{−2} log(1/δ)) labeled examples from P^Λ and O(τ0^{−1} τ^{−2} (1 − 2η)^{−2} log(1/δ)) unlabeled samples from D. [sent-180, score-0.38]

69 The consequence is that one can simulate an active SQ algorithm A using examples corrupted by noise Λ as long as Λ is (η, (1 − 2η)τ/4)-uncorrelated with all of A’s queries of tolerance τ for some fixed η. [sent-182, score-1.311]

70 Simple examples. Thresholds: We show that the classic example of actively learning a threshold function on an interval can be easily expressed using active SQs. [sent-184, score-1.246]

71 We ask a query φ(x, ℓ) = (ℓ + 1)/2 with filter χ(x) which is the indicator function of the interval [a, b], with tolerance 1/4 and filter tolerance b − a. [sent-187, score-0.988]

72 In each iteration only a constant tolerance of 1/4 is necessary, and the filter tolerance is never below ε. [sent-194, score-0.706]
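
The loop below is a small Python sketch of this binary search expressed purely through such queries; the oracle signature, the uniform marginal on [0, 1], and the sampling-based example oracle are our illustrative assumptions, not part of the paper.

import numpy as np

def active_sq_learn_threshold(positive_fraction_sq, eps):
    # Learn a threshold t on [0, 1] using only active SQs.
    # positive_fraction_sq(a, b, tau) is assumed to return, within tolerance tau,
    # Pr[l = +1 | x in [a, b]], i.e. the query phi(x, l) = (l + 1)/2 with the
    # indicator of [a, b] as its filter.
    a, b = 0.0, 1.0                               # invariant: the target threshold lies in [a, b]
    while b - a > eps:                            # the filter mass b - a never drops below eps
        width = b - a
        p = positive_fraction_sq(a, b, 0.25)      # fraction of positives in [a, b], within 1/4
        t_est = b - p * width                     # under a uniform marginal, t = b - p_true * width
        a = max(a, t_est - width / 4.0)           # |t - t_est| <= width / 4, so the new
        b = min(b, t_est + width / 4.0)           # interval has at most half the width
    return (a + b) / 2.0

# Illustrative oracle answering the query by sampling from [a, b],
# for a hypothetical target threshold t = 0.37:
rng = np.random.default_rng(1)
def oracle(a, b, tau):
    xs = rng.uniform(a, b, size=int(np.ceil(16.0 / tau ** 2)))
    return float(np.mean(xs >= 0.37))

print(active_sq_learn_threshold(oracle, 1e-3))    # prints a value close to 0.37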

73 At a high level, the A2 algorithm is an iterative, disagreement-based active learning algorithm. [sent-198, score-0.589]

74 This can be easily done via active statistical queries. [sent-203, score-0.631]

75 Note that while the number of active statistical queries needed to do this could be large, the number of labeled examples needed to simulate these queries is essentially the same as the number of labeled examples needed by the known A2 analyses [29]. [sent-204, score-1.265]

76 Learning of halfspaces: In this section we outline our reduction from active learning to passive learning of homogeneous linear separators, based on the analysis of Balcan and Long [8]. [sent-207, score-1.134]

77 Combining it with the SQ learning algorithm for halfspaces by Dunagan and Vempala [24], we obtain the first efficient noise-tolerant active learning algorithm for homogeneous halfspaces for any isotropic log-concave distribution. [sent-208, score-1.12]

78 One of the key points of this result is that it is relatively easy to harness the involved results developed for the SQ framework to obtain new active statistical algorithms. [sent-209, score-0.631]

79 Further, LearnHS outputs a homogeneous halfspace, runs in time polynomial in d, 1/ε and log(1/λ), and uses SQs of tolerance ≥ 1/poly(d, 1/ε, log(1/λ)), where λ = ED [χ(x)]. [sent-217, score-0.487]

80 There exists an active SQ algorithm ActiveLearnHS-LogC (Algorithm 1) that for any isotropic log-concave distribution D on R^d, learns Hd over D to accuracy 1 − ε in time poly(d, log(1/ε)), using active SQs of tolerance ≥ 1/poly(d, log(1/ε)) and filter tolerance Ω(ε). [sent-221, score-1.971]

81 Algorithm 1 ActiveLearnHS-LogC: Active SQ learning of homogeneous halfspaces over isotropic log-concave densities 1: %% Constants c, C1 , C2 and C3 are determined by the analysis. [sent-222, score-0.382]

82 Therefore our algorithm can be used to learn halfspaces over general log-concave densities as long as the target halfspace passes through the mean of the density. [sent-225, score-0.314]

83 Combining our algorithm with the noise-tolerant simulation above, we obtain an efficient active learning algorithm for homogeneous halfspaces over log-concave densities in the presence of random classification noise of known rate. [sent-228, score-0.964]

84 Further, we obtain an active algorithm which does not require knowledge of the noise rate. [sent-230, score-0.665]

85 There exists a polynomial-time active learning algorithm that for any η ∈ [0, 1/2), learns Hd over any log-concave distribution with random classification noise of rate η to error ε using poly(d, log(1/ε), 1/(1 − 2η)) labeled examples and a polynomial number of unlabeled samples. [sent-233, score-0.941]

86 There exists an active SQ algorithm ActiveLearnHS-U that learns Hd over the uniform distribution on the (d − 1)-dimensional unit sphere to accuracy 1 − ε, uses (d + 1) log(1/ε) active SQs with tolerance of Ω(1/√d) and filter tolerance of Ω(ε), and runs in time d · poly(log(d/ε)). [sent-239, score-1.921]

87 Differentially-private active learning: In this section we show that active SQ learning algorithms can also be used to obtain differentially-private active learning algorithms. [sent-240, score-1.815]

88 Here we consider algorithms that operate on S in an active way. [sent-248, score-0.596]

89 Let A be an algorithm that learns a class of functions H to accuracy 1 − ε over distribution D using M1 active SQs of tolerance τ and filter tolerance τ0 and M2 target-independent queries of tolerance τu. [sent-253, score-1.849]

90 There exists a learning algorithm A′ that, given α > 0, δ > 0 and active access to database S ⊆ X × {−1, 1}, is α-differentially private and uses at most O([M1/(ατ) + M1/τ^2 + M2/(ατu) + M2/τu^2] log(M1/δ)) labels. [sent-254, score-0.724]

91 An important consequence is that for learning of homogeneous halfspaces over uniform or log-concave distributions we can obtain differential privacy while essentially preserving the label complexity. [sent-258, score-0.491]

92 For example, we can efficiently and differentially-privately learn homogeneous halfspaces under the uniform distribution with privacy parameter α and error parameter ε by using only O(d√d log(1/ε)/α + d^2 log(1/ε)) labels. [sent-261, score-0.391]
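
As a sanity check of this figure against the general statement above (and under the assumption that the label bound scales as O(M1/(ατ) + M1/τ^2) up to the logarithmic factor), the uniform-distribution algorithm uses M1 = O(d log(1/ε)) active SQs of tolerance τ = Ω(1/√d); plugging in gives M1/(ατ) = O(d√d log(1/ε)/α) and M1/τ^2 = O(d^2 log(1/ε)), which are exactly the two terms of the label bound just stated.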

93 However, it is known that any passive learning algorithm, even ignoring privacy considerations and noise, requires Ω(d/ε + (1/ε) log(1/δ)) labeled examples. [sent-262, score-0.519]

94 Discussion: Our work suggests that, as in passive learning, active statistical algorithms might be essentially as powerful as example-based efficient active learning algorithms. [sent-264, score-1.462]

95 It would be interesting to develop an active analogue of these techniques and give meaningful lower bounds based on them. [sent-267, score-0.621]

96 Specification and simulation of statistical query algorithms for efficiency and noise tolerance. [sent-278, score-0.426]

97 Selective sampling and active learning from single and multiple teachers. [sent-414, score-0.589]

98 A bound on the label complexity of agnostic active learning. [sent-448, score-0.736]

99 Rademacher complexities and bounding the excess risk in active learning. [sent-456, score-0.563]

100 Employing EM in pool-based active learning for text classification. [sent-467, score-0.563]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('active', 0.563), ('tolerance', 0.353), ('sq', 0.308), ('passive', 0.235), ('query', 0.213), ('halfspaces', 0.173), ('queries', 0.171), ('sqs', 0.151), ('lter', 0.136), ('unlabeled', 0.115), ('homogeneous', 0.102), ('label', 0.1), ('balcan', 0.095), ('learnhs', 0.094), ('tolerant', 0.092), ('isotropic', 0.083), ('privacy', 0.083), ('noise', 0.076), ('labeled', 0.071), ('statistical', 0.068), ('rcn', 0.067), ('ep', 0.062), ('separators', 0.061), ('examples', 0.06), ('poly', 0.059), ('actively', 0.059), ('hd', 0.057), ('halfspace', 0.057), ('private', 0.055), ('log', 0.054), ('pac', 0.052), ('uncorrelated', 0.048), ('access', 0.046), ('concept', 0.046), ('selective', 0.045), ('blum', 0.044), ('ipped', 0.043), ('guesses', 0.043), ('simulating', 0.043), ('classi', 0.041), ('beygelzimer', 0.041), ('disagreement', 0.039), ('rectangles', 0.039), ('kearns', 0.039), ('ask', 0.039), ('response', 0.038), ('differentially', 0.038), ('dunagan', 0.038), ('logconcave', 0.038), ('vitaly', 0.038), ('agnostic', 0.038), ('converted', 0.038), ('labels', 0.036), ('simulation', 0.036), ('dasgupta', 0.036), ('ci', 0.036), ('complexity', 0.035), ('queried', 0.034), ('database', 0.034), ('target', 0.034), ('hypothesis', 0.034), ('colt', 0.034), ('surviving', 0.033), ('uniform', 0.033), ('algorithms', 0.033), ('automatically', 0.033), ('give', 0.033), ('corrupted', 0.032), ('outputs', 0.032), ('thresholds', 0.031), ('learns', 0.03), ('classic', 0.03), ('interval', 0.03), ('simulate', 0.03), ('hypotheses', 0.029), ('dis', 0.029), ('dependence', 0.029), ('noiseless', 0.028), ('round', 0.028), ('dwork', 0.027), ('conditioned', 0.027), ('simulated', 0.027), ('analogues', 0.026), ('requests', 0.026), ('sampling', 0.026), ('algorithm', 0.026), ('gentile', 0.025), ('analogue', 0.025), ('characterizations', 0.025), ('margin', 0.025), ('exponential', 0.025), ('classes', 0.024), ('mcsherry', 0.024), ('vempala', 0.024), ('record', 0.024), ('densities', 0.024), ('savings', 0.024), ('realizable', 0.024), ('designing', 0.023), ('valid', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

2 0.38958457 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

3 0.22852167 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation are applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

4 0.20932163 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.18189195 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

Author: Ziteng Wang, Kai Fan, Jiaqi Zhang, Liwei Wang

Abstract: We study differentially private mechanisms for answering smooth queries on databases consisting of data points in R^d. A K-smooth query is specified by a function whose partial derivatives up to order K are all bounded. We develop an ε-differentially private mechanism which for the class of K-smooth queries has accuracy O(n^{−K/(2d+K)}/ε). The mechanism first outputs a summary of the database. To obtain an answer of a query, the user runs a public evaluation algorithm which contains no information of the database. Outputting the summary runs in time O(n^{1+d/(2d+K)}), and the evaluation algorithm for answering a query runs in time Õ(n^{(d+2+2d/K)/(2d+K)}). Our mechanism is based on L∞-approximation of (transformed) smooth functions by low degree even trigonometric polynomials with small and efficiently computable coefficients. 1

6 0.16469049 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

7 0.16075775 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

8 0.11526166 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

9 0.11252446 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

10 0.11167423 171 nips-2013-Learning with Noisy Labels

11 0.10657256 14 nips-2013-A Stability-based Validation Procedure for Differentially Private Machine Learning

12 0.10150943 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

13 0.090161175 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

14 0.090079203 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

15 0.087407731 69 nips-2013-Context-sensitive active sensing in humans

16 0.083807804 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

17 0.071777202 230 nips-2013-Online Learning with Costly Features and Labels

18 0.068716556 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

19 0.066717453 54 nips-2013-Bayesian optimization explains human active search

20 0.065552637 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.174), (1, 0.035), (2, 0.008), (3, -0.06), (4, 0.102), (5, 0.063), (6, -0.05), (7, -0.056), (8, -0.009), (9, 0.267), (10, 0.034), (11, -0.083), (12, -0.142), (13, 0.132), (14, -0.02), (15, -0.251), (16, -0.246), (17, 0.136), (18, 0.153), (19, -0.056), (20, -0.101), (21, -0.002), (22, 0.064), (23, -0.101), (24, 0.133), (25, -0.041), (26, -0.118), (27, 0.109), (28, -0.094), (29, 0.034), (30, 0.139), (31, 0.029), (32, 0.023), (33, -0.008), (34, -0.091), (35, -0.007), (36, 0.105), (37, 0.206), (38, 0.025), (39, 0.027), (40, 0.019), (41, -0.048), (42, -0.032), (43, 0.021), (44, -0.051), (45, -0.019), (46, -0.04), (47, -0.038), (48, 0.041), (49, -0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97397774 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

2 0.91153842 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation are applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

3 0.83256733 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

4 0.77579582 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.62898552 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

6 0.6051963 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

7 0.51990247 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

8 0.49108848 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

9 0.43996039 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

10 0.43143311 69 nips-2013-Context-sensitive active sensing in humans

11 0.38268897 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

12 0.35642803 171 nips-2013-Learning with Noisy Labels

13 0.31914848 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

14 0.30611545 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

15 0.29320708 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

16 0.28804654 54 nips-2013-Bayesian optimization explains human active search

17 0.28308713 223 nips-2013-On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation

18 0.2761758 350 nips-2013-Wavelets on Graphs via Deep Learning

19 0.27474129 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

20 0.27395418 14 nips-2013-A Stability-based Validation Procedure for Differentially Private Machine Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.023), (16, 0.023), (33, 0.115), (34, 0.07), (41, 0.02), (49, 0.023), (56, 0.136), (70, 0.016), (76, 0.017), (85, 0.035), (89, 0.038), (93, 0.03), (95, 0.359)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8688823 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference

Author: Guy van den Broeck, Adnan Darwiche

Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1

2 0.82384819 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

same-paper 3 0.80647081 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

4 0.75993919 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

Author: Po-Ling Loh, Martin J. Wainwright

Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1

5 0.71633428 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

Author: Ying Liu, Alan Willsky

Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1

6 0.69578803 185 nips-2013-Matrix Completion From any Given Set of Observations

7 0.65645826 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

8 0.63745034 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

9 0.63018715 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

10 0.60406858 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

11 0.598005 149 nips-2013-Latent Structured Active Learning

12 0.59281236 184 nips-2013-Marginals-to-Models Reducibility

13 0.5899055 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

14 0.5849759 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation

15 0.57623947 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

16 0.57005835 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding

17 0.56380922 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

18 0.56344718 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization

19 0.56307846 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

20 0.56164747 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion