nips nips2013 nips2013-149 knowledge-graph by maker-knowledge-mining

149 nips-2013-Latent Structured Active Learning


Source: pdf

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract In this paper we present active learning algorithms in the context of structured prediction problems. [sent-7, score-0.838]

2 To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. [sent-8, score-0.858]

3 We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. [sent-9, score-0.497]

4 In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. [sent-10, score-0.28]

5 Prediction in structured models is typically performed by maximizing a scoring function over the space of all possible outcomes, an NP-hard task for most graphical models. [sent-20, score-0.24]

6 Traditional learning algorithms for structured problems tackle the supervised setting [16, 33, 11], where input-output pairs are given and each structured output is fully labeled. [sent-21, score-0.527]

7 Obtaining fully labeled examples might, however, be very cumbersome as structured models often involve a large number of random variables, e. [sent-22, score-0.426]

8 Thus, reducing the amount of labeled examples required for learning the scoring function is key for the success of structured prediction in real-world applications. [sent-31, score-0.529]

9 The active learning setting is particularly beneficial as it has the potential to considerably reduce the amount of supervision required to learn a good model, by querying only the most informative examples. [sent-32, score-0.622]

10 In the structured case, active learning can be generalized to query only subparts of the graph for each example, reducing the amount of necessary labeling even further. [sent-33, score-1.07]

11 While a variety of active learning approaches exists for the case of classification and regression, the structured case has been less popular, perhaps because of its intrinsic computational difficulties as we have to deal with exponentially sized output spaces. [sent-34, score-0.731]

12 Existing approaches typically consider the case where exact inference is possible [7], label the full output space [7, 22], or rely on computationally expensive processes that require inference for each possible outcome of each random variable [34]. [sent-35, score-0.366]

13 In particular, we propose to select which parts to label based on the entropy of the local marginal distributions. [sent-38, score-0.367]

14 Our active learning algorithms exploit recently developed weakly supervised methods for structured prediction [28], showing that we can benefit from unlabeled examples and exploit the marginal distributions computed during learning. [sent-39, score-1.201]

15 Furthermore, computation is re-used at each active learning iteration, improving efficiency significantly. [sent-40, score-0.505]

16 In particular, we match the performance of the state-of-the-art in this task [27] while only labeling ∼10% of the random variables. [sent-44, score-0.233]

17 We then propose our active learning algorithms, and show our experimental evaluation followed by a discussion on related work and conclusions. [sent-46, score-0.505]

18 2 Maximum Likelihood Structure Prediction We begin by reviewing structured prediction approaches that employ both fully labeled training sets as well as those that handle latent variables. [sent-47, score-0.65]

19 Of particular interest to us are probabilistic formulations since we employ entropies of local probability distributions as our criteria for deciding which parts of the graph to label during each active learning step. [sent-48, score-0.967]

20 , an image or a sentence), and let s ∈ S be the structured labeled space that we are interested in predicting (e. [sent-51, score-0.439]

21 Here we consider log-linear distributions pw (s|x) describing the probability over a structured label space S given an object x ∈ X as pw (s|x) ∝ exp w φ(x, s) . [sent-55, score-0.415]

22 1 Supervised Setting To define “good,” in the supervised setting we are given a training set D = {(xi , si )N } containing i=1 N pairs, each composed of an input x ∈ X and some fully labeled data s ∈ S. [sent-58, score-0.389]

23 2 Dealing with Latent Variables In the weakly supervised setting, we are given a training set D = {(xi , yi )N } containing N pairs, i=1 each composed of an input x ∈ X and some partially labeled data y ∈ Y ⊆ S. [sent-73, score-0.439]

24 We leverage the decomposition within the features to also approximate the entropy over the joint distribution q(x,y) (h) by local ones ranging over marginals. [sent-93, score-0.236]

25 To this end we change to the dual space, employ the entropy approximations and transform the resulting surrogate function back to the primal space where we obtain Lagrange multipliers λ which enforce the marginalization constraints. [sent-96, score-0.242]

26 the local beliefs d to solve the latent variable prediction problem, and performing a gradient step w. [sent-104, score-0.331]

27 The latter is equivalent to solving a supervised conditional random field problem given the distribution over latent variables inferred in the preceding latent variable prediction step. [sent-108, score-0.437]

28 We augment [28], and return not only the weights but also the local beliefs d which represent the joint distribution q(x,y) (h), i. [sent-109, score-0.251]

29 3 Algorithm 1 latent structured prediction Input: data D, initial weights w repeat repeat //solve latent variable prediction problem mind f2 + f3 s. [sent-115, score-0.709]

30 In the active learning setting, we assume a given training set DS = {(xi , yi )NL } containing NL pairs, each composed i=1 by an input x ∈ X and some partially labeled data y ∈ Y ⊆ S. [sent-119, score-0.775]

31 i=1 We are interested in answering the following question: which part of the graph for which example should we labeled in order to learn the best model with the least amount of supervision? [sent-122, score-0.293]

32 Towards this goal, we derive iterative algorithms which select the random variables to be labeled based on the local entropies. [sent-123, score-0.276]

33 This is intuitive, as entropy is a surrogate for uncertainty and useful for the considered application since the cost of labeling a random variable is independent of the selection. [sent-124, score-0.464]

34 Towards this goal, we need to compute the entropies of the marginal distributions over each latent variable, as well as the entropy over each random variable of the unlabeled examples. [sent-126, score-0.561]

35 In this paper we derive two active learning algorithms, each with a different trade-off between accuracy and computational complexity. [sent-128, score-0.505]

36 Separate active: Our first algorithm utilizes the labeled and weakly labeled examples to learn at each iteration. [sent-129, score-0.461]

37 Once the parameters are learned it performs inference over the unlabeled and partially labeled examples to query for the next random variable to label. [sent-130, score-0.515]

38 Thus, it requires a separate inference step for each active learning iteration. [sent-131, score-0.688]

39 Joint active: Our second active learning algorithm takes advantage of unlabeled examples during learning and no extra effort is required to compute the most informative random variable. [sent-135, score-0.773]

40 Note that this contrasts active learning algorithms which typically do not exploit unlabeled data during learning and require very expensive computations in order to select the next example or random variable to be labeled. [sent-136, score-0.745]

41 Let D1 = DS ∪ DU be the set of all training examples containing both fully labeled, partially labeled and unlabeled examples. [sent-137, score-0.45]

42 At each iteration we obtain Dt by querying the label of a random variable not being labeled in Dt−1 . [sent-138, score-0.405]

43 The entropies are readily computable in close form, as the local beliefs d are computed during learning. [sent-144, score-0.253]

44 The local entropies are |Hi | then given by H(di ) = − hi =1 di (hi ) log di (hi ), and we query the variable that has the highest entropy, i. [sent-146, score-0.481]

45 Note that this computation is linear in the number of unlabeled random variables and linear in the number of states. [sent-149, score-0.224]

46 Note that this algorithm is more expensive than the previous one as learning employs the fully, weakly and unlabeled examples. [sent-152, score-0.323]

47 However, as shown in our experimental evaluation, it can dramatically reduce the amount of labeling required to learn a good model. [sent-154, score-0.268]

48 Batch mode: The two previously defined active learning approaches are computationally expensive as for each sequential active learning step, a new model has to be learned and inference has to be performed over all latent variables. [sent-155, score-1.163]

49 We also investigate batch algorithms which label k random variables at each step of the algorithm. [sent-156, score-0.271]

50 Re-using computation: Warm starting the learning algorithm after each active learning query is important in order to reduce the number of iterations required for convergence. [sent-159, score-0.605]

51 Only afterwards and together with Lagrange multipliers from the other training images and the current weights, we perform the next iteration and another active step. [sent-164, score-0.639]

52 On the other hand, since we take advantage of all the unlabeled data during the joint active learning algorithm (Alg. [sent-165, score-0.754]

53 Without any further updates we directly start a new active step. [sent-167, score-0.505]

54 In our experimental evaluation we show that this choice results in dramatic speed ups when compared to randomly initializing the weights and messages during every active learning iteration. [sent-168, score-0.601]

55 3) requires a larger number of iterations to converge as it employs large amounts of unlabeled data. [sent-170, score-0.239]

56 After a few iterations, convergence for the following active learning steps improves significantly requiring about as much time as the separate approach (Alg. [sent-171, score-0.641]

57 4 Experimental Evaluation We demonstrate the performance of our algorithms on the task of predicting the 3D layout of rooms from a single image. [sent-173, score-0.231]

58 Existing approaches formulate the task as a structured prediction problem focusing on estimating the 3D box which best describes the layout. [sent-174, score-0.289]

59 , the existence of three dominant vanishing points which are orthonormal), and given the vanishing points, the problem can be formulated as inference in a pairwise graphical model composed of four random variables [27]. [sent-177, score-0.338]

60 Our features φ count for each face in the cuboid (given a particular configuration of the layout) the number of pixels with a certain label for OM and the probability that such label exists for GC and the task-loss denotes the pixel-wise prediction error. [sent-181, score-0.401]

61 5 s1 s2 s1 s3 s2 s3 s4 s4 Figure 1: Parameterization and factor graph for the 3D layout prediction task. [sent-182, score-0.32]

62 16 20 40 60 number of queries 80 0 (b) k = 4 random separate joint best 0. [sent-198, score-0.329]

63 22 pixelwise test error random pixelwise test error 0. [sent-199, score-0.456]

64 16 20 40 60 number of queries (c) k = 8 80 0 20 40 60 number of queries 80 (d) k = 12 Figure 2: Test set error as a function of the number of random variables labeled, when using joint vs separate active learning. [sent-203, score-1.002]

65 The different plots reflect scenarios where the top k random variables are labeled at each iteration (i. [sent-204, score-0.232]

66 Unless otherwise stated all experiments are performed by averaging over 20 runs of the algorithm, where the initial seed of 10 fully labeled images is selected at random. [sent-209, score-0.24]

67 Active learning: We begin our experimentation by comparing the two proposed active learning algorithms, i. [sent-210, score-0.505]

68 2(a), both active learning algorithms achieve much lower test error than an algorithm that selects which variables to label at random. [sent-216, score-0.75]

69 Also, note that the joint algorithm takes advantage of unlabeled data and achieves good performance after labeling only a few variables, improving significantly over the separate algorithm. [sent-217, score-0.618]

70 2 shows the performances of both active learning algorithms when labeling a batch of k random variables before re-learning. [sent-219, score-0.862]

71 Image vs random variable: Instead of labeling a random variable at a time, we also experiment with an algorithm that labels the four variables of an image at once. [sent-222, score-0.398]

72 Note that this setting is equivalent to labeling four random variables per image. [sent-223, score-0.298]

73 3(a), labeling the full image requires more labeling to achieve the same test error performance when compared to labeling random variables from possibly different examples. [sent-225, score-0.852]

74 3(b) and (c) show the performance of our active learning algorithms as a function of . [sent-227, score-0.505]

75 Our active learning algorithm thus prefers smaller values of . [sent-232, score-0.505]

76 4(a) we illustrate the number of CCCP iterations as a function of the number of queried examples for both active learning algorithms. [sent-240, score-0.635]

77 18 20 40 60 number of queries (b) 80 0 separate ε=0. [sent-264, score-0.239]

78 24 20 40 60 number of queries (c) 80 joint 0 0 20 40 state 60 (d) Marginal distribution Figure 3: Test set error as a function of the number of random variables labeled ((a)-(c)). [sent-272, score-0.425]

79 4(b) and (c) show the number of finished active learning iterations as a function of time for the joint and separate algorithm respectively. [sent-278, score-0.772]

80 Note that by reusing computation, a much larger number of active learning iterations finishes when given a specific time budget. [sent-279, score-0.591]

81 Over the years many different strategies have been proposed in the context of active learning algorithms to decide which example to label next. [sent-284, score-0.732]

82 An alternative way to classify active learning algorithms is related to the information revealed after querying for a label. [sent-287, score-0.586]

83 Active learning approaches have been proposed in the context of Neural Networks [6], Support Vector Machines [32], Gaussian processes [14], CRFs [7] and structured max-margin formulations [22]. [sent-292, score-0.226]

84 Contrasting many of the previously proposed approaches we consider active learning as an extension of a latent structured prediction setting, i. [sent-293, score-0.864]

85 Importantly, our active learning algorithm follows the recent ideas to unify CRFs and structured SVMs. [sent-296, score-0.687]

86 7 The first application of active learning in computer vision was developed by Kapoor et al. [sent-298, score-0.505]

87 In the context of structured models [8] proposed to use conditional entropies to decide which image to label next in a segmentation task. [sent-300, score-0.7]

88 In [36] the set of frames to label in a video sequence is selected based on the cost of labeling each frame and the cost of correcting errors. [sent-301, score-0.42]

89 GrabCut [23] popularized the use of “active learning” for figure ground segmentation, where the question of what to labeled next is answered by a human via an interactive segmentation system. [sent-304, score-0.304]

90 [31] propose to label that variable which most reduces the entropy of the entire “system,” i. [sent-306, score-0.294]

91 In [15], the next region to be labeled is selected based on a surrogate of uncertainty (i. [sent-309, score-0.251]

92 Entropy was used as an active learning criteria for tree-structured models [21], where marginal probabilities can be computed exactly. [sent-316, score-0.546]

93 [9] frame active learning as a semi-supervised learning problem over a graph. [sent-318, score-0.545]

94 They utilized the entropy as a metric for selecting which superpixel to label within their graph regularization approach. [sent-319, score-0.305]

95 [34] resort to the expected change as the criteria to select which parts to label in the graphical model. [sent-326, score-0.238]

96 6 Conclusions We have proposed active learning algorithms in the context of structure models which utilized local entropies in order to decide which subset of the output space for which example to label. [sent-328, score-0.817]

97 We have demonstrated the effectiveness of our approach in the problem of 3D room layout prediction given a single image, and we showed that state-of-the-art performance can be obtained while only employing ∼10% of the labelings. [sent-329, score-0.264]

98 Conditional random fields: Probabilistic models for segmenting and labeling sequence data. [sent-444, score-0.233]

99 Toward optimal active learning through sampling estimation of error reduction. [sent-500, score-0.505]

100 Support vector machine active learning with applications to text classification. [sent-560, score-0.505]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('active', 0.505), ('labeling', 0.233), ('pixelwise', 0.195), ('structured', 0.182), ('du', 0.173), ('labeled', 0.167), ('ds', 0.159), ('unlabeled', 0.159), ('layout', 0.157), ('label', 0.147), ('entropies', 0.144), ('schwing', 0.14), ('separate', 0.136), ('prediction', 0.107), ('queries', 0.103), ('entropy', 0.102), ('cccp', 0.101), ('segmentation', 0.092), ('joint', 0.09), ('weakly', 0.089), ('pollefeys', 0.084), ('hi', 0.081), ('supervised', 0.08), ('rooms', 0.074), ('img', 0.074), ('reuse', 0.072), ('latent', 0.07), ('vijayanarasimhan', 0.068), ('variables', 0.065), ('beliefs', 0.065), ('cohn', 0.061), ('hazan', 0.061), ('batch', 0.059), ('query', 0.059), ('lagrange', 0.059), ('graphical', 0.058), ('ln', 0.056), ('vanishing', 0.056), ('composed', 0.056), ('graph', 0.056), ('finished', 0.056), ('siddiquie', 0.056), ('vezhnevets', 0.056), ('image', 0.055), ('di', 0.054), ('multipliers', 0.053), ('weights', 0.052), ('queried', 0.051), ('fathi', 0.049), ('grabcut', 0.049), ('surrogate', 0.049), ('cvpr', 0.049), ('training', 0.047), ('bandit', 0.047), ('inference', 0.047), ('querying', 0.046), ('reusing', 0.045), ('coactive', 0.045), ('om', 0.045), ('urtasun', 0.045), ('interactive', 0.045), ('energies', 0.045), ('variable', 0.045), ('messages', 0.044), ('context', 0.044), ('local', 0.044), ('output', 0.044), ('kapoor', 0.043), ('pw', 0.043), ('tti', 0.043), ('hebert', 0.043), ('marginal', 0.041), ('iterations', 0.041), ('crfs', 0.041), ('frame', 0.04), ('fully', 0.039), ('employs', 0.039), ('wt', 0.039), ('hoiem', 0.039), ('employ', 0.038), ('examples', 0.038), ('repeat', 0.038), ('holistic', 0.037), ('effort', 0.036), ('decide', 0.036), ('var', 0.036), ('expensive', 0.036), ('supervision', 0.036), ('gc', 0.036), ('revealed', 0.035), ('interested', 0.035), ('uncertainty', 0.035), ('atlas', 0.035), ('lewis', 0.035), ('extra', 0.035), ('amount', 0.035), ('nl', 0.034), ('images', 0.034), ('parts', 0.033), ('test', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

2 0.38958457 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.2016518 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu

Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1

4 0.16401012 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.16380911 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

6 0.15927994 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

7 0.13844207 318 nips-2013-Structured Learning via Logistic Regression

8 0.12877183 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

9 0.12300334 75 nips-2013-Convex Two-Layer Modeling

10 0.096214287 211 nips-2013-Non-Linear Domain Adaptation with Boosting

11 0.095066778 230 nips-2013-Online Learning with Costly Features and Labels

12 0.091159403 69 nips-2013-Context-sensitive active sensing in humans

13 0.088205233 335 nips-2013-Transfer Learning in a Transductive Setting

14 0.084542021 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

15 0.083178289 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model

16 0.080230705 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

17 0.079114884 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations

18 0.079017811 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators

19 0.07770016 148 nips-2013-Latent Maximum Margin Clustering

20 0.07723555 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.251), (1, 0.046), (2, -0.068), (3, -0.065), (4, 0.129), (5, 0.039), (6, -0.041), (7, -0.046), (8, 0.019), (9, 0.138), (10, -0.036), (11, -0.048), (12, -0.067), (13, 0.025), (14, -0.012), (15, -0.223), (16, -0.266), (17, 0.12), (18, 0.127), (19, -0.047), (20, -0.149), (21, 0.046), (22, 0.099), (23, -0.121), (24, 0.118), (25, -0.023), (26, -0.064), (27, 0.1), (28, -0.092), (29, 0.084), (30, 0.141), (31, 0.047), (32, 0.025), (33, 0.008), (34, -0.025), (35, 0.035), (36, 0.036), (37, 0.123), (38, 0.057), (39, 0.085), (40, 0.023), (41, -0.066), (42, -0.033), (43, -0.071), (44, -0.025), (45, -0.074), (46, -0.02), (47, 0.026), (48, 0.093), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97376245 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

2 0.9070127 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.87367141 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation are applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

4 0.73496926 60 nips-2013-Buy-in-Bulk Active Learning

Author: Liu Yang, Jaime Carbonell

Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning. 1

5 0.6945464 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

6 0.61250538 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

7 0.52275413 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

8 0.51761115 69 nips-2013-Context-sensitive active sensing in humans

9 0.50220335 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

10 0.45433298 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

11 0.41842884 350 nips-2013-Wavelets on Graphs via Deep Learning

12 0.41689676 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

13 0.41166297 318 nips-2013-Structured Learning via Logistic Regression

14 0.40842831 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

15 0.40627572 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

16 0.40416884 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning

17 0.40253201 67 nips-2013-Conditional Random Fields via Univariate Exponential Families

18 0.39953041 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions

19 0.39128599 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

20 0.38595173 75 nips-2013-Convex Two-Layer Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.021), (16, 0.04), (33, 0.185), (34, 0.101), (36, 0.014), (41, 0.048), (46, 0.011), (49, 0.041), (56, 0.118), (70, 0.018), (76, 0.116), (85, 0.035), (89, 0.025), (93, 0.077), (95, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94539928 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

2 0.93367159 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling

Author: Mahito Sugiyama, Karsten Borgwardt

Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require to model the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1

same-paper 3 0.91758746 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

4 0.91493791 14 nips-2013-A Stability-based Validation Procedure for Differentially Private Machine Learning

Author: Kamalika Chaudhuri, Staal A. Vinterbo

Abstract: Differential privacy is a cryptographically motivated definition of privacy which has gained considerable attention in the algorithms, machine-learning and datamining communities. While there has been an explosion of work on differentially private machine learning algorithms, a major barrier to achieving end-to-end differential privacy in practical machine learning applications is the lack of an effective procedure for differentially private parameter tuning, or, determining the parameter value, such as a bin size in a histogram, or a regularization parameter, that is suitable for a particular application. In this paper, we introduce a generic validation procedure for differentially private machine learning algorithms that apply when a certain stability condition holds on the training algorithm and the validation performance metric. The training data size and the privacy budget used for training in our procedure is independent of the number of parameter values searched over. We apply our generic procedure to two fundamental tasks in statistics and machine-learning – training a regularized linear classifier and building a histogram density estimator that result in end-toend differentially private solutions for these problems. 1

5 0.90501958 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

Author: Kohei Hayashi, Ryohei Fujimaki

Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1

6 0.89932764 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

7 0.88986313 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

8 0.88805175 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

9 0.88074124 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

10 0.87889028 268 nips-2013-Reflection methods for user-friendly submodular optimization

11 0.87685585 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

12 0.87499422 318 nips-2013-Structured Learning via Logistic Regression

13 0.8748886 99 nips-2013-Dropout Training as Adaptive Regularization

14 0.874143 201 nips-2013-Multi-Task Bayesian Optimization

15 0.87029165 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

16 0.86889434 75 nips-2013-Convex Two-Layer Modeling

17 0.86844838 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding

18 0.86737365 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

19 0.8669697 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

20 0.86224461 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances