nips nips2010 nips2010-25 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. [sent-6, score-1.648]
2 Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. [sent-7, score-1.647]
3 We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. [sent-8, score-0.352]
4 The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. [sent-9, score-0.939]
5 Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches. [sent-10, score-0.354]
6 1 Introduction In this work, we focus on pool-based active learning, which selects an unlabeled instance from a given pool for manual labeling. [sent-11, score-0.789]
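To make the pool-based setting concrete, here is a minimal sketch of the query loop. The base learner, budget, and `select` rule are illustrative placeholders, not the paper's method:

```python
# Minimal pool-based active learning loop; the SVC learner and the
# `select` rule are stand-ins, not the paper's algorithm.
import numpy as np
from sklearn.svm import SVC

def pool_based_loop(X, y_oracle, labeled_idx, budget, select):
    labeled = list(labeled_idx)
    for _ in range(budget):
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        model = SVC(kernel="rbf", gamma="scale").fit(X[labeled], y_oracle[labeled])
        s = select(model, X, unlabeled)  # pick one unlabeled instance to query
        labeled.append(int(s))           # the oracle reveals its label
    return SVC(kernel="rbf", gamma="scale").fit(X[labeled], y_oracle[labeled])
```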
7 There are two main criteria, i.e., informativeness and representativeness, that are widely used for active query selection. [sent-14, score-0.799]
8 Informativeness measures the ability of an instance to reduce the uncertainty of a statistical model, while representativeness measures whether an instance well represents the overall input patterns of the unlabeled data [16]. [sent-15, score-1.062]
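As a rough illustration of the two criteria (simple proxies, not the paper's definitions): informativeness can be proxied by closeness to the current decision boundary, and representativeness by average similarity to the unlabeled pool:

```python
# Hedged proxies for the two criteria: small margin = informative,
# high average RBF similarity to the pool = representative.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def informativeness(model, X, unlabeled):
    # higher score = closer to the boundary = more informative
    return -np.abs(model.decision_function(X[unlabeled]))

def representativeness(X, unlabeled, gamma=1.0):
    K = rbf_kernel(X[unlabeled], X[unlabeled], gamma=gamma)
    return K.mean(axis=1)  # average similarity to the rest of the pool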
9 Although several active learning algorithms [19, 8, 11] have been proposed to find the unlabeled instances that are both informative and representative, they are usually ad hoc in measuring the informativeness and representativeness of an instance, leading to suboptimal performance. [sent-17, score-2.309]
10 In this paper, we propose a new active learning approach by QUerying Informative and Representative Examples (QUIRE for short). [sent-18, score-0.329]
11 The proposed approach is based on the min-max view of active learning [11], which provides a systematic way for measuring and combining the informativeness and the representativeness. [sent-19, score-0.819]
12 The rest of this paper is organized as follows: Section 2 reviews the related work on active learning; Section 3 presents the proposed approach in detail; experimental results are reported in Section 4; Section 5 concludes this work with issues to be addressed in the future. [sent-21, score-0.398]
13 Exemplar approaches include query-by-committee [17, 6, 10], uncertainty sampling [13, 12, 18, 2] and optimal experimental design [9, 20]. [sent-23, score-0.088]
14 The main weakness of these approaches is that they are unable to exploit the abundance of unlabeled data and the selection of query instances is solely determined by a small number of labeled examples, making it prone to sample bias. [sent-24, score-1.194]
15 Another school of active learning selects the instances that are most representative of the unlabeled data. [sent-25, score-1.222]
16 These approaches aim to exploit the cluster structure of unlabeled data [14, 7], usually by a clustering method. [sent-26, score-0.525]
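A minimal sketch of this cluster-based flavor of representative sampling (in the spirit of, though not identical to, the methods of [14, 7]): query the pool points nearest to k-means centroids.

```python
# Representative sampling via cluster structure: one query per k-means
# cluster, taking the pool point closest to each centroid.
import numpy as np
from sklearn.cluster import KMeans

def cluster_queries(X_pool, n_queries, seed=0):
    km = KMeans(n_clusters=n_queries, n_init=10, random_state=seed).fit(X_pool)
    dists = np.linalg.norm(X_pool[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    return np.unique(dists.argmin(axis=0))  # indices of centroid-nearest points
```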
17 The main weakness of these approaches is that their performance heavily depends on the quality of clustering results [7]. [sent-27, score-0.142]
18 Several active learning algorithms have tried to combine the informativeness measure with the representativeness measure to find the optimal query instances. [sent-28, score-1.331]
19 In [19], the authors propose a sampling algorithm that exploits both the cluster information and the classification margins of unlabeled instances. [sent-29, score-0.433]
20 One limitation of this approach is that since clustering is only performed on the instances within the classification margin, it is unable to exploit the unlabeled instances outside the margin. [sent-30, score-1.127]
21 A later study extended the active learning approach in [14] by dynamically balancing the uncertainty and the density of instances for query selection. [sent-32, score-0.926]
22 This approach is ad hoc in combining the measure of informativeness and representativeness for query selection, leading to suboptimal performance. [sent-33, score-1.269]
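The kind of heuristic combination criticized here typically looks like a hand-tuned trade-off; a sketch of one common form (the exponent beta is exactly the ad hoc knob, with no principled way to set it):

```python
# A typical ad hoc combination of the two measures: a geometric trade-off
# with a free parameter beta that must be hand-tuned or scheduled.
def ad_hoc_score(informative, representative, beta=0.5):
    # both score arrays are assumed rescaled to lie in (0, 1]
    return informative ** beta * representative ** (1.0 - beta)
```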
23 Our work is based on the min-max view of active learning, which was first proposed in the study of batch mode active learning [11]. [sent-34, score-0.674]
24 3 QUIRE: QUery Informative and Representative Examples We start with a synthesized example that illustrates the importance of querying instances that are both informative and representative for active learning. [sent-36, score-1.247]
25 Figure 1 (a) shows a binary classification problem with each class represented by a different legend. [sent-37, score-0.033]
26 We examine three different active learning algorithms by allowing them to sequentially select 15 data points. [sent-38, score-0.354]
27 Figure 1 (b) and (c) show the data points selected by an approach favoring informative instances (i.e., [18]) [sent-39, score-0.764]
28 and by an approach favoring representative instances, respectively. [sent-41, score-0.748]
29 As indicated by Figure 1 (b), due to the sample bias, the approach preferring informative instances tends to choose the data points close to the horizontal line, leading to incorrect decision boundaries. [sent-44, score-0.771]
30 On the other hand, as indicated by Figure 1 (c), the approach preferring representative instances is able to identify the approximately correct decision boundary, but with slow convergence. [sent-45, score-0.76]
31 Figure 1 (d) shows the data points selected by the proposed approach that favors data points that are both informative and representative. [sent-46, score-0.32]
32 It is clear that the proposed algorithm is more efficient in finding an accurate decision boundary than the other two approaches. [sent-47, score-0.123]
33 Active learning selects one instance xs from the pool of unlabeled data to query its class label. [sent-49, score-0.809]
34 For convenience, we divide the data set D into three parts: the labeled data Dl, the currently selected instance xs, and the rest of the unlabeled data Du. [sent-50, score-0.636]
35 We also use Da = Du ∪ {xs} to represent all the unlabeled instances. [sent-51, score-0.347]
36 We use y = [yl, ys, yu] for the class label assignment of the entire data set, where yl, ys and yu are the class labels assigned to Dl, xs and Du, respectively. [sent-52, score-0.536]
37 Finally, we denote by ya = [ys, yu] the class assignment for all the unlabeled instances. [sent-53, score-0.488]
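For concreteness, the notation above maps to index sets as follows (the sizes are made up for illustration):

```python
# D = Dl ∪ {xs} ∪ Du; Da = {xs} ∪ Du; y = [yl, ys, yu] partitions accordingly.
import numpy as np

n, n_l = 100, 10                       # assumed pool and labeled-set sizes
l_idx = np.arange(n_l)                 # indices of Dl
s = n_l                                # index of the candidate query xs
u_idx = np.arange(n_l + 1, n)          # indices of Du
a_idx = np.concatenate(([s], u_idx))   # indices of Da = {xs} ∪ Du
```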
38 3.1 The Framework To motivate the proposed approach, we first re-examine margin-based active learning from the min-max viewpoint [11]. [sent-55, score-0.377]
39 Let f ∗ be a classification model trained by the labeled examples, i.e., [sent-56, score-0.062]
40 f ∗ = arg min_{f ∈ H} (λ/2)‖f‖²_H + Σ_{i=1}^{n_l} ℓ(y_i, f(x_i)),   (1)   where H is a reproducing kernel Hilbert space endowed with kernel function κ(·, ·) : R^d × R^d → R. [sent-58, score-0.161]
41 Given the classifier f ∗, the margin-based approach chooses the unlabeled instance closest to the decision boundary, i.e., xs = arg min_{x ∈ Da} |f ∗(x)|. [sent-60, score-0.482]
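A hedged numpy sketch of this min-max selection: with a squared loss substituted for ℓ, the inner minimization over f in (1) has the closed form (λ/2)·yᵀ(K + (λ/2)I)⁻¹y on any labeled set, so the worst-case label of a candidate can be evaluated by enumeration. QUIRE itself additionally optimizes over the unlabeled assignment yu, which this sketch omits; the loss choice and parameter values are assumptions.

```python
# Min-max query selection with squared loss: pick the candidate whose
# worst-case label (over ±1) yields the smallest optimal objective of Eq. (1).
import numpy as np

def minmax_select(K, y_l, l_idx, cand_idx, lam=1.0):
    def opt_obj(idx, y):
        # closed-form optimum of (lam/2)|f|^2 + sum_i (y_i - f(x_i))^2 on idx
        A = K[np.ix_(idx, idx)] + (lam / 2.0) * np.eye(len(idx))
        return (lam / 2.0) * float(y @ np.linalg.solve(A, y))
    best_s, best_val = None, np.inf
    for s in cand_idx:
        idx = np.concatenate((l_idx, [s]))
        worst = max(opt_obj(idx, np.append(y_l, ys)) for ys in (-1.0, 1.0))
        if worst < best_val:
            best_s, best_val = s, worst
    return best_s
```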
wordName wordTfidf (topN-words)
[('representativeness', 0.448), ('unlabeled', 0.347), ('informativeness', 0.325), ('instances', 0.315), ('active', 0.302), ('representative', 0.229), ('informative', 0.225), ('favoring', 0.177), ('query', 0.172), ('querying', 0.142), ('uire', 0.135), ('xs', 0.117), ('nl', 0.114), ('xnl', 0.09), ('hoc', 0.088), ('ys', 0.085), ('nanjing', 0.079), ('du', 0.07), ('instance', 0.067), ('preferring', 0.064), ('yl', 0.064), ('ad', 0.063), ('labeled', 0.062), ('weakness', 0.061), ('uncertainty', 0.059), ('boundary', 0.057), ('dl', 0.055), ('criteria', 0.055), ('leading', 0.053), ('exploit', 0.046), ('suboptimal', 0.045), ('unable', 0.044), ('pool', 0.044), ('systematic', 0.043), ('measuring', 0.043), ('yu', 0.043), ('decision', 0.041), ('cluster', 0.04), ('lansing', 0.039), ('rongjin', 0.039), ('measures', 0.037), ('abundance', 0.036), ('deploy', 0.036), ('selection', 0.034), ('east', 0.034), ('rong', 0.034), ('synthesized', 0.034), ('xid', 0.034), ('class', 0.033), ('clustering', 0.033), ('assignment', 0.033), ('ya', 0.032), ('examples', 0.032), ('exemplar', 0.031), ('usually', 0.03), ('selects', 0.029), ('michigan', 0.029), ('undesirable', 0.029), ('approaches', 0.029), ('select', 0.029), ('classi', 0.028), ('combining', 0.028), ('viewpoint', 0.027), ('china', 0.027), ('da', 0.027), ('indicated', 0.027), ('approach', 0.027), ('endowed', 0.027), ('margins', 0.027), ('view', 0.026), ('dynamically', 0.026), ('prone', 0.026), ('combine', 0.025), ('balancing', 0.025), ('proposed', 0.025), ('termed', 0.024), ('concludes', 0.023), ('illustrative', 0.023), ('sequentially', 0.023), ('bias', 0.023), ('favors', 0.023), ('divide', 0.023), ('motivate', 0.023), ('solely', 0.022), ('serious', 0.022), ('rd', 0.022), ('probably', 0.022), ('measured', 0.021), ('convenience', 0.021), ('reviews', 0.021), ('nding', 0.021), ('measure', 0.02), ('reproducing', 0.02), ('selected', 0.02), ('prediction', 0.019), ('heavily', 0.019), ('batch', 0.019), ('incorrect', 0.019), ('tried', 0.019), ('exploits', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches. 1
2 0.33350444 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
3 0.1717011 27 nips-2010-Agnostic Active Learning Without Constraints
Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
4 0.14722428 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
Author: Wei Wang, Zhi-hua Zhou
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under the multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/ǫ), contrasting to the single-view setting where the polynomial improvement is the best possible achievement. We also prove that in the general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O(1/ǫ), where the order of 1/ǫ is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of 1/ǫ is related to the parameter in Tsybakov noise. 1
5 0.1442169 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα-measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα-measures are more accurate than estimates based on instances sampled from the test distribution. 1
6 0.1436478 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
7 0.10433047 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
8 0.08452376 151 nips-2010-Learning from Candidate Labeling Sets
9 0.073327266 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
10 0.071390025 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
11 0.068349607 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
12 0.06579221 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
13 0.065557525 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
14 0.065297268 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
15 0.058397904 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
16 0.057715207 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
17 0.04884021 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
18 0.04763782 228 nips-2010-Reverse Multi-Label Learning
19 0.046533339 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
20 0.046087004 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
topicId topicWeight
[(0, 0.127), (1, 0.035), (2, 0.088), (3, -0.07), (4, -0.004), (5, 0.101), (6, -0.104), (7, -0.132), (8, 0.055), (9, -0.294), (10, 0.094), (11, -0.013), (12, -0.093), (13, -0.125), (14, -0.002), (15, -0.074), (16, -0.305), (17, -0.043), (18, -0.04), (19, 0.107), (20, -0.044), (21, 0.122), (22, -0.143), (23, 0.071), (24, 0.008), (25, -0.039), (26, 0.014), (27, -0.072), (28, -0.029), (29, -0.03), (30, 0.005), (31, -0.085), (32, -0.059), (33, 0.035), (34, -0.068), (35, -0.057), (36, -0.083), (37, -0.022), (38, 0.092), (39, 0.04), (40, 0.101), (41, 0.026), (42, 0.061), (43, -0.149), (44, -0.044), (45, 0.01), (46, -0.038), (47, -0.042), (48, 0.05), (49, -0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.98181111 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches. 1
2 0.86010939 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
3 0.73221999 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1
4 0.70262498 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
Author: Wei Wang, Zhi-hua Zhou
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under the multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/ǫ), contrasting to the single-view setting where the polynomial improvement is the best possible achievement. We also prove that in the general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O(1/ǫ), where the order of 1/ǫ is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of 1/ǫ is related to the parameter in Tsybakov noise. 1
5 0.61921644 27 nips-2010-Agnostic Active Learning Without Constraints
Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
6 0.59532881 22 nips-2010-Active Estimation of F-Measures
7 0.40552872 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
8 0.39359134 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
9 0.38674334 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
10 0.35796878 151 nips-2010-Learning from Candidate Labeling Sets
11 0.3499611 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
12 0.3405343 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
13 0.31587073 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
14 0.30331463 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
15 0.28877863 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
16 0.28312221 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
17 0.27807644 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
18 0.27791327 2 nips-2010-A Bayesian Approach to Concept Drift
19 0.25228772 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
20 0.25106341 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
topicId topicWeight
[(13, 0.033), (27, 0.027), (30, 0.035), (45, 0.308), (50, 0.044), (52, 0.02), (60, 0.029), (77, 0.013), (78, 0.114), (80, 0.235), (90, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.88373297 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches. 1
2 0.81863242 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
Author: David Weiss, Benjamin Sapp, Ben Taskar
Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1
3 0.81827426 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1
4 0.8164413 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
5 0.80357218 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola
Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernstein-type inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions. 1
6 0.79944474 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
7 0.79843426 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
8 0.79167461 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
9 0.79005677 23 nips-2010-Active Instance Sampling via Matrix Partition
10 0.78784931 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.78434402 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
12 0.78174156 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
13 0.78149104 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
14 0.78082693 22 nips-2010-Active Estimation of F-Measures
15 0.77772665 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
16 0.77729589 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
17 0.77724433 177 nips-2010-Multitask Learning without Label Correspondences
18 0.77720147 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
19 0.77459395 243 nips-2010-Smoothness, Low Noise and Fast Rates
20 0.77452391 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average