nips nips2010 nips2010-23 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Recently, batch-mode active learning has attracted a lot of attention. [sent-2, score-0.277]
2 In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. [sent-3, score-0.978]
3 By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. [sent-4, score-0.549]
4 Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. [sent-5, score-0.436]
5 The proposed active learning approach is independent of employed classification models. [sent-6, score-0.302]
6 Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. [sent-7, score-0.365]
7 1 Introduction Active learning is well-motivated in many supervised learning scenarios where unlabeled instances are abundant and easy to retrieve but labels are difficult, time-consuming, or expensive to obtain. [sent-8, score-0.531]
8 For example, it is easy to gather large amounts of unlabeled documents or images from the Internet, whereas labeling them requires manual effort from experienced human annotators. [sent-9, score-0.328]
9 Randomly selecting unlabeled instances for labeling is inefficient in many situations, since non-informative or redundant instances might be selected. [sent-10, score-0.906]
10 Given a large pool of unlabeled instances, active learning provides a way to iteratively select the most informative unlabeled instances—the queries—from the pool to label. [sent-14, score-0.825]
11 Many researchers have addressed the active learning problem in various ways [13]. [sent-15, score-0.277]
12 Most have focused on selecting a single most informative unlabeled instance to query each time. [sent-16, score-0.363]
13 The ultimate goal for most such approaches is to select instances that could lead to a classifier with low generalization error. [sent-17, score-0.43]
14 Towards this, a few variants of a mutual information criterion have been employed in the literature to guide the active instance sampling process. [sent-18, score-0.546]
15 The approaches in [4][10] select the instance that maximizes, respectively, the increase in mutual information and the mutual information itself between the selected set of instances and the remaining instances, based on Gaussian process models. [sent-19, score-0.804]
16 The approach proposed in [5] seeks the instance whose optimistic label provides maximum mutual information about the labels of the remaining unlabeled instances. [sent-20, score-0.46]
17 This approach implicitly exploits the clustering information contained in the unlabeled data in an optimistic way. [sent-22, score-0.243]
18 Single-instance selection active learning methods require tedious retraining each time a single instance is labeled. [sent-23, score-0.48]
19 Thus, a batch-mode active learning strategy that selects multiple instances each time is more appropriate under these circumstances. [sent-29, score-0.627]
20 The challenge in batch-mode active learning is how to properly assemble the optimal query batch. [sent-30, score-0.363]
21 Simply using a single instance selection strategy to select a batch of queries in each iteration does not work well, since it fails to take the information overlap between the multiple instances into account. [sent-31, score-0.711]
22 Principles for batch mode active learning need to be developed to address the multi-instance selection specifically. [sent-32, score-0.522]
23 Several sophisticated batch-mode active learning methods have been proposed for classification. [sent-33, score-0.277]
24 Most of these approaches use greedy heuristics to ensure the overall informativeness of the batch by taking both the individual informativeness and the diversity of the selected instances into account. [sent-34, score-0.731]
25 Schohn and Cohn [12] select instances according to their proximity to the dividing hyperplane for a linear SVM. [sent-35, score-0.404]
26 Brinker [2] considers an approach for SVMs that explicitly takes the diversity of the selected instances into account, in addition to individual informativeness. [sent-36, score-0.387]
27 The authors of [14] propose a representative sampling approach for SVM active learning, which also incorporates a diversity measure. [sent-38, score-0.341]
28 Specifically, they cluster the instances that lie close to the decision boundary and query the cluster centroids. [sent-39, score-0.369]
29 Hoi et al. [9] propose a novel batch-mode active learning scheme for SVMs that exploits semi-supervised kernel learning. [sent-43, score-0.306]
30 In particular, a kernel function is first learned from a mixture of labeled and unlabeled examples, and then is used to effectively identify the informative and diverse instances via a min-max framework. [sent-44, score-0.694]
31 Instead of using heuristic measures, Guo and Schuurmans [6] treat batch construction for logistic regression as a discriminative optimization problem, and attempt to construct the most informative batch directly. [sent-45, score-0.52]
32 Overall, these batch-mode active learning approaches all make batch selection decisions directly based on the classifiers employed. [sent-46, score-0.507]
33 In this paper, we propose a novel batch-mode active learning approach that makes query selection decisions independent of the classification model employed. [sent-47, score-0.399]
34 The idea is to select a batch of queries in each iteration by maximizing a general mutual information measure between the labeled instances and the unlabeled instances. [sent-48, score-1.051]
35 By employing a Gaussian process framework, this mutual information maximization problem can be further formulated as a matrix partition problem. [sent-49, score-0.439]
36 Although the matrix partition problem is an NP-hard combinatorial optimization, it can first be relaxed into a continuous optimization problem, and a good local solution can then be obtained by exploiting an effective local optimization technique. [sent-50, score-0.346]
37 Unlike most active learning methods studied in the literature, our query selection method does not require knowledge of the employed classifier. [sent-52, score-0.4]
38 Our empirical studies show that the proposed batch-mode active learning approach can achieve superior or comparable performance to discriminative batch-mode active learning methods that have been optimized on specific classifiers. [sent-53, score-0.642]
39 Section 3 introduces the proposed matrix partition approach for batch-mode active learning. [sent-56, score-0.481]
40 In this section, we provide an overview of Gaussian processes and some of their important properties which we will exploit later to construct our active learning approach. [sent-60, score-0.306]
41 Given a set of instances X = [x1; x2; · · · ; xt], a data modeling function f(·) can be viewed as a single sample from a Gaussian process with a mean function µ(·) and a covariance function C(·, ·). [sent-67, score-0.385]
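A brief sketch of this view, using an RBF kernel purely as an illustrative assumption (the paper only requires some covariance function C): the values of f over a finite instance set are one draw from a multivariate Gaussian whose covariance is the kernel matrix.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # C(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); the kernel choice is an assumption
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

X = np.random.randn(5, 3)                                 # five instances, three features
Sigma = rbf_kernel(X) + 1e-8 * np.eye(5)                  # small jitter keeps Sigma positive definite
f = np.random.multivariate_normal(np.zeros(5), Sigma)     # one sample of f over X
```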
42 In this section, we propose to conduct selective instance sampling using a maximum mutual information strategy, which can then be formulated as a matrix partition problem. [sent-76, score-0.518]
43 Nevertheless, in the active learning setting, we have a large number of unlabeled instances available that come from the same distribution as the future test instances. [sent-80, score-0.808]
44 Thus we can instead select instances that lead to a labeled set which is most informative about the large set of unlabeled instances. [sent-81, score-1.069]
45 Then the joint distribution over the function values fQ for a finite set of instances XQ is a multivariate Gaussian distribution, as given in (2). [sent-86, score-0.351]
46 Note that for a given data set, the overall number of instances does not change during the active learning process. [sent-91, score-0.66]
47 We simply move b instances from the unlabeled set U into the labeled set L in each iteration. [sent-92, score-0.614]
48 Based on this observation, our maximum mutual information instance selection strategy can be formulated as Q∗ = arg max_{|Q|=b, Q⊆U} I(X_{L∪Q}, X_{U\Q}) = arg max_{|Q|=b, Q⊆U} ln|Σ_{L′L′}| + ln|Σ_{U′U′}| (6), where L′ = L∪Q and U′ = U\Q. [sent-94, score-0.547]
49 This also suggests the mutual information criterion depends only on the covariance matrices computed using the kernel functions over the instances. [sent-95, score-0.313]
50 Our maximum mutual information strategy attempts to select the batch of b instances from the unlabeled set U to label, so as to maximize the log-determinants of the covariance matrices over the produced sets L′ and U′. [sent-96, score-1.023]
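As a concrete, hedged illustration of criterion (6), the sketch below evaluates the score ln|Σ_{L′L′}| + ln|Σ_{U′U′}| for a candidate batch Q and, for tiny pools only, finds the best batch by exhaustive enumeration; the matrix partition formulation introduced next replaces this exhaustive search.

```python
import itertools
import numpy as np

def logdet(A):
    # numerically stable log-determinant of a positive definite matrix
    sign, val = np.linalg.slogdet(A)
    assert sign > 0, "submatrix must be positive definite"
    return val

def mi_score(Sigma, L, U, Q):
    # score(Q) = ln|Sigma_{L'L'}| + ln|Sigma_{U'U'}| with L' = L u Q and U' = U \ Q
    L_new = list(L) + list(Q)
    U_new = [i for i in U if i not in set(Q)]
    return logdet(Sigma[np.ix_(L_new, L_new)]) + logdet(Sigma[np.ix_(U_new, U_new)])

def brute_force_batch(Sigma, L, U, b):
    # only feasible for very small U; shown purely to make the criterion explicit
    return max(itertools.combinations(U, b), key=lambda Q: mi_score(Sigma, L, U, Q))
```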
51 2 Matrix Partition Let Σ be the covariance matrix over all the instances indexed by V = L ∪ U = L′ ∪ U′. [sent-98, score-0.488]
52 Without losing any generality, we assume the instances are arranged in the order [U, L], such that Σ = [Σ_{UU}, Σ_{UL}; Σ_{LU}, Σ_{LL}] (7). The instance selection problem formulated in (6) selects a subset of b instances indexed by Q from U and moves them into the labeled set L. [sent-100, score-0.918]
53 This problem is actually equivalent to partitioning matrix Σ into submatrices Σ_{L′L′}, Σ_{U′U′}, Σ_{L′U′} and Σ_{U′L′} by reordering the instances in U. [sent-101, score-0.472]
54 Since L is fixed, the actual matrix partition is conducted on the covariance matrix Σ_{UU}. [sent-102, score-0.414]
55 We let M̃ denote the first u − b rows of M, and M_b denote the last b rows of M, such that M̃ Σ_{UU} M̃⊤ = Σ_{U′U′} and M_b Σ_{UU} M_b⊤ = Σ_{QQ} (8). Obviously M_b selects b instances from U to form Q. [sent-104, score-0.406]
56 According to (8) we then have Σ_{U′U′} = T Σ T⊤ and Σ_{L′L′} = B Σ B⊤ (10). Finally, the maximum mutual information problem given in (6) can be equivalently formulated as the following matrix partition problem. [sent-106, score-0.415]
57 max_M ln|B Σ B⊤| + ln|T Σ T⊤| s.t. M ∈ {0, 1}^{u×u}, M 1 = 1, M⊤ 1 = 1 (11). After solving this problem to obtain an optimal M∗, the instance selection can be determined from the last b rows of M∗. [sent-108, score-0.364]
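The following sketch spells out how the objective in (11) can be evaluated for a given (possibly relaxed) matrix M, assuming the [U, L] ordering of (7): T keeps the u − b rows that remain unlabeled, while B stacks the b queried rows on top of an identity block for the l already-labeled instances. The variable names are illustrative, not taken from the paper.

```python
import numpy as np

def partition_objective(M, Sigma, u, l, b):
    M_tilde, M_b = M[:u - b, :], M[u - b:, :]          # first u-b rows / last b rows of M
    T = np.hstack([M_tilde, np.zeros((u - b, l))])     # (u-b) x (u+l): picks out U'
    B = np.block([[M_b, np.zeros((b, l))],
                  [np.zeros((l, u)), np.eye(l)]])      # (b+l) x (u+l): picks out Q then L
    _, logdet_L = np.linalg.slogdet(B @ Sigma @ B.T)   # ln|Sigma_{L'L'}|
    _, logdet_U = np.linalg.slogdet(T @ Sigma @ T.T)   # ln|Sigma_{U'U'}|
    return logdet_L + logdet_U
```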
58 However, the optimization problem (11) is an NP-hard combinatorial optimization problem over an integer matrix M . [sent-111, score-0.22]
59 max_M ln|B Σ B⊤| + ln|T Σ T⊤| (12), subject to 0 ≤ M ≤ 1, M 1 = 1, M⊤ 1 = 1 (13). Note that the determinant is a log-concave function on positive definite matrices [1]. [sent-114, score-0.322]
60 However, the quadratic matrix function X = B Σ B⊤ is matrix convex given that the matrix Σ is positive definite. [sent-116, score-0.309]
61 Here u is the number of unlabeled instances, and we typically assume it is a large number. [sent-122, score-0.208]
62 Algorithm 1 (Matrix Partition) Input: l, the number of labeled instances; u, the number of unlabeled instances; Σ, the covariance matrix given in the form of (7); b, the batch size; M^(0), an initial matrix; and a convergence tolerance ε < 1e−8. [sent-135, score-0.61]
63 The overall algorithm for optimizing the matrix partition problem (12) is given in Algorithm 1. [sent-155, score-0.264]
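Since Algorithm 1 itself is not reproduced in this extraction, the sketch below shows one plausible local-ascent scheme for the relaxation (12)-(13); it is not the paper's algorithm. The gradient of ln|B Σ B⊤| with respect to B is 2(B Σ B⊤)^{-1} B Σ (and analogously for T), and the row/column-sum constraints are re-imposed here by a few Sinkhorn normalizations, an assumption made for brevity.

```python
import numpy as np

def sinkhorn(M, iters=50):
    # approximately project a nonnegative matrix onto {M : M1 = 1, M^T 1 = 1}
    M = np.clip(M, 1e-12, None)
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)   # rows sum to one
        M = M / M.sum(axis=0, keepdims=True)   # columns sum to one
    return M

def relaxed_ascent(Sigma, u, l, b, steps=200, lr=1e-2):
    M = sinkhorn(np.random.rand(u, u))
    for _ in range(steps):
        M_tilde, M_b = M[:u - b, :], M[u - b:, :]
        T = np.hstack([M_tilde, np.zeros((u - b, l))])
        B = np.block([[M_b, np.zeros((b, l))],
                      [np.zeros((l, u)), np.eye(l)]])
        G_T = 2.0 * np.linalg.solve(T @ Sigma @ T.T, T @ Sigma)   # d ln|T Sigma T'| / dT
        G_B = 2.0 * np.linalg.solve(B @ Sigma @ B.T, B @ Sigma)   # d ln|B Sigma B'| / dB
        grad = np.vstack([G_T[:, :u], G_B[:b, :u]])               # blocks acting on M
        M = sinkhorn(np.clip(M + lr * grad, 0.0, 1.0))
    return M
```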
64 When the number of unlabeled instances, u, is large, computing the log-determinant of the (u − b) × (u − b) matrix T Σ T⊤ is likely to run into overflow or underflow. [sent-157, score-0.208]
65 For positive definite matrices, such as the matrices we have, one can use Cholesky factorization to first produce a triangular matrix and then compute the log-determinant of the original matrix using the logarithms of the diagonal values of the triangular matrix. [sent-162, score-0.322]
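A compact sketch of this numerically safer computation (standard NumPy, nothing paper-specific):

```python
import numpy as np

def logdet_chol(A):
    # A = L L^T with L lower triangular; ln|A| = 2 * sum(ln diag(L)),
    # which avoids forming the (possibly over/underflowing) determinant itself
    L = np.linalg.cholesky(A)
    return 2.0 * np.sum(np.log(np.diag(L)))
```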
66 By solving the matrix partition problem in (12) using Algorithm 1, an optimal matrix M ∗ is returned. [sent-166, score-0.307]
67 In order to determine which set of b instances to select, we need to round M∗ to a {0,1}-valued M∗, while maintaining the permutation constraints M∗ 1 = 1 and M∗⊤ 1 = 1. [sent-168, score-0.323]
68 In this procedure, we focus on rounding the last b rows, M_b∗, since they are the ones used to pick b instances for labeling. [sent-170, score-0.323]
69 The procedure is described in Algorithm 2, which returns the indices of the selected b instances as well. [sent-171, score-0.323]
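Since Algorithm 2 is not reproduced here, the sketch below gives one simple greedy rounding that respects the same constraints: repeatedly take the largest remaining entry of the last b rows of M∗, assign that row to that column (instance), and block the row and column from being used again. It is an illustrative stand-in, not the paper's exact procedure.

```python
import numpy as np

def round_selection(M_star, b):
    Mb = M_star[-b:, :].astype(float).copy()
    selected = []
    for _ in range(b):
        r, c = np.unravel_index(np.argmax(Mb), Mb.shape)
        selected.append(c)          # column index = chosen unlabeled instance
        Mb[r, :] = -np.inf          # this row is now assigned
        Mb[:, c] = -np.inf          # this instance cannot be chosen twice
    return selected
```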
70 4 Experiments To investigate the empirical performance of the proposed batch-mode active learning algorithm, we conducted two sets of experiments on a few UCI datasets and the 20 newsgroups dataset. [sent-172, score-0.334]
71 Note the proposed active learning method is in general independent of the specific classification model employed. [sent-173, score-0.277]
72 We have also compared our approach to a transductive experimental design method, which is formulated for regression problems and whose instance selection process is independent of the evaluation classification model [15]. [sent-176, score-0.263]
73 We consider a hard case of active learning, where we start active learning from only a few labeled instances. [sent-179, score-0.637]
74 We then randomly select 2/3 of the remaining instances as the unlabeled set, using all the other instances for testing. [sent-181, score-0.935]
75 All the algorithms start with the same initial labeled set, unlabeled set and testing set. [sent-182, score-0.291]
76 For a fixed batch size b, each algorithm repeatedly selects b instances to label and evaluates the produced classifier on the testing data after each new labeling, with a maximum of 110 instances selected in total. [sent-183, score-0.962]
77 These results show that the proposed active learning method, Matrix, outperformed svmD, Fisher and Design on most data sets, except for an overall loss to svmD on pima, a tie with Fisher and Design on hepatitis, and a tie with Design on flare. [sent-191, score-0.913]
78 Matrix is mostly tied with Discriminative on all data sets, with a slight pointwise win on crx and a slight overall loss on german. [sent-192, score-0.93]
79 Table 1: Comparison of the active learning algorithms on UCI data with batch size = 10. [sent-195, score-0.431]
80 [Table 1 fragment: for each UCI data set (cleve, pima, ...), the columns report win%, lose%, and the overall outcome of the paired comparisons Matrix vs svmD, Matrix vs Fisher, Matrix vs Discriminative, and Matrix vs Design; the numerical entries are not recoverable from this extraction.] [sent-198, score-0.552]
81 [Table 2 fragment: average running time (in minutes) of the Matrix and Discriminative methods on each UCI data set; the numerical entries are not recoverable from this extraction.] [sent-219, score-0.65]
82 27 Table 3: Comparison of the active learning algorithms on Newsgroup data with batch size = 20. [sent-233, score-0.431]
83 [Table 3 fragment: for each newsgroup data set (Autos, Sport, ...), the columns report win%, lose%, and the overall outcome of the paired comparisons Matrix vs svmD, Matrix vs Fisher, Matrix vs Random, and Matrix vs Design; the numerical entries are not recoverable from this extraction.] [sent-236, score-0.512]
84 Next we conducted experiments on the 20 newsgroups dataset for document categorization. [sent-253, score-0.621]
85 We then randomly select 1000 instances (500 from each class) from the remaining ones as the unlabeled set, using all the other instances for testing. [sent-276, score-0.935]
86 All the algorithms start with the same initial labeled set, unlabeled set and testing set. [sent-277, score-0.291]
87 For a fixed batch size b, each algorithm repeatedly selects b instances to label, with a maximum of 300 instances selected in total. [sent-278, score-0.962]
88 Note the unlabeled sets used for this set of experiments are much larger than the ones used for experiments on UCI datasets. [sent-281, score-0.208]
89 However, we coped with this problem using a subsampling-assisted method, where we first select a subset of 400 instances from the unlabeled set and then restrict our instance selection to this subset. [sent-286, score-0.746]
90 This is equivalent to solving the matrix partition optimization in (12) with additional constraints on Mb , such that the columns of Mb corresponding to instances outside of this subset of 400 instances are all set to 0. [sent-287, score-0.895]
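A minimal sketch of that restriction (the variable names are illustrative): entries of the last b rows of M outside the 400-instance candidate subset are clamped to zero before and during optimization, so queries can only be routed through the pre-selected pool.

```python
import numpy as np

def restrict_to_subset(M, b, candidate_idx, u):
    mask = np.zeros(u, dtype=bool)
    mask[list(candidate_idx)] = True
    M[-b:, ~mask] = 0.0             # columns of M_b outside the subset are forced to zero
    return M
```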
91 For the experiments, we chose the 400 instances with the highest entropy under the current classification model. [sent-288, score-0.355]
92 It tied with Fisher on the overall measure, but had a slight win on the pointwise measure. [sent-292, score-0.736]
93 5 Conclusions In this paper, we propose a novel batch-mode active learning approach that makes query selection decisions independent of the classification model employed. [sent-295, score-0.399]
94 It is formulated as a matrix partition optimization problem under a Gaussian process framework. [sent-297, score-0.325]
95 To tackle the formulated combinatorial optimization problem, we developed an effective local optimization technique. [sent-298, score-0.204]
96 Our empirical studies show the proposed flexible batch-mode active learning approach can achieve comparable or superior performance to discriminative batch-mode active learning methods that have been optimized on specific classifiers. [sent-299, score-0.642]
97 A future extension for this work is to consider batch-mode active learning with structured data by exploiting different kernel functions. [sent-300, score-0.306]
98 Incorporating diversity in active learning with support vector machines. [sent-308, score-0.341]
99 Batch mode active learning and its application to medical image classification. [sent-342, score-0.316]
100 Semi-supervised SVM batch mode active learning for image retrieval. [sent-349, score-0.47]
wordName wordTfidf (topN-words)
[('win', 0.564), ('instances', 0.323), ('active', 0.277), ('unlabeled', 0.208), ('tie', 0.202), ('lose', 0.172), ('mutual', 0.159), ('mb', 0.159), ('batch', 0.154), ('svmd', 0.141), ('qq', 0.121), ('ln', 0.113), ('fq', 0.107), ('matrix', 0.103), ('partition', 0.101), ('xq', 0.098), ('hoi', 0.089), ('discriminative', 0.088), ('labeled', 0.083), ('select', 0.081), ('xu', 0.076), ('fisher', 0.075), ('xl', 0.074), ('vs', 0.068), ('documents', 0.068), ('informativeness', 0.065), ('diversity', 0.064), ('covariance', 0.062), ('crx', 0.06), ('overall', 0.06), ('gaussian', 0.058), ('instance', 0.058), ('hepatitis', 0.053), ('formulated', 0.052), ('selection', 0.052), ('labeling', 0.052), ('guo', 0.051), ('classi', 0.051), ('informative', 0.051), ('paired', 0.05), ('query', 0.046), ('pima', 0.046), ('submatrices', 0.046), ('conduct', 0.045), ('optimization', 0.045), ('design', 0.043), ('queries', 0.043), ('jin', 0.041), ('autos', 0.04), ('batchmode', 0.04), ('cleve', 0.04), ('ochange', 0.04), ('placements', 0.04), ('yuhong', 0.04), ('triangular', 0.04), ('backtracking', 0.04), ('mode', 0.039), ('fl', 0.038), ('fu', 0.038), ('determinant', 0.036), ('matrices', 0.036), ('sport', 0.035), ('linesearch', 0.035), ('retraining', 0.035), ('schohn', 0.035), ('optimistic', 0.035), ('local', 0.035), ('transductive', 0.034), ('uci', 0.033), ('dg', 0.033), ('entropy', 0.032), ('regarding', 0.031), ('denotes', 0.03), ('unseen', 0.03), ('sensor', 0.03), ('processes', 0.029), ('conducted', 0.029), ('kernel', 0.029), ('rows', 0.028), ('logistic', 0.028), ('multivariate', 0.028), ('newsgroups', 0.028), ('ll', 0.028), ('combinatorial', 0.027), ('criterion', 0.027), ('selects', 0.027), ('ultimate', 0.026), ('linearization', 0.026), ('gradient', 0.026), ('xj', 0.026), ('german', 0.026), ('krause', 0.026), ('pointwise', 0.026), ('employed', 0.025), ('decisions', 0.024), ('concave', 0.024), ('hardware', 0.024), ('subsampling', 0.024), ('slight', 0.024), ('process', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
2 0.33350444 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
3 0.14223388 27 nips-2010-Agnostic Active Learning Without Constraints
Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
4 0.14146295 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
5 0.12721786 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
Author: Wei Wang, Zhi-hua Zhou
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1 ), contrasting to single-view setting where the polynomial improveǫ ment is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O( 1 ), where the order of 1/ǫ is independent of the parameter in Tsybakov noise, ǫ contrasting to previous polynomial bounds where the order of 1/ǫ is related to the parameter in Tsybakov noise. 1
6 0.12390757 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
7 0.12219014 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
8 0.11770437 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
9 0.11756103 151 nips-2010-Learning from Candidate Labeling Sets
10 0.090912342 217 nips-2010-Probabilistic Multi-Task Feature Selection
11 0.077169336 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
12 0.074533902 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
13 0.072164327 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
14 0.071574777 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
15 0.071136527 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
16 0.070925489 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
17 0.068411559 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
18 0.06586767 108 nips-2010-Graph-Valued Regression
19 0.065366551 194 nips-2010-Online Learning for Latent Dirichlet Allocation
20 0.065195926 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
topicId topicWeight
[(0, 0.2), (1, 0.05), (2, 0.102), (3, -0.059), (4, -0.026), (5, 0.088), (6, -0.065), (7, -0.101), (8, 0.02), (9, -0.279), (10, 0.102), (11, 0.031), (12, -0.022), (13, -0.069), (14, -0.002), (15, -0.058), (16, -0.263), (17, -0.013), (18, -0.002), (19, 0.111), (20, -0.071), (21, 0.055), (22, -0.161), (23, 0.112), (24, -0.014), (25, -0.021), (26, 0.077), (27, -0.033), (28, -0.027), (29, 0.01), (30, -0.008), (31, -0.133), (32, -0.088), (33, 0.026), (34, -0.065), (35, -0.015), (36, -0.089), (37, -0.036), (38, 0.066), (39, 0.04), (40, 0.106), (41, 0.017), (42, 0.079), (43, -0.128), (44, -0.056), (45, -0.065), (46, -0.024), (47, -0.031), (48, 0.093), (49, -0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.95541894 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
2 0.94843113 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
3 0.75539196 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashingbased solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1
4 0.67752373 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
Author: Wei Wang, Zhi-hua Zhou
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1 ), contrasting to single-view setting where the polynomial improveǫ ment is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O( 1 ), where the order of 1/ǫ is independent of the parameter in Tsybakov noise, ǫ contrasting to previous polynomial bounds where the order of 1/ǫ is related to the parameter in Tsybakov noise. 1
5 0.60837924 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
6 0.57375443 27 nips-2010-Agnostic Active Learning Without Constraints
7 0.49853531 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
8 0.48491931 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
9 0.44583234 151 nips-2010-Learning from Candidate Labeling Sets
10 0.44523573 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
11 0.40839922 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
12 0.39165506 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
13 0.38275748 217 nips-2010-Probabilistic Multi-Task Feature Selection
14 0.36934099 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
15 0.3457725 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
16 0.34571135 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
17 0.34278008 2 nips-2010-A Bayesian Approach to Concept Drift
18 0.34063196 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
19 0.34061784 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
20 0.33436835 228 nips-2010-Reverse Multi-Label Learning
topicId topicWeight
[(13, 0.041), (27, 0.082), (30, 0.04), (31, 0.199), (35, 0.013), (45, 0.232), (50, 0.082), (52, 0.027), (60, 0.028), (77, 0.037), (78, 0.083), (90, 0.044)]
simIndex simValue paperId paperTitle
1 0.9146961 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
Author: Zeeshan Syed, John V. Guttag
Abstract: Cardiovascular disease is the leading cause of death globally, resulting in 17 million deaths each year. Despite the availability of various treatment options, existing techniques based upon conventional medical knowledge often fail to identify patients who might have benefited from more aggressive therapy. In this paper, we describe and evaluate a novel unsupervised machine learning approach for cardiac risk stratification. The key idea of our approach is to avoid specialized medical knowledge, and assess patient risk using symbolic mismatch, a new metric to assess similarity in long-term time-series activity. We hypothesize that high risk patients can be identified using symbolic mismatch, as individuals in a population with unusual long-term physiological activity. We describe related approaches that build on these ideas to provide improved medical decision making for patients who have recently suffered coronary attacks. We first describe how to compute the symbolic mismatch between pairs of long term electrocardiographic (ECG) signals. This algorithm maps the original signals into a symbolic domain, and provides a quantitative assessment of the difference between these symbolic representations of the original signals. We then show how this measure can be used with each of a one-class SVM, a nearest neighbor classifier, and hierarchical clustering to improve risk stratification. We evaluated our methods on a population of 686 cardiac patients with available long-term electrocardiographic data. In a univariate analysis, all of the methods provided a statistically significant association with the occurrence of a major adverse cardiac event in the next 90 days. In a multivariate analysis that incorporated the most widely used clinical risk variables, the nearest neighbor and hierarchical clustering approaches were able to statistically significantly distinguish patients with a roughly two-fold risk of suffering a major adverse cardiac event in the next 90 days. 1
same-paper 2 0.85447925 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
3 0.8141641 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
Author: Alessandro Chiuso, Gianluigi Pillonetto
Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization. 1
4 0.80027699 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
5 0.78947467 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma
Abstract: This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent after represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms. 1
6 0.78736955 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
7 0.78678566 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
8 0.78395689 151 nips-2010-Learning from Candidate Labeling Sets
9 0.78101909 158 nips-2010-Learning via Gaussian Herding
10 0.77867496 282 nips-2010-Variable margin losses for classifier design
11 0.77762312 63 nips-2010-Distributed Dual Averaging In Networks
12 0.77525991 1 nips-2010-(RF)^2 -- Random Forest Random Field
13 0.77496076 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
14 0.7746917 177 nips-2010-Multitask Learning without Label Correspondences
15 0.77411711 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
16 0.77399254 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
17 0.77336282 222 nips-2010-Random Walk Approach to Regret Minimization
18 0.77275378 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
19 0.77230126 22 nips-2010-Active Estimation of F-Measures
20 0.7720136 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning