nips nips2010 nips2010-22 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In this case, new test instances have to be drawn and labeled at a cost. [sent-5, score-0.287]
2 An active estimation procedure selects instances according to an instrumental sampling distribution. [sent-6, score-0.78]
3 An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. [sent-7, score-0.478]
4 We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. [sent-8, score-0.526]
5 Finally, when a model has been trained actively, the labeled data is biased towards small-margin instances which would incur a pessimistic bias on any cross-validation estimate. [sent-13, score-0.219]
6 For a given binary classifier and sample of size n, let n_tp and n_fp denote the number of true and false positives, respectively, and n_fn the number of false negatives. [sent-19, score-0.447]
7 F_α = n_tp / (α(n_tp + n_fp) + (1 − α)(n_tp + n_fn)). (1) Precision and recall are special cases for α = 1 and α = 0, respectively. [sent-21, score-0.284]
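To make Equation 1 concrete, here is a minimal Python sketch (illustrative only; the function name and toy counts are assumptions, not taken from the paper) that computes F_α from the counts n_tp, n_fp, n_fn; α = 1 recovers precision and α = 0 recovers recall.

```python
def f_alpha(n_tp, n_fp, n_fn, alpha):
    """F_alpha as in Equation 1: n_tp / (alpha*(n_tp + n_fp) + (1 - alpha)*(n_tp + n_fn))."""
    denom = alpha * (n_tp + n_fp) + (1 - alpha) * (n_tp + n_fn)
    return n_tp / denom if denom > 0 else 0.0

# alpha = 1 gives precision, alpha = 0 gives recall (toy counts)
print(f_alpha(n_tp=30, n_fp=10, n_fn=20, alpha=1.0))  # 0.75 = 30 / 40 (precision)
print(f_alpha(n_tp=30, n_fp=10, n_fn=20, alpha=0.0))  # 0.60 = 30 / 50 (recall)
```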
8 We will now introduce the class of generalized risk functionals that we study in this paper. [sent-24, score-0.349]
9 Like any risk functional, the generalized risk is parameterized with a function ℓ : Y × Y → R determining either the loss or—alternatively—the gain that is incurred for a pair of predicted and true label. [sent-29, score-0.511]
10 In addition, the generalized risk is parameterized with a function w that assigns a weight w(x, y, fθ ) to each instance. [sent-30, score-0.297]
11 For instance, precision sums over instances with fθ (x) = 1 with weight 1 and gives no consideration to other instances. [sent-31, score-0.234]
12 Note that the generalized risk (Equation 2) reduces to the regular risk for w(x, y, fθ ) = 1. [sent-34, score-0.487]
13 On a sample of size n, a consistent estimator can be obtained by replacing the cumulative distribution function with the empirical distribution function. [sent-35, score-0.279]
14 The quantity Ĝ_n = [∑_{i=1}^n ℓ(f_θ(x_i), y_i) w(x_i, y_i, f_θ)] / [∑_{i=1}^n w(x_i, y_i, f_θ)] (3) is a consistent estimate of the generalized risk G defined by Equation 2. [sent-41, score-0.642]
15 Consistency means asymptotic unbiasedness; that is, the expected value of the estimate Ĝ_n converges to the true risk G for n → ∞. [sent-44, score-0.284]
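A minimal sketch of the plug-in estimate in Equation 3 (a hypothetical helper using NumPy; `losses` holds ℓ(f_θ(x_i), y_i) and `weights` holds w(x_i, y_i, f_θ)):

```python
import numpy as np

def generalized_risk_estimate(losses, weights):
    """Equation 3: weighted average of per-instance loss/gain values."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(losses * weights) / np.sum(weights)
```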
16 We now observe that Fα -measures—including precision and recall—are consistent empirical estimates of generalized risks for appropriately chosen functions w. [sent-45, score-0.399]
17 Fα is a consistent estimate of the generalized risk with Y = {0, 1}, w(x, y, f_θ) = αf_θ(x) + (1 − α)y, and ℓ = 1 − ℓ_{0/1}, where ℓ_{0/1} denotes the zero-one loss. [sent-47, score-0.387]
18 The claim follows from Proposition 1 since Ĝ_n = [∑_{i=1}^n (1 − ℓ_{0/1}(f_θ(x_i), y_i))(αf_θ(x_i) + (1 − α)y_i)] / [∑_{i=1}^n (αf_θ(x_i) + (1 − α)y_i)] = [∑_{i=1}^n f_θ(x_i)y_i] / [α ∑_{i=1}^n f_θ(x_i) + (1 − α) ∑_{i=1}^n y_i] = n_tp / (α(n_tp + n_fp) + (1 − α)(n_tp + n_fn)). [sent-49, score-0.478]
19 Having established and motivated the generalized risk functional, we now turn towards the problem of acquiring a consistent estimate with minimal estimation error on a fixed labeling budget n. [sent-50, score-0.798]
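As a numerical sanity check of this corollary (a toy example with made-up labels, not data from the paper), plugging the gain ℓ = 1 − ℓ_{0/1} and the weight w = α f_θ(x) + (1 − α) y into Equation 3 reproduces Fα computed from the confusion counts:

```python
import numpy as np

y_pred = np.array([1, 1, 1, 0, 0, 1])   # f_theta(x_i)
y_true = np.array([1, 0, 1, 1, 0, 1])   # y_i
alpha = 0.5

gain = (y_pred == y_true).astype(float)      # 1 - zero-one loss
w = alpha * y_pred + (1 - alpha) * y_true    # weight w(x, y, f_theta)
g_hat = np.sum(gain * w) / np.sum(w)         # Equation 3

n_tp = np.sum((y_pred == 1) & (y_true == 1))
n_fp = np.sum((y_pred == 1) & (y_true == 0))
n_fn = np.sum((y_pred == 0) & (y_true == 1))
f_alpha = n_tp / (alpha * (n_tp + n_fp) + (1 - alpha) * (n_tp + n_fn))

assert np.isclose(g_hat, f_alpha)            # both equal 0.75 here
```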
20 Instead, we study an active estimation process that selects test instances according to an instrumental distribution q. [sent-55, score-0.776]
21 When instances are sampled from q, an estimator of the generalized risk can be defined as Ĝ_{n,q} = [∑_{i=1}^n (p(x_i)/q(x_i)) ℓ(f_θ(x_i), y_i) w(x_i, y_i, f_θ)] / [∑_{i=1}^n (p(x_i)/q(x_i)) w(x_i, y_i, f_θ)] (4), where (x_i, y_i) are drawn from q(x)p(y|x). [sent-56, score-0.947]
22 Weighting factors p(x_i)/q(x_i) compensate for the discrepancy between test and instrumental distributions. [sent-57, score-0.316]
23 Because of the weighting factors, Slutsky’s Theorem again implies that Equation 4 defines a consistent estimator for G, under the precondition that for all x ∈ X with p(x) > 0 it holds that q(x) > 0. [sent-58, score-0.21]
24 Note that Equation 3 is a special case of Equation 4, using the instrumental distribution q = p. [sent-59, score-0.225]
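A minimal sketch of the self-normalized, importance-weighted estimate in Equation 4 (hypothetical helper; `p_x` and `q_x` are the test and instrumental densities evaluated at the sampled points). Setting `q_x` equal to `p_x` reduces it to Equation 3.

```python
import numpy as np

def weighted_risk_estimate(losses, weights, p_x, q_x):
    """Equation 4: reweight each labeled instance by p(x_i) / q(x_i)."""
    iw = np.asarray(p_x, dtype=float) / np.asarray(q_x, dtype=float)
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(iw * losses * weights) / np.sum(iw * weights)
```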
25 The estimate Ĝ_{n,q} given by Equation 4 depends on the selected instances (x_i, y_i), which are drawn according to the distribution q(x)p(y|x). [sent-60, score-0.399]
26 Our overall goal is to determine the instrumental distribution q such that the expected deviation from the generalized risk is minimal for fixed labeling costs n: q∗ = arg min_q E[(Ĝ_{n,q} − G)²]. [sent-62, score-0.705]
27 2 Active Estimation through Variance Minimization The bias-variance decomposition expresses the estimation error as a sum of a squared bias and a variance term [5]: E[(Ĝ_{n,q} − G)²] = (E[Ĝ_{n,q}] − G)² + E[(Ĝ_{n,q} − E[Ĝ_{n,q}])²] = Bias²[Ĝ_{n,q}] + Var[Ĝ_{n,q}]. [sent-63, score-0.231]
28 Lemma 2 states that the active risk estimator Gn,q is asymptotically normally distributed, and characterizes its variance in the limit. [sent-70, score-0.659]
29 In the following, we will consequently derive a sampling distribution q∗ that minimizes the asymptotic variance σ²_q of the estimator Ĝ_{n,q}. [sent-78, score-0.466]
30 2.1 Optimal Sampling Distribution The following theorem derives the sampling distribution that minimizes the asymptotic variance σ²_q: Theorem 1 (Optimal Sampling Distribution). [sent-80, score-0.41]
31 The instrumental distribution that minimizes the asymptotic variance σ²_q of the generalized risk estimator Ĝ_{n,q} is given by q∗(x) ∝ p(x) √(∫ w(x, y, f_θ)² (ℓ(f_θ(x), y) − G)² p(y|x) dy). [sent-81, score-0.816]
32 Since F -measures are estimators of generalized risks according to Corollary 1, we can now derive their variance-minimizing sampling distributions. [sent-83, score-0.357]
33 1: Compute optimal sampling distribution q ∗ according to Corollary 2, 3, or 4, respectively. [sent-87, score-0.203]
34 According to Corollary 1, Fα estimates a generalized risk with Y = {0, 1}, w(x, y, f_θ) = αf_θ(x) + (1 − α)y, and ℓ = 1 − ℓ_{0/1}. [sent-94, score-0.343]
35 The sampling distribution that minimizes σ²_q for recall resolves to q∗(x) ∝ p(x) √(p(f_θ(x)|x)(1 − G)²) if f_θ(x) = 1, and q∗(x) ∝ p(x) √((1 − p(f_θ(x)|x)) G²) if f_θ(x) = 0. [sent-97, score-0.328]
36 The sampling distribution that minimizes σ²_q for precision resolves to q∗(x) ∝ p(x) f_θ(x) √((1 − 2G) p(f_θ(x)|x) + G²). [sent-99, score-0.367]
37 Note that for standard risks (that is, w = 1) Theorem 1 coincides with the optimal sampling distribution derived in [7]. [sent-101, score-0.297]
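A sketch of the two corollaries for a pool of instances (assumptions: p(x) is uniform over the pool and therefore cancels after normalization, `conf` holds the model confidences p(f_θ(x)|x; θ) used in place of p(f_θ(x)|x) as discussed below, and `G` is a current estimate of the generalized risk; the square roots follow the reconstructed formulas above):

```python
import numpy as np

def q_star_recall(f, conf, G):
    """Sampling weights minimizing the variance of the recall estimate (alpha = 0)."""
    f, conf = np.asarray(f), np.asarray(conf, dtype=float)
    w = np.where(f == 1, np.sqrt(conf) * (1.0 - G), np.sqrt(1.0 - conf) * G)
    return w / w.sum()

def q_star_precision(f, conf, G):
    """Sampling weights minimizing the variance of the precision estimate (alpha = 1)."""
    f, conf = np.asarray(f), np.asarray(conf, dtype=float)
    w = f * np.sqrt(np.maximum((1.0 - 2.0 * G) * conf + G**2, 0.0))
    return w / w.sum()
```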
38 We now turn towards a setting in which a large pool D of unlabeled test instances is available. [sent-104, score-0.28]
39 Drawing instances from the pool replaces generating them under the test distribution; that is, p(x) = 1/m for all x ∈ D. [sent-106, score-0.28]
40 This approximation constitutes an analogy to active learning: In active learning, the model-based output probability p(y|x; θ) serves as the basis on which the least confident instances are selected. [sent-109, score-0.609]
41 Note that as long as p(x) > 0 implies q(x) > 0, the weighting factors ensure that such approximations do not introduce an asymptotic bias in our estimator (Equation 4). [sent-110, score-0.273]
42 Finally, Theorem 1 and its corollaries depend on the true generalized risk G. [sent-111, score-0.365]
43 G is replaced by an intrinsic generalized risk calculated from Equation 2, where the integral over X is replaced by a sum over the pool, p(x) = 1/m, and p(y|x) ≈ p(y|x; θ). [sent-112, score-0.297]
44 Algorithm 1 summarizes the procedure for active estimation of F -measures. [sent-113, score-0.321]
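A sketch of this procedure (hypothetical helper names; `q` is a normalized sampling distribution over the pool such as the ones sketched above, `predict` returns f_θ(x) ∈ {0, 1}, and `label_oracle` stands for the human labeling step):

```python
import numpy as np

def active_f_estimate(pool, q, predict, label_oracle, alpha, n, seed=0):
    """Sketch of Algorithm 1: sample n pool instances from q, query their labels,
    and return the importance-weighted F_alpha estimate of Equation 4 (p(x) = 1/m)."""
    m = len(pool)
    rng = np.random.default_rng(seed)
    idx = rng.choice(m, size=n, replace=True, p=q)
    p_over_q = (1.0 / m) / np.asarray(q)[idx]            # p(x_i) / q(x_i)
    f = np.array([predict(pool[i]) for i in idx])        # f_theta(x_i)
    y = np.array([label_oracle(pool[i]) for i in idx])   # queried labels y_i
    gain = (f == y).astype(float)                        # 1 - zero-one loss
    w = alpha * f + (1 - alpha) * y                      # w(x, y, f_theta)
    return np.sum(p_over_q * gain * w) / np.sum(p_over_q * w)
```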
45 2.3 Confidence Intervals Lemma 2 shows that the estimator Ĝ_{n,q} is asymptotically normally distributed and characterizes its asymptotic variance σ²_q. [sent-121, score-0.252]
46 , (x_n, y_n) drawn from the distribution q(x)p(y|x) by computing the empirical variance S²_{n,q} = [(1/n) ∑_{i=1}^n (p(x_i)/q(x_i))² w(x_i, y_i, f_θ)² (ℓ(f_θ(x_i), y_i) − Ĝ_{n,q})²] / [(1/n) ∑_{i=1}^n (p(x_i)/q(x_i)) w(x_i, y_i, f_θ)]². [sent-125, score-0.351]
47 As in the standard case of drawing test instances xi from the original distribution p, such confidence intervals are approximate for finite n, but become exact for n → ∞. [sent-127, score-0.333]
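A sketch of the resulting normal-approximation interval (hedged: the normalization of S²_{n,q} above is partly reconstructed; this follows the delta-method form for the self-normalized estimator and uses SciPy for the normal quantile):

```python
import numpy as np
from scipy.stats import norm

def confidence_interval(losses, weights, p_x, q_x, level=0.95):
    """Approximate CI for the weighted estimate of Equation 4; exact as n grows."""
    iw = np.asarray(p_x, dtype=float) / np.asarray(q_x, dtype=float)
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(losses)
    g_hat = np.sum(iw * losses * weights) / np.sum(iw * weights)
    s2 = np.mean((iw * weights * (losses - g_hat)) ** 2) / np.mean(iw * weights) ** 2
    half = norm.ppf(0.5 + level / 2.0) * np.sqrt(s2 / n)
    return g_hat - half, g_hat + half
```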
48 3 Empirical Studies We compare active estimation of Fα -measures according to Algorithm 1 (denoted activeF ) to estimation based on a sample of instances drawn uniformly from the pool (denoted passive). [sent-128, score-0.708]
49 We also consider the active estimator for risks presented in [7]. [sent-129, score-0.457]
50 Instances are drawn according to the optimal sampling distribution q∗_{0/1} for zero-one risk (Derivation 1 in [7]); the Fα-measure is computed according to Equation 4 using q = q∗_{0/1} (denoted activeerr). [sent-130, score-0.828]
51 We collect 169,612 emails from an email service provider between June 2007 and April 2010; of these, 42,165 emails received by February 2008 are used for training. [sent-142, score-0.29]
52 The Reuters-21578 text classification task [4] allows us to study the effect of class skew, and serves as a prototypical domain for active learning. [sent-146, score-0.358]
53 We employ an active learner that always queries the example with minimal functional margin p(f_θ(x)|x; θ) − max_{y≠f_θ(x)} p(y|x; θ) [9]. [sent-148, score-0.261]
54 3.2 Empirical Results We study the performance of active and passive estimates as a function of (a) the precision-recall trade-off parameter α, (b) the discrepancy between training and test distribution, and (c) class skew in the test distribution. [sent-154, score-0.867]
55 Point (b) is of interest because active estimates require the approximation p(y|x) ≈ p(y|x; θ); this assumption is violated when training and test distributions differ. [sent-155, score-0.371]
56 For the spam filtering domain, Figure 1 shows the average absolute estimation error for F0 (recall), F0.5, [sent-157, score-0.311]
57 and F1 (precision) estimates on a test set of 33,296 emails received between February 2008 and October 2008. [sent-158, score-0.263]
58 The active generalized risk estimate activeF significantly outperforms the passive estimate passive for all three measures. [sent-159, score-1.239]
59 In order to reach the estimation accuracy of passive with a labeling budget of n = 800, activeF requires fewer than 150 (recall) and 200 (F0.5) labeled test instances. [sent-160, score-0.546]
Figure 1: Spam filtering: Estimation error over labeling costs. [Figure residue: axis labels (estimation error (absolute), labeling costs n) and legend entries (passive, activeF).]
[Figure 2 residue: panel title 'Optimal Sampling Distribution', panel label '(class ratio: 5/95)', axis label estimation error (absolute), legend entries passive and activeF.]
63 Ratio of passive and active estimation error, error bars indicate standard deviation (center). [sent-199, score-0.707]
64 Estimates of activeF are at least as accurate as those of activeerr, and more accurate for high α values. [sent-201, score-0.354]
65 Figure 2 (left) shows the sampling distribution q∗(x) for recall, precision, and the F0.5-measure [sent-203, score-0.267]
66 in the spam filtering domain as a function of the classifier’s confidence, characterized by the log-odds ratio log(p(y=1|x;θ) / p(y=0|x;θ)). [sent-204, score-0.207]
67 The figure also shows the optimal sampling distribution for zero-one risk as used in activeerr (denoted “0/1-Risk”). [sent-205, score-0.716]
68 We observe that the precision estimator dismisses all examples with fθ (x) = 0; this is intuitive because precision is a function of true-positive and false-positive examples only. [sent-206, score-0.311]
69 By contrast, the recall estimator selects examples on both sides of the decision boundary, as it has to estimate both the true positive and the false negative rate. [sent-207, score-0.244]
70 The optimal sampling distribution for zero-one risk is symmetric; it prefers instances close to the decision boundary. [sent-208, score-0.501]
71 We keep the training set of emails fixed and move the time interval from which test instances are drawn increasingly further away into the future, thereby creating a growing gap between training and test distribution. [sent-210, score-0.502]
72 Specifically, we divide 127,447 emails received between February 2008 and April 2010 into ten different test sets spanning approximately 2. [sent-211, score-0.262]
73 Figure 2 (center, red curve) shows the discrepancy between training and test distribution measured in terms of the exponentiated average log-likelihood of the test labels given the model parameters θ. [sent-213, score-0.257]
74 A value of |Ĝ_n − G| / |Ĝ_{n,q∗} − G| above one indicates that the active estimate is more accurate than a passive estimate. [sent-217, score-0.574]
75 The active estimate consistently outperforms the passive estimate; its advantage diminishes when training and test distributions diverge and the assumption of p(y|x) ≈ p(y|x; θ) becomes less accurate. [sent-218, score-0.747]
76 In the spam filtering domain we artificially sub-sampled data to different ratios of spam and non-spam emails. [sent-220, score-0.287]
77 Figure 2 (right) shows the performance of activeF, passive, and activeerr for F0.5. [sent-221, score-0.354]
78 Furthermore, activeF outperforms activeerr for imbalanced classes, while the approaches perform comparably when classes are balanced. [sent-224, score-0.413]
[Figure 3 residue: axis labels (estimation error (absolute), labeling costs n) and legend entries (passive, activeF, activeerr).]
82 Estimation error over class ratio for all ten classes, logarithmic scale (right). [sent-247, score-0.191]
83 Figure 3 shows the estimation error of activeF , passive, and activeerr for an infrequent class (“crude”, 4. [sent-251, score-0.59]
84 Figure 3 (right) shows the estimation error of activeF , passive, and activeerr on all ten one-versus-rest problems as a function of the problem’s class skew. [sent-255, score-0.587]
85 We again observe that activeF outperforms passive consistently, and activeF outperforms activeerr for strongly skewed class distributions. [sent-256, score-0.814]
86 Our experimental findings show that for estimating F-measures their variance-minimizing sampling distribution performs worse than the sampling distributions characterized by Theorem 1, especially for skewed class distributions. [sent-260, score-0.393]
87 We use the model itself to approximate this distribution and decide on instances whose class labels are queried. [sent-264, score-0.245]
88 Specifically, Bach derives a sampling distribution for active learning under the assumption that the current model gives a good approximation to the conditional probability p(y|x) [1]. [sent-266, score-0.466]
89 To compensate for the bias incurred by the instrumental distribution, several active learning algorithms use importance weighting: for regression [8], exponential family models [1], or SVMs [2]. [sent-267, score-0.504]
90 Finally, the proposed active estimation approach can be considered an instance of the general principle of importance sampling [6], which we employ in the context of generalized risk estimation. [sent-268, score-0.736]
91 5 Conclusions Fα -measures are defined as empirical estimates; we have shown that they are consistent estimates of a generalized risk functional which Proposition 1 identifies. [sent-269, score-0.419]
92 Generalized risks can be estimated actively by sampling test instances from an instrumental distribution q. [sent-270, score-0.669]
93 An analysis of the sources of estimation error leads to an instrumental distribution q ∗ that minimizes estimator variance. [sent-271, score-0.531]
94 The optimal sampling distribution depends on the unknown conditional p(y|x); the active generalized risk estimator approximates this conditional by the model to be evaluated. [sent-272, score-0.879]
95 Our empirical study supports the conclusion that the advantage of active over passive evaluation is particularly strong for skewed classes. [sent-273, score-0.61]
96 The advantage of active evaluation is also correlated to the quality of the model as measured by the model-based likelihood of the test labels. [sent-274, score-0.32]
97 In our experiments, active evaluation consistently outperformed passive evaluation, even for the greatest divergence between training and test distribution that we could observe. [sent-275, score-0.728]
98 Let v_i = p(x_i)/q(x_i), w_i = w(x_i, y_i, f_θ), and ℓ_i = ℓ(f_θ(x_i), y_i), and define G_n = (1/n) ∑_{i=1}^n ℓ_i w_i v_i and W_n = (1/n) ∑_{i=1}^n w_i v_i, so that Ĝ_{n,q} = G_n / W_n. [sent-280, score-1.22]
99 Furthermore, ∇f(G E[w_i], E[w_i])ᵀ Σ ∇f(G E[w_i], E[w_i]) = Var[w_i ℓ_i v_i] − 2G Cov[w_i v_i, w_i ℓ_i v_i] + G² Var[w_i v_i] = E[w_i² ℓ_i² v_i²] − 2G E[w_i² ℓ_i v_i²] + G² E[w_i² v_i²] = ∫∫ (p(x)/q(x))² w(x, y, f_θ)² (ℓ(f_θ(x), y) − G)² p(y|x) q(x) dy dx. [sent-290, score-1.814]
100 Support vector machine active learning with applications to text classification. [sent-341, score-0.267]
wordName wordTfidf (topN-words)
[('activef', 0.424), ('activeerr', 0.354), ('passive', 0.299), ('active', 0.235), ('wi', 0.212), ('risk', 0.19), ('instrumental', 0.171), ('ntp', 0.165), ('var', 0.145), ('instances', 0.139), ('vi', 0.138), ('emails', 0.133), ('spam', 0.124), ('estimator', 0.121), ('sampling', 0.118), ('nf', 0.114), ('labeling', 0.113), ('generalized', 0.107), ('risks', 0.101), ('wn', 0.098), ('precision', 0.095), ('sawade', 0.094), ('estimation', 0.086), ('yi', 0.085), ('pool', 0.081), ('xi', 0.08), ('asymptotic', 0.071), ('landwehr', 0.071), ('costs', 0.07), ('corollaries', 0.068), ('corollary', 0.067), ('test', 0.06), ('equation', 0.06), ('recall', 0.056), ('distribution', 0.054), ('unde', 0.053), ('variance', 0.053), ('discrepancy', 0.053), ('dx', 0.053), ('class', 0.052), ('absolute', 0.051), ('resolves', 0.051), ('skewed', 0.051), ('error', 0.05), ('ltering', 0.05), ('consistent', 0.05), ('drawn', 0.05), ('minimizes', 0.049), ('infrequent', 0.048), ('budget', 0.048), ('dydx', 0.047), ('estimates', 0.046), ('ten', 0.045), ('ratio', 0.044), ('bias', 0.042), ('february', 0.042), ('cov', 0.042), ('scheffer', 0.041), ('slutsky', 0.041), ('gn', 0.041), ('estimate', 0.04), ('weighting', 0.039), ('domain', 0.039), ('potsdam', 0.038), ('labeled', 0.038), ('bars', 0.037), ('yamada', 0.036), ('digit', 0.034), ('lemma', 0.033), ('normally', 0.033), ('theorem', 0.033), ('text', 0.032), ('compensate', 0.032), ('derives', 0.032), ('skew', 0.032), ('frequent', 0.032), ('dence', 0.032), ('according', 0.031), ('center', 0.03), ('imbalanced', 0.03), ('training', 0.03), ('fraction', 0.029), ('outperforms', 0.029), ('diverge', 0.029), ('claim', 0.029), ('proposition', 0.028), ('asymptotically', 0.027), ('conditional', 0.027), ('false', 0.027), ('actively', 0.026), ('functional', 0.026), ('maxy', 0.026), ('vn', 0.026), ('april', 0.025), ('consistently', 0.025), ('evaluation', 0.025), ('received', 0.024), ('yn', 0.024), ('incurred', 0.024), ('coincides', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
2 0.21915999 27 nips-2010-Agnostic Active Learning Without Constraints
Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
3 0.1442169 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
4 0.14146295 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
5 0.11307698 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: √ ˜ We establish an excess risk bound of O HR2 + HL∗ Rn for ERM with an H-smooth loss n function and a hypothesis class with Rademacher complexity Rn , where L∗ is the best risk achievable by the hypothesis class. For typical hypothesis classes where Rn = R/n, this translates to ˜ ˜ a learning rate of O (RH/n) in the separable (L∗ = 0) case and O RH/n + L∗ RH/n more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
6 0.1099285 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
7 0.095036253 151 nips-2010-Learning from Candidate Labeling Sets
8 0.087350138 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
9 0.08423762 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
10 0.083766833 265 nips-2010-The LASSO risk: asymptotic results and real world examples
11 0.08094722 269 nips-2010-Throttling Poisson Processes
12 0.078405716 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
13 0.077366717 282 nips-2010-Variable margin losses for classifier design
14 0.076001577 191 nips-2010-On the Theory of Learnining with Privileged Information
15 0.074568212 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
16 0.074490786 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
17 0.072474532 235 nips-2010-Self-Paced Learning for Latent Variable Models
18 0.071723111 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
19 0.07102631 252 nips-2010-SpikeAnts, a spiking neuron network modelling the emergence of organization in a complex system
20 0.07015489 108 nips-2010-Graph-Valued Regression
topicId topicWeight
[(0, 0.203), (1, 0.026), (2, 0.116), (3, -0.019), (4, 0.006), (5, 0.112), (6, -0.142), (7, -0.106), (8, -0.018), (9, -0.228), (10, 0.013), (11, -0.063), (12, -0.095), (13, -0.046), (14, -0.07), (15, -0.002), (16, -0.083), (17, 0.03), (18, -0.012), (19, 0.034), (20, -0.008), (21, 0.028), (22, -0.041), (23, 0.066), (24, 0.025), (25, 0.021), (26, -0.001), (27, 0.012), (28, 0.028), (29, -0.079), (30, -0.038), (31, 0.001), (32, -0.037), (33, 0.038), (34, -0.126), (35, -0.008), (36, -0.009), (37, -0.18), (38, 0.07), (39, 0.049), (40, 0.011), (41, -0.065), (42, 0.073), (43, 0.017), (44, 0.032), (45, 0.03), (46, -0.082), (47, 0.014), (48, -0.073), (49, 0.1)]
simIndex simValue paperId paperTitle
same-paper 1 0.95078665 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
2 0.80899179 27 nips-2010-Agnostic Active Learning Without Constraints
Author: Alina Beygelzimer, John Langford, Zhang Tong, Daniel J. Hsu
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. 1
3 0.69981694 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case
Author: Wei Wang, Zhi-hua Zhou
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1 ), contrasting to single-view setting where the polynomial improveǫ ment is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O( 1 ), where the order of 1/ǫ is independent of the parameter in Tsybakov noise, ǫ contrasting to previous polynomial bounds where the order of 1/ǫ is related to the parameter in Tsybakov noise. 1
4 0.68052882 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
5 0.63781732 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
6 0.62483245 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
7 0.57333523 142 nips-2010-Learning Bounds for Importance Weighting
8 0.56185734 191 nips-2010-On the Theory of Learnining with Privileged Information
9 0.4940879 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities
10 0.49055994 243 nips-2010-Smoothness, Low Noise and Fast Rates
11 0.47724456 202 nips-2010-Parallelized Stochastic Gradient Descent
12 0.46330857 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
13 0.46160856 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
14 0.46056089 158 nips-2010-Learning via Gaussian Herding
15 0.45483655 151 nips-2010-Learning from Candidate Labeling Sets
16 0.43770242 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
17 0.43229628 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
18 0.43177384 235 nips-2010-Self-Paced Learning for Latent Variable Models
19 0.42865816 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
20 0.41604987 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
topicId topicWeight
[(13, 0.04), (17, 0.015), (27, 0.046), (30, 0.036), (35, 0.014), (45, 0.242), (50, 0.064), (52, 0.03), (60, 0.047), (77, 0.047), (78, 0.051), (90, 0.04), (93, 0.247)]
simIndex simValue paperId paperTitle
Author: Achintya Kundu, Vikram Tankasali, Chiranjib Bhattacharyya, Aharon Ben-tal
Abstract: In this paper we consider the problem of learning an n × n kernel matrix from m(≥ 1) similarity matrices under general convex loss. Past research have extensively studied the m = 1 case and have derived several algorithms which require sophisticated techniques like ACCP, SOCP, etc. The existing algorithms do not apply if one uses arbitrary losses and often can not handle m > 1 case. We present several provably convergent iterative algorithms, where each iteration requires either an SVM or a Multiple Kernel Learning (MKL) solver for m > 1 case. One of the major contributions of the paper is to extend the well known Mirror Descent(MD) framework to handle Cartesian product of psd matrices. This novel extension leads to an algorithm, called EMKL, which solves the problem in 2 O( m ǫlog n ) iterations; in each iteration one solves an MKL involving m kernels 2 and m eigen-decomposition of n × n matrices. By suitably defining a restriction on the objective function, a faster version of EMKL is proposed, called REKL, which avoids the eigen-decomposition. An alternative to both EMKL and REKL is also suggested which requires only an SVM solver. Experimental results on real world protein data set involving several similarity matrices illustrate the efficacy of the proposed algorithms.
same-paper 2 0.8277595 22 nips-2010-Active Estimation of F-Measures
Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer
Abstract: We address the problem of estimating the Fα -measure of a given model as accurately as possible on a fixed labeling budget. This problem occurs whenever an estimate cannot be obtained from held-out training data; for instance, when data that have been used to train the model are held back for reasons of privacy or do not reflect the test distribution. In this case, new test instances have to be drawn and labeled at a cost. An active estimation procedure selects instances according to an instrumental sampling distribution. An analysis of the sources of estimation error leads to an optimal sampling distribution that minimizes estimator variance. We explore conditions under which active estimates of Fα -measures are more accurate than estimates based on instances sampled from the test distribution. 1
3 0.78919321 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
Author: Dahua Lin, Eric Grimson, John W. Fisher
Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models. 1
4 0.73867702 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
5 0.73195761 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
6 0.73180318 282 nips-2010-Variable margin losses for classifier design
7 0.73007423 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
8 0.72970223 23 nips-2010-Active Instance Sampling via Matrix Partition
10 0.72876883 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
11 0.72862363 158 nips-2010-Learning via Gaussian Herding
12 0.7283361 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
13 0.72813857 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
14 0.72763389 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
15 0.72739422 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
16 0.72720307 243 nips-2010-Smoothness, Low Noise and Fast Rates
17 0.72651672 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
18 0.72566217 1 nips-2010-(RF)^2 -- Random Forest Random Field
19 0.72523016 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
20 0.72521335 287 nips-2010-Worst-Case Linear Discriminant Analysis