nips nips2013 nips2013-359 knowledge-graph by maker-knowledge-mining

359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields


Source: pdf

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). [sent-6, score-0.489]

2 For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. [sent-7, score-0.485]

3 V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. [sent-8, score-0.288]

4 However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. [sent-9, score-0.18]

5 We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. [sent-10, score-0.299]

6 Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. [sent-11, score-0.429]

7 1 Introduction Real-world data are often presented as a graph where the nodes in the graph bear labels that vary smoothly along edges. [sent-15, score-0.438]

8 For example, for scientific publications, the content of one paper is highly correlated with the content of papers that it references or is referenced by, the field of interest of a scholar is highly correlated with other scholars s/he coauthors with, etc. [sent-16, score-0.093]

9 Many of these networks can be described using an undirected graph with nonnegative edge weights set to be the strengths of the connections between nodes. [sent-17, score-0.185]

10 The model for label prediction in this paper is the harmonic function on the Gaussian random field (GRF) by Zhu et al. [sent-18, score-0.096]

11 It can generalize two popular and intuitive algorithms: label propagation (Zhu & Ghahramani, 2002), and random walk with absorptions (Wu et al. [sent-20, score-0.167]

12 GRFs can be seen as a Gaussian process (GP) (Rasmussen & Williams, 2006) with its (maybe improper) prior covariance matrix whose (pseudo)inverse is set to be the graph Laplacian. [sent-22, score-0.136]

13 Active learning addresses these issues by making automated decisions on which nodes to query for labels from experts or the crowd. [sent-24, score-0.31]

14 Some popular criteria are empirical risk minimization (Settles, 2010; Zhu et al. [sent-25, score-0.281]

15 Namely, we show that greedy reduction of Σ-optimality provides a (1 − 1/e) approximation bound to the global optimum. [sent-29, score-0.238]

16 Finally, we show that Σ-optimality outperforms other approaches for active learning with GRFs for classification. [sent-31, score-0.18]

17 1 V-optimality on Gaussian Random Fields Ji & Han (2012) proposed greedy variance minimization as a cheap and high profile surrogate active classification criterion. [sent-33, score-0.426]

18 To decide which node to query next, the active learning algorithm finds the unlabeled node which leads to the smallest average predictive variance on all other unlabeled nodes. [sent-34, score-0.671]

19 The motivation behind Voptimality can be paraphrased as the expected risk minimization with the L2 -surrogate loss (Section 2. [sent-37, score-0.19]

20 The greedy solution to the set optimization problem in V-optimality is comparable to the global solution up to a constant (Theorem 1). [sent-40, score-0.193]

21 The greedy application of V-optimality can also be interpreted as a heuristic which selects nodes that have high correlation to nodes with high variances (Observation 4). [sent-42, score-0.557]

22 (1978) shows that any submodular, monotone and normalized set function yields a (1 − 1/e) global optimality guarantee for greedy solutions. [sent-45, score-0.283]

23 2 Σ-optimality on Gaussian Random Fields We define Σ-optimality on GRFs to be another variance minimization criterion that minimizes the sum of all entries in the predictive covariance matrix. [sent-50, score-0.23]

24 As we will show in Lemma 7, the predictive covariance matrix is nonnegative entry-wise and thus the definition is proper. [sent-51, score-0.169]

25 (2012) in the context of active surveying, which is to determine the proportion of nodes belonging to one class. [sent-53, score-0.394]

26 However, we focus on its performance as a criterion in active classification heuristics. [sent-54, score-0.239]

27 The survey-risk of Σ-optimality replaces the L2 -risk of V-optimality as an alternative surrogate risk for the 0/1-risk. [sent-55, score-0.186]

28 We also prove that the greedy application of Σ-optimality has a similar theoretical bound as Voptimality. [sent-56, score-0.16]

29 Finally, greedy application of both Σ-optimality and V-optimality need O(N ) time per query candidate evaluation after one-time inverse of a N × N matrix. [sent-60, score-0.218]

30 Intuitively, this means that with more labels acquired, the conditional correlation between unlabeled nodes decreases even when their Markov blanket has not formed. [sent-65, score-0.414]

31 Suppose the dataset can be represented in the form of a connected undirected graph G = (V, E) where each node has an (either known or unknown) label and each edge eij has a fixed nonnegative weight wij (= wji ) that reflects the proximity, similarity, etc. [sent-68, score-0.365]

32 Define the graph Laplacian of G to be L = diag (W 1) − W , i. [sent-70, score-0.093]

33 2 The binary GRF is a Bayesian model to generate yi ∈ {0, +1} for every node vi according to, β 1 2 p(y) ∝ exp − = exp − y T Ly . [sent-75, score-0.133]

34 , y | | )T ; A GRF infers the output distribution on unlabeled nodes, yu = (yu1 , . [sent-83, score-0.183]

35 2) (−L−1 Lu u where yu = ˆ y ) is the vector of predictive means on unlabeled nodes and Lu is the principal submatrix consisting of the unlabeled row and column indices in L, that is, the lower-right L Lu block of L = . [sent-87, score-0.53]

36 (v− Lu Lu We use L(v− ) and Lu interchangeably because and u partition the set of all nodes v. [sent-89, score-0.179]

37 LP stands for label propagation, because the predictive mean on a node is the probability of a random walk leaving that node hitting a positive label before hitting a zero label. [sent-91, score-0.398]

38 (2003) proposed the harmonic predictor which looks at predictive means in one-versus-all comparisons. [sent-93, score-0.131]

39 The risk function, whose input variable is the labeled subset , is: RV ( ) = Ey yu (yui − yui )2 = E E ˆ ui ∈u (yui − yui )2 y ˆ = tr(L−1 ). [sent-99, score-0.652]

40 3) ui ∈u This risk is written with a subscript V because minimizing (2. [sent-101, score-0.187]

41 3) is also the V-optimality criterion, which minimizes mean prediction variance in active learning. [sent-102, score-0.18]

42 In active learning, we strive to select a subset of nodes to query for labels, constrained by a given budget C, such that the risk is minimized. [sent-103, score-0.562]

43 2) is to determine the proportion of nodes belonging to class 1, as would happen when performing a survey. [sent-108, score-0.214]

44 For active surveying, the risk would be: RΣ ( ) = Ey yu yui − ui ∈u yui ˆ 2 2 ˆ = E E 1T yu − 1T yu |y = 1T L−1 1, u (2. [sent-109, score-0.987]

45 5) ui ∈u which could substitute the risk R( ) in (2. [sent-110, score-0.187]

46 4) and yield another heuristic for selecting nodes in batch active learning. [sent-111, score-0.359]

47 6) (v− : | |≤C Further, we will also consider the application of Σ-optimality in active classification because (2. [sent-114, score-0.18]

48 5) are approximations of the real objective (the 0/1 risk), greedy reduction of the Σ-optimality criterion outperforms greedy reduction of the V-optimality criterion in active classification (Section 3. [sent-118, score-0.708]

49 As will be shown later in the theoretical results, the reduction of both risks are submodular set functions and the greedy sequential update algorithm yields a solution that has a guaranteed approximation ratio to the optimum (Theorem 1). [sent-126, score-0.284]

50 At the k-th query decision, denote the covariance matrix conditioned on the previous (k − 1) queries as C = (L(v− (k−1) ) )−1 . [sent-127, score-0.168]

51 By Shur’s Lemma (or the GP-regression update rule), the one-step lookahead covariance matrix conditioned on (k−1) ∪ {v}, denoted as C = (L(v−( (k−1) ∪{v})) )−1 , has the following update formula: 1 C 0 =C− · C:v Cv: , (2. [sent-128, score-0.096]

52 However, in the special case of GPs with kernel matrix equal to the inverse of a graph Laplacian (with = ∅ or δ > 0), the GRF does provide such theoretical guarantees, both for V-optimality and Σ-optimality. [sent-136, score-0.093]

53 The following theoretical results concern greedy maximization of the risk reduction function (which is shown to be submodular): R∆ ( ) = R(∅) − R( ) for either R(·) = RV (·) or RΣ (·). [sent-138, score-0.35]

54 Theorem 1 (Near-optimal guarantee for greedy applications of V/Σ-optimality). [sent-139, score-0.16]

55 In risk reduction, R∆ ( g ) ≥ (1 − 1/e) · R∆ ( ∗ ), (3. [sent-140, score-0.145]

56 1) where R∆ ( ) = R(∅) − R( ) for either R(·) = RV (·) or RΣ (·), e is Euler’s number, g is the greedy optimizer, and ∗ is the true global optimizer under the constraint | ∗ | ≤ | g |. [sent-141, score-0.193]

57 Walker (2003) describes a suppressor as a variable, knowing which will suddenly create a strong correlation between the predictors. [sent-149, score-0.121]

58 Formally, the condition is: corr(yi , yj | 1 ∪ 2 ) ≤ corr(yi , yj | 1 ) ∀vi , vj ∈ v, ∀ 1 , 2 ⊂ v. [sent-155, score-0.11]

59 It is well known for Markov random fields that the labels of two nodes on a graph become independent given labels of their Markov blanket. [sent-159, score-0.418]

60 Here we establish that GRF boasts more than that: the correlation between any two nodes decreases as more nodes get labeled, even before a Markov blanket is formed. [sent-160, score-0.435]

61 1 Both the V/Σ-optimality are approximations to the 0/1 risk minimization objective. [sent-167, score-0.19]

62 Unfortunately, we cannot theoretically reason why greedy Σ-optimality outperforms V-optimality in the experiments. [sent-168, score-0.16]

63 9) suggest that both the greedy Σ/V-optimality selects nodes that (1) have high variance and (2) are highly correlated to high-variance nodes, conditioned on the labeled nodes. [sent-176, score-0.38]

64 So, the Σ-optimality additionally favors nodes that (3) have consistent global influence, i. [sent-184, score-0.212]

65 Although the properties of V-optimality fall into the more general class of spectral functions (Friedland & Gaubert, 2011), we have seen no proof of either the suppressor-free condition or the submodularity of Σ-optimality on GRFs. [sent-202, score-0.115]

66 The GRF prediction operator L−1 Lul maps y ∈ [0, 1]| | to yu = −L−1 Lul y ∈ ˆ u u [0, 1]|u| . [sent-215, score-0.098]

67 ˆ u As both Lu ≥ 0 and −Lul ≥ 0, we have y ≥ 0 ⇒ yu ≥ 0 and y ≥ y ⇒ yu ≥ yu . [sent-228, score-0.294]

68 1, Figure 1 shows the first few nodes selected by different optimality criteria. [sent-254, score-0.269]

69 This graph is constructed by a breadth-first search from a random node in a larger DBLP coauthorship network graph that we will introduce in the next section. [sent-255, score-0.338]

70 On this toy graph, both criteria pick the same center node to query first. [sent-256, score-0.283]

71 However, for the second and third queries, Voptimality weighs the uncertainty of the candidate node more, choosing outliers, whereas Σ-optimality favors nodes with universal influence over the graph and goes to cluster centers. [sent-257, score-0.362]

72 The node labels were first simulated using the model in order to compare the active learning criteria directly without raising questions of model fit. [sent-262, score-0.434]

73 We simulated the binary labels with the GRF-sigmoid model and performed active learning with the GRF / LP model for predictions. [sent-264, score-0.253]

74 05, which maximizes the average classification accuracy increases from 50 random training nodes to 200 random training nodes using the GRF / LP model for predictions. [sent-267, score-0.358]

75 Figure 2 shows the binary classification accuracy versus the number of queries on both the DBLP coauthorship graph 6 0. [sent-268, score-0.222]

76 and the CORA citation graph that we will describe below. [sent-291, score-0.182]

77 Figure 2 can be a surprise due to the reasoning behind the L2 surrogate loss, especially when the predictive means are trapped between [−1, 1], but we see here that our reasoning in Sections (3. [sent-293, score-0.124]

78 1) can lead to the greedy survey loss actually making a better active learning objective. [sent-295, score-0.374]

79 Despite the fact that larger β and δ increase label independence on the graph structure and undermine the effectiveness of both V/Σ-optimality heuristics, we have seen that whenever the V-optimality establishes a superiority over random selections, Σ-optimality yields better performance. [sent-297, score-0.141]

80 6 Real-World Experiments The active learning heuristics to be compared are:4 1. [sent-298, score-0.18]

81 The new Σ-optimality with greedy sequential updates: minv 1 (Luk \{v } )−1 1 . [sent-299, score-0.201]

82 , 2008): maxv L−1 v ,v (L k ∪{v } )−1 uk (1) maxv yv ˆ v ,v (2) yv . [sent-304, score-0.22]

83 Selected nodes maximize (1) the average prediction confidence in expectation: maxv Eyv ˆ yk . [sent-309, score-0.298]

84 5 The nodes represent scholars and the weighted edges are the number of papers bearing both scholars’ names. [sent-315, score-0.272]

85 The largest connected component has 1711 nodes and 2898 edges. [sent-316, score-0.179]

86 The node labels were hand assigned in Ji & Han (2012) to one of the four expertise areas of the scholars: machine learning, data mining, information retrieval, and databases. [sent-317, score-0.163]

87 6 This is a citation graph of 2708 publications, each of which is classified into one of seven classes: case based, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, and theory. [sent-321, score-0.182]

88 We took its largest connected component, with 2485 nodes and 5069 undirected and unweighted edges. [sent-323, score-0.228]

89 6 This is another citation graph of 3312 publications, each of which is classified into one of six classes: agents, artificial intelligence, databases, information retrieval, machine learning, human computer interaction. [sent-367, score-0.182]

90 We took its largest connected component, with 2109 nodes and 3665 undirected and unweighted edges. [sent-369, score-0.228]

91 The node choices by both criteria were also visually inspected after embedding the graph to the 2-dimensional space using OpenOrd method developed by Martin et al. [sent-374, score-0.274]

92 We also performed real-world experiments on the root-mean-square-error of the class proportion estimations, which is the survey risk that the Σ-optimality minimizes. [sent-377, score-0.214]

93 7 Conclusion For active learning on GRFs, it is common to use variance minimization criteria with greedy onestep lookahead heuristics. [sent-380, score-0.529]

94 V-optimality and Σ-optimality are two criteria based on statistics of the predictive covariance matrix. [sent-381, score-0.217]

95 They both are also risk minimization criteria: V-optimality minimizes the L2 risk (2. [sent-382, score-0.335]

96 Therefore, risk reduction is submodular and the greedy one-step lookahead heuristics can achieve a (1 − 1/e) global optimality ratio. [sent-389, score-0.605]

97 While the V-optimality on GRFs inherits from label propagation (and random walk with absorptions) and have good empirical performance, it is not directly minimizing the 0/1 classification risk. [sent-391, score-0.126]

98 A variance minimization criterion to active learning on graphs. [sent-409, score-0.284]

99 Learning from labeled and unlabeled data with label propagation. [sent-436, score-0.174]

100 Combining active learning and semisupervised learning using gaussian fields and harmonic functions. [sent-439, score-0.228]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('grf', 0.513), ('grfs', 0.326), ('active', 0.18), ('nodes', 0.179), ('yui', 0.163), ('greedy', 0.16), ('risk', 0.145), ('eer', 0.14), ('lul', 0.14), ('lij', 0.125), ('loo', 0.123), ('lu', 0.101), ('yu', 0.098), ('graph', 0.093), ('mig', 0.093), ('scholars', 0.093), ('criteria', 0.091), ('optimality', 0.09), ('node', 0.09), ('dblp', 0.089), ('citation', 0.089), ('opt', 0.088), ('rv', 0.085), ('unlabeled', 0.085), ('submodularity', 0.083), ('han', 0.083), ('predictive', 0.083), ('suppressor', 0.082), ('submodular', 0.079), ('cora', 0.076), ('labels', 0.073), ('cvv', 0.07), ('friedland', 0.07), ('gaubert', 0.07), ('lii', 0.07), ('lji', 0.07), ('surveying', 0.07), ('unc', 0.07), ('walker', 0.068), ('queries', 0.067), ('zhu', 0.064), ('lp', 0.062), ('coauthorship', 0.062), ('yk', 0.062), ('ji', 0.061), ('krause', 0.06), ('criterion', 0.059), ('query', 0.058), ('classi', 0.057), ('maxv', 0.057), ('lookahead', 0.053), ('yv', 0.053), ('kempe', 0.053), ('garnett', 0.051), ('undirected', 0.049), ('nemhauser', 0.049), ('harmonic', 0.048), ('label', 0.048), ('publications', 0.047), ('luk', 0.047), ('openord', 0.047), ('shur', 0.047), ('suppressors', 0.047), ('voptimality', 0.047), ('settles', 0.045), ('rhs', 0.045), ('minimization', 0.045), ('reduction', 0.045), ('toy', 0.044), ('yi', 0.043), ('covariance', 0.043), ('nonnegative', 0.043), ('das', 0.043), ('rand', 0.043), ('ui', 0.042), ('wij', 0.042), ('absorptions', 0.041), ('minv', 0.041), ('cvt', 0.041), ('surrogate', 0.041), ('lemma', 0.041), ('labeled', 0.041), ('fields', 0.039), ('propagation', 0.039), ('vt', 0.039), ('yj', 0.039), ('walk', 0.039), ('correlation', 0.039), ('blanket', 0.038), ('xiaojin', 0.038), ('psd', 0.038), ('gp', 0.036), ('monotonicity', 0.036), ('roman', 0.036), ('proportion', 0.035), ('survey', 0.034), ('laplacian', 0.033), ('global', 0.033), ('condition', 0.032), ('citeseer', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

2 0.16380911 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

3 0.16075775 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

4 0.10937922 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1

5 0.10257807 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu

Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1

6 0.10219528 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

7 0.10127161 171 nips-2013-Learning with Noisy Labels

8 0.093864687 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

9 0.093177937 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints

10 0.088334151 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

11 0.088270806 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

12 0.075047858 66 nips-2013-Computing the Stationary Distribution Locally

13 0.072765365 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

14 0.072156861 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

15 0.072030753 60 nips-2013-Buy-in-Bulk Active Learning

16 0.071737573 289 nips-2013-Scalable kernels for graphs with continuous attributes

17 0.070849285 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

18 0.070090987 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

19 0.069433123 268 nips-2013-Reflection methods for user-friendly submodular optimization

20 0.068564296 7 nips-2013-A Gang of Bandits


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.189), (1, 0.036), (2, -0.008), (3, -0.002), (4, 0.096), (5, 0.112), (6, -0.059), (7, -0.159), (8, 0.019), (9, 0.036), (10, -0.01), (11, -0.115), (12, 0.036), (13, 0.013), (14, -0.002), (15, -0.148), (16, -0.123), (17, 0.13), (18, 0.05), (19, -0.091), (20, -0.007), (21, 0.013), (22, 0.01), (23, -0.072), (24, 0.086), (25, 0.013), (26, -0.01), (27, 0.112), (28, -0.061), (29, -0.012), (30, -0.001), (31, 0.02), (32, -0.025), (33, -0.008), (34, -0.041), (35, 0.024), (36, 0.036), (37, -0.033), (38, -0.037), (39, 0.008), (40, 0.046), (41, -0.054), (42, -0.015), (43, 0.04), (44, -0.064), (45, 0.007), (46, 0.014), (47, 0.043), (48, -0.01), (49, -0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.957196 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

2 0.70263898 309 nips-2013-Statistical Active Learning Algorithms

Author: Maria-Florina Balcan, Vitaly Feldman

Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1

3 0.69902718 149 nips-2013-Latent Structured Active Learning

Author: Wenjie Luo, Alex Schwing, Raquel Urtasun

Abstract: In this paper we present active learning algorithms in the context of structured prediction problems. To reduce the amount of labeling necessary to learn good models, our algorithms operate with weakly labeled data and we query additional examples based on entropies of local marginals, which are a good surrogate for uncertainty. We demonstrate the effectiveness of our approach in the task of 3D layout prediction from single images, and show that good models are learned when labeling only a handful of random variables. In particular, the same performance as using the full training set can be obtained while only labeling ∼10% of the random variables. 1

4 0.69424832 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

Author: Sivan Sabato, Anand D. Sarwate, Nati Srebro

Abstract: We propose a learning setting in which unlabeled data is free, and the cost of a label depends on its value, which is not known in advance. We study binary classification in an extreme case, where the algorithm only pays for negative labels. Our motivation are applications such as fraud detection, in which investigating an honest transaction should be avoided if possible. We term the setting auditing, and consider the auditing complexity of an algorithm: the number of negative labels the algorithm requires in order to learn a hypothesis with low relative error. We design auditing algorithms for simple hypothesis classes (thresholds and rectangles), and show that with these algorithms, the auditing complexity can be significantly lower than the active label complexity. We also show a general competitive approach for learning with outcome-dependent costs. 1

5 0.62780398 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1

6 0.62435734 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

7 0.61478335 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

8 0.60421813 60 nips-2013-Buy-in-Bulk Active Learning

9 0.56324583 279 nips-2013-Robust Bloom Filters for Large MultiLabel Classification Tasks

10 0.54905498 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

11 0.54869699 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

12 0.53102332 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

13 0.52504534 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation

14 0.52179325 350 nips-2013-Wavelets on Graphs via Deep Learning

15 0.51154244 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

16 0.50835609 171 nips-2013-Learning with Noisy Labels

17 0.50437242 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

18 0.47663713 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

19 0.47136006 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

20 0.46673799 7 nips-2013-A Gang of Bandits


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.018), (16, 0.036), (33, 0.115), (34, 0.083), (41, 0.027), (49, 0.035), (56, 0.123), (70, 0.039), (76, 0.286), (85, 0.048), (89, 0.035), (93, 0.028), (95, 0.063)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77475166 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling

Author: Mahito Sugiyama, Karsten Borgwardt

Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require to model the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1

same-paper 2 0.76684552 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

3 0.71754199 14 nips-2013-A Stability-based Validation Procedure for Differentially Private Machine Learning

Author: Kamalika Chaudhuri, Staal A. Vinterbo

Abstract: Differential privacy is a cryptographically motivated definition of privacy which has gained considerable attention in the algorithms, machine-learning and datamining communities. While there has been an explosion of work on differentially private machine learning algorithms, a major barrier to achieving end-to-end differential privacy in practical machine learning applications is the lack of an effective procedure for differentially private parameter tuning, or, determining the parameter value, such as a bin size in a histogram, or a regularization parameter, that is suitable for a particular application. In this paper, we introduce a generic validation procedure for differentially private machine learning algorithms that apply when a certain stability condition holds on the training algorithm and the validation performance metric. The training data size and the privacy budget used for training in our procedure is independent of the number of parameter values searched over. We apply our generic procedure to two fundamental tasks in statistics and machine-learning – training a regularized linear classifier and building a histogram density estimator that result in end-toend differentially private solutions for these problems. 1

4 0.70690346 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Author: Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles Isbell, Andrea L. Thomaz

Abstract: A long term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.

5 0.69649822 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

Author: Kohei Hayashi, Ryohei Fujimaki

Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1

6 0.62963486 149 nips-2013-Latent Structured Active Learning

7 0.6255455 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

8 0.59287244 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

9 0.58622169 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

10 0.58571714 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

11 0.58305687 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

12 0.57976079 268 nips-2013-Reflection methods for user-friendly submodular optimization

13 0.57913452 184 nips-2013-Marginals-to-Models Reducibility

14 0.57876956 249 nips-2013-Polar Operators for Structured Sparse Estimation

15 0.57774657 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

16 0.57761997 355 nips-2013-Which Space Partitioning Tree to Use for Search?

17 0.57506645 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

18 0.57450646 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

19 0.57426476 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

20 0.57408589 309 nips-2013-Statistical Active Learning Algorithms