nips nips2012 nips2012-10 knowledge-graph by maker-knowledge-mining

10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification


Source: pdf

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V| log |V|. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. [sent-2, score-0.294]

2 We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. [sent-3, score-0.44]

3 More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. [sent-4, score-0.312]

4 From a mathematical point of view, signed networks are graphs whose edges carry a sign representing the positive or negative nature of the relationship between the incident nodes. [sent-7, score-0.523]

5 The domains of social networks and e-commerce offer several examples of signed relationships: Slashdot users can tag other users as friends or foes, Epinions users can rate other users positively or negatively, eBay users develop trust and distrust towards sellers in the network. [sent-9, score-0.497]

6 The availability of signed networks has stimulated the design of link classification algorithms, especially in the domain of social networks. [sent-11, score-0.393]

7 Early studies of signed social networks are from the Fifties. [sent-12, score-0.317]

8 [8] and [1] model dislike and distrust relationships among individuals as (signed) weighted edges in a graph. [sent-15, score-0.255]

9 Many heuristics for link classification in social networks are based on a form of social balance summarized by the motto “the enemy of my enemy is my friend”. [sent-20, score-0.409]

10 This is equivalent to saying that the signs on the edges of a social graph tend to be consistent with some two-clustering of the nodes. [sent-21, score-0.492]

11 By consistency we mean the following: The nodes of the graph can be partitioned into two sets (the two clusters) in such a way that edges connecting nodes [sent-22, score-0.462]

12 from the same set are positive, and edges connecting nodes from different sets are negative. [sent-24, score-0.304]

13 Despite that, social network theorists and practitioners found this to be a reasonable bias in many social contexts, and recent experiments with online social networks reported a good predictive power for algorithms based on the two-clustering assumption [11, 13, 14, 3]. [sent-26, score-0.335]

14 In the case of undirected signed graphs G = (V, E), the best performing heuristics exploiting the two-clustering bias are based on spectral decompositions of the signed adjacency matrix. [sent-28, score-0.54]

15 In order to obtain scalable algorithms with formal performance guarantees, we focus on the active learning protocol, where training labels are obtained by querying a desired subset of edges. [sent-30, score-0.296]

16 In the recent work [2], a simple stochastic model for generating edge labels by perturbing some unknown two-clustering of the graph nodes was introduced. [sent-32, score-0.385]

17 For this model, the authors proved that querying the edges of a low-stretch spanning tree of the input graph G = (V, E) is sufficient to predict the remaining edge labels making a number of mistakes within a factor of order (log |V|)² log log |V| from the theoretical optimum. [sent-33, score-1.096]

18 Second, the tree-based analysis of [2] does not generalize to query budgets larger than |V | − 1 (the edge set size of a spanning tree). [sent-36, score-0.436]

19 In this paper we introduce a different active learning approach for link classification that can accommodate a large spectrum of query budgets. [sent-37, score-0.305]

20 We show that on any graph with Ω(|V|^{3/2}) edges, a query budget of O(|V|^{3/2}) is sufficient to predict the remaining edge labels within a constant factor from the optimum. [sent-38, score-0.551]

21 Hence, a query budget of Θ(|V|), of the same order as the algorithm based on low-stretch trees, achieves an optimality factor O(|V|^{1/3}) with a running time of just O(|E|). [sent-40, score-0.272]

22 Edge labels can collectively be represented by the associated signed adjacency matrix Y , where Yi,j = 0 whenever (i, j) ∉ E. [sent-44, score-0.316]

23 We define a simple stochastic model for assigning binary labels Y to the edges of G. [sent-46, score-0.338]

24 Hence, our stochastic labeling model assumes that edge labels are obtained by perturbing an underlying labeling which is initially consistent with an arbitrary (and unknown) two-clustering. [sent-49, score-0.299]

25 More formally, given an undirected and connected graph G = (V, E), the labels Yi,j ∈ {−1, +1}, for (i, j) ∈ E, are assigned as follows. [sent-50, score-0.272]

26 First, the nodes in V are arbitrarily partitioned into two sets, and labels Yi,j are initially assigned consistently with this partition (within-cluster edges are positive and between-cluster edges are negative). [sent-51, score-0.619]

27 Note that the consistency is equivalent to the following multiplicative rule: For any (i, j) ∈ E, the label Yi,j is equal to the product of signs on the edges of any path connecting i to j in G. [sent-52, score-0.396]

28 A learning algorithm in the link classification setting receives a training set of signed edges and, out of this information, builds a prediction model for the labels of the remaining edges. [sent-57, score-0.669]

29 Then, a set of 2p|E| edges is selected uniformly at random and the labels of these edges are set randomly (i.e., each is flipped with probability 1/2, so the expected number of flipped labels is p|E|). [sent-63, score-0.563]
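To make the labeling model above concrete, here is a minimal Python sketch of the p-stochastic assignment: a labeling consistent with an arbitrary two-clustering, followed by the random perturbation. All function and variable names are ours, and the two-clustering is drawn at random purely for illustration; the model assumes p ≤ 1/2 so that 2p|E| ≤ |E|.

```python
import random

def p_stochastic_labels(nodes, edges, p, seed=0):
    rng = random.Random(seed)
    # Arbitrary (and unknown to the learner) two-clustering of the nodes.
    cluster = {v: rng.randint(0, 1) for v in nodes}
    # Consistent initial assignment: +1 within clusters, -1 across clusters.
    Y = {(i, j): +1 if cluster[i] == cluster[j] else -1 for (i, j) in edges}
    # Perturbation: select 2p|E| edges uniformly at random and set their
    # labels randomly, so each selected label is flipped with probability
    # 1/2 and the expected number of flipped labels is p|E|.
    k = int(2 * p * len(edges))
    for e in rng.sample(list(edges), k):
        Y[e] = rng.choice([-1, +1])
    return Y
```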

30 An active learner for link classification first constructs a query set E0 of edges, and then receives the labels of all edges in the query set. [sent-70, score-0.847]

31 Based on this training information, the learner builds a prediction model for the labels of the remaining edges E \ E0 . [sent-71, score-0.433]

32 We assume that the only labels ever revealed to the learner are those in the query set. [sent-72, score-0.317]

33 It is clear from Fact 1 that any active learning algorithm that queries the labels of at most a constant fraction of the total number of edges will make on average Ω(p|E|) mistakes. [sent-74, score-0.486]

34 We often write VG and EG to denote, respectively, the node set and the edge set of some underlying graph G. [sent-75, score-0.254]

35 (e.g., [4]), which turns out to be useful in the important special case when the active learner is allowed to query only |V| − 1 labels. [sent-90, score-0.272]

36 Let Eflip ⊂ E denote the (random) subset of edges whose labels have been flipped in a p-stochastic assignment, and consider the following class of active learning algorithms parameterized by an arbitrary spanning tree T = (VT , ET ) of G. [sent-91, score-0.663]

37 Hence, the number of mistakes MT made by our active learner on the set of test edges E \ ET can be deterministically bounded by MT ≤ |Eflip| + Σ_{e′ ∈ E\ET} Σ_{e ∈ E} I{e ∈ PathT(e′)} I{e ∈ Eflip}  (1), where I{·} denotes the indicator of the Boolean predicate at argument. [sent-95, score-0.473]
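The tree-based learner behind bound (1) admits a short sketch: query the labels of a spanning tree, then predict each test edge by the parity (product) of the queried signs along its tree path. This is an illustrative rendering, not the authors' code; it assumes the spanning tree covers all nodes and that every tree edge appears in Y_train under one of its two orientations.

```python
from collections import deque

def tree_predict(tree_adj, Y_train, test_edges, root):
    # Parent pointers and depths via BFS from an arbitrary root of T.
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in tree_adj[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)

    def sign(u, v):
        # Queried label of the tree edge {u, v}, stored in either orientation.
        return Y_train[(u, v)] if (u, v) in Y_train else Y_train[(v, u)]

    preds = {}
    for (i, j) in test_edges:
        # Multiply signs while walking i and j up to their lowest common
        # ancestor; the product is the parity of Path_T(i, j).
        prod, u, v = 1, i, j
        while depth[u] > depth[v]:
            prod *= sign(u, parent[u])
            u = parent[u]
        while depth[v] > depth[u]:
            prod *= sign(v, parent[v])
            v = parent[v]
        while u != v:
            prod *= sign(u, parent[u]) * sign(v, parent[v])
            u, v = parent[u], parent[v]
        preds[(i, j)] = prod
    return preds
```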

38 A quantity which can be related to MT is the average stretch of a spanning tree T which, for our purposes, reduces to (1/|E|) (|V| − 1 + Σ_{e′ ∈ E\ET} |PathT(e′)|). [sent-96, score-0.31]

39 A stunning result of [4] shows that every connected, undirected and unweighted graph has a spanning tree with an average stretch of just O(log² |V| log log |V|). [sent-97, score-0.439]
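The average stretch just defined can be computed directly; below is a small sketch with names of our choosing. It reuses a tree-path-length routine, such as the LCA walk in the prediction sketch above, and assumes edges and tree_edges store each edge under the same (i, j) orientation.

```python
def average_stretch(n_nodes, edges, tree_edges, tree_path_length):
    # (1/|E|) * (|V| - 1 + sum over non-tree edges e' of |Path_T(e')|);
    # each of the |V| - 1 tree edges contributes stretch 1.
    non_tree = [e for e in edges if e not in tree_edges]
    total = (n_nodes - 1) + sum(tree_path_length(i, j) for (i, j) in non_tree)
    return total / len(edges)
```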

40 If our active learner uses a spanning tree with the same low stretch, then the following result holds. [sent-98, score-0.368]

41 Let (G, Y ) = ((V, E), Y ) be a labeled graph with p-stochastic assigned labels Y . [sent-100, score-0.242]

42 If the active learner queries the edges of a spanning tree T = (VT, ET) with average stretch O(log² |V| log log |V|), then E MT ≤ p|E| × O(log² |V| log log |V|). [sent-101, score-0.687]

43 Recall that Fact 1 implies that this factor cannot be smaller than a constant when the query set size is a constant fraction of |E|. [sent-103, score-0.237]

44 A key aspect in the analysis of prediction performance is the ability to select a query set so that each test edge creates a short circuit with a training path. [sent-107, score-0.303]

45 Given a test edge (i, j) and a path Path(i, j) whose edges are queried edges, we say that we are predicting label Yi,j using path Path(i, j). Since (i, j) closes Path(i, j) into a circuit, in this case we also say that (i, j) is predicted using the circuit. [sent-110, score-0.664]

46 Let (G, Y ) = ((V, E), Y ) be a labeled graph with p-stochastic assigned labels Y . [sent-112, score-0.242]

47 Given query set E0 ⊆ E, the number M of mistakes made when predicting test edges (i, j) ∈ E \ E0 using training paths Path(i, j) whose length is uniformly bounded by ℓ satisfies EM ≤ Σ_{(i,j) ∈ E\E0} (1 − (1 − p)^ℓ) ≤ ℓ p |E \ E0|. [sent-113, score-0.576]

48 For instance, if the input graph G = (V, E) has diameter DG and the queried edges are those of a breadth-first spanning tree, which can be generated in O(|E|) time, then the above fact holds with |E0| = |V| − 1 and ℓ = 2DG. [sent-116, score-0.603]
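The breadth-first spanning tree mentioned above is the standard O(|E|) construction; a sketch follows (assuming adj maps each node to its neighbors and the graph is connected). Every non-tree edge then closes a circuit with a training path of length at most 2DG.

```python
from collections import deque

def bfs_spanning_tree(adj, root):
    tree_edges, seen = [], {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges
```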

49 Hence, one may think of adding to the training spanning tree new edges so as to reduce the length of the circuits used for prediction, at the cost of increasing the size of the query set. [sent-121, score-0.671]

50 The precise tradeoff between prediction accuracy (as measured by the expected number of mistakes) and fraction of queried edges is the main theoretical concern of this paper. [sent-123, score-0.347]

51 In particular, we demonstrate that treeCutter achieves a good upper bound on the number of mistakes on any graph such that |E| ≥ 3|V| + √|V|. [sent-125, score-0.239]

52 Moreover, the total time for predicting the test edges scales linearly with the number of such edges. [sent-127, score-0.25]

53 The actual setting of k depends on the graph topology and the desired fraction of query set edges, and plays a crucial role in determining the prediction performance. [sent-132, score-0.302]

54 Setting k ≥ DG makes treeCutter reduce to querying only the edges of a breadth-first spanning tree of G, otherwise it operates in a more involved way by splitting G into smaller node-disjoint subtrees. [sent-133, score-0.569]

55 In a preliminary step (Line 1 in Figure 1), treeCutter draws an arbitrary breadth-first spanning tree T = (VT, ET). [sent-134, score-0.257]

56 Then treeCutter removes (Line 6) Ti from T along with all edges of ET which are incident to nodes of Ti , and then iterates until VT gets empty. [sent-141, score-0.323]

57 For each T′ ∈ 𝒯, the algorithm queries all the labels of ET′; each edge (i, j) ∈ EG \ ET′ such that i, j ∈ VT′ is set to be a test edge, and label Yi,j is predicted using PathT′(i, j) (note that this coincides with PathT(i, j), since T′ ⊆ T), that is, Ŷi,j = πT′(i, j). [sent-144, score-0.333]

58 For each pair of distinct trees T′, T″ ∈ 𝒯 such that EG(T′, T″) is not empty, we query the label of an arbitrarily selected edge (i′, i″) ∈ EG(T′, T″) (Lines 8 and 9 in Figure 1). [sent-147, score-0.315]

59 Each edge (u, v) ∈ EG(T′, T″) whose label has not been previously queried is then part of the test set, and its label will be predicted as Ŷu,v ← πT′(u, i′) · Yi′,i″ · πT″(i″, v) (Line 11). [sent-148, score-0.302]

60. Draw an arbitrary breadth-first spanning tree T of G (Line 1 in Figure 1).

61. T′ ← extractTreelet(T, k), and query all labels in ET′ (Line 3 in Figure 1).

62. If EG(T′, T″) ≢ ∅, query the label of an arbitrary edge (i′, i″) ∈ EG(T′, T″) (Line 9 in Figure 1).
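Putting the pieces above together, here is a hedged sketch of treeCutter's query-set construction. The paper's extractTreelet peels treelets off the spanning tree more carefully than this; for illustration we simply cut the BFS tree every k+1 levels, which also yields node-disjoint subtrees of depth at most k. All names are ours.

```python
from collections import deque

def tree_cutter_query_set(adj, root, k):
    # Breadth-first spanning tree with parent pointers and depths (Line 1).
    parent, depth, order = {root: None}, {root: 0}, []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in depth:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)

    # Assign each node to a treelet; a node roots a new treelet whenever
    # its depth is a multiple of k+1, so every treelet has depth <= k
    # (a simplification of extractTreelet).
    treelet = {}
    for u in order:
        if parent[u] is None or depth[u] % (k + 1) == 0:
            treelet[u] = u
        else:
            treelet[u] = treelet[parent[u]]

    # Query every tree edge internal to a treelet (Lines 3-4) ...
    queries = [(parent[u], u) for u in order
               if parent[u] is not None and treelet[u] == treelet[parent[u]]]

    # ... plus one arbitrary edge between each pair of treelets that are
    # connected in G (Lines 8-9); all remaining edges form the test set.
    cross = {}
    for u in adj:
        for v in adj[u]:
            a, b = treelet[u], treelet[v]
            if a != b:
                cross.setdefault(frozenset((a, b)), (u, v))
    queries.extend(cross.values())
    return queries, treelet
```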

63 For any integer k ≥ 2, the number M of mistakes made by treeCutter on any graph G(V, E) with |E| ≥ 2|V| − 2 + |V|²/k² + |V|/k satisfies EM ≤ min{4k + 1, 2DG} p|E|, while the query set size is bounded by |V| − 1 + |V|²/(2k²) + |V|/(2k) ≤ |E|/2. [sent-180, score-0.424]

64 This new subroutine just selects the star T centered on the node of G having largest degree, and queries all labels of the edges in ET . [sent-186, score-0.417]

65 The next result shows that this algorithm gets a constant optimality factor while using a query set of size O(|V|^{3/2}). [sent-187, score-0.272]

66 The number M of mistakes made by starMaker on any given graph G(V, E) with |E| ≥ 2|V| − 2 + 2|V|^{3/2} satisfies EM ≤ 5 p|E|, while the query set size is upper bounded by |V| − 1 + |V|^{3/2} ≤ |E|/2. [sent-189, score-0.4]
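starMaker's selection step is simple enough to sketch in a few lines; in the full algorithm this selection is applied repeatedly to the part of the graph not yet covered. This is our rendering of the textual description above, not the authors' code.

```python
def star_query_set(adj):
    # The treelet is the star centered on a maximum-degree node of G,
    # and all of its edge labels are queried.
    center = max(adj, key=lambda u: len(adj[u]))
    return [(center, v) for v in adj[center]]
```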

67 Lines 7–11 are instead replaced by the following procedure: a graph G′ = (VG′, EG′) is created such that: (1) each node in VG′ corresponds to a tree in 𝒯, (2) there exists an edge in EG′ if and only if the two corresponding trees of 𝒯 are connected by at least one edge of EG. [sent-192, score-0.565]

68 Finally, for each pair of distinct stars S′, S″ ∈ 𝒮 connected by at least one edge in EG, the label of an arbitrary edge in EG(S′, S″) is queried. [sent-196, score-0.331]
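The contracted graph G′ described above can be sketched as follows (our names; treelet maps each node of G to the identifier of its treelet, as in the treeCutter sketch earlier). This mirrors the textual description only, not the authors' exact code.

```python
def contract(adj, treelet):
    # One node per treelet, with an edge between two treelet-nodes iff
    # some edge of E_G joins the two corresponding treelets.
    adj_c = {}
    for u in adj:
        for v in adj[u]:
            a, b = treelet[u], treelet[v]
            if a != b:
                adj_c.setdefault(a, set()).add(b)
                adj_c.setdefault(b, set()).add(a)
    return adj_c
```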

69 For any integer k ≥ 2 and for any graph G = (V, E) with |E| ≥ 2|V| − 2 + 2((|V| − 1)/k + 1)^{3/2}, the number M of mistakes made by treeletStar(k) on G satisfies EM = O(min{k, DG}) p|E|, while the query set size is bounded by |V| − 1 + ((|V| − 1)/k + 1)^{3/2} ≤ |E|/2. [sent-199, score-0.424]

70 On the other hand, a truly constant optimality factor is obtained by querying as few as O(|V|^{3/2}) edges (provided the graph has sufficiently many edges). [sent-201, score-0.525]

71 As a direct consequence (and surprisingly enough), on graphs which are only moderately dense we need not observe too many edges in order to achieve a constant optimality factor. [sent-202, score-0.299]

72 It is instructive to compare the bounds obtained by treeletStar to the ones we can achieve by using the cccc algorithm of [2], or the low-stretch spanning trees given in Theorem 1. [sent-203, score-0.264]

73 The resulting optimality factor is of order ((1 − α)²/α) |V|, where α ∈ (0, 1] is the fraction of queried edges out of the total number of edges. [sent-205, score-0.458]

74 For instance, in order to obtain an optimality factor which is lower than |V |, cccc has to query in the worst case a fraction of edges that goes to one as |V | → ∞. [sent-207, score-0.596]

75 Next, we compare to query sets produced by low-stretch spanning trees. [sent-210, score-0.322]

76 A low-stretch spanning tree achieves a polylogarithmic optimality factor by querying |V | − 1 edge labels. [sent-211, score-0.569]

77 The results in [4] show that we cannot hope to get a better optimality factor using a single low-stretch spanning tree combined with the analysis in (1). [sent-212, score-0.368]

78 However, we can get a constant optimality factor by increasing the query set size to O(|V|^{3/2}). [sent-214, score-0.272]

79 Recall the different lower bound conditions on the graph density that must hold to ensure that the query set size is not larger than the test set size. [sent-218, score-0.263]

80 This corresponds to querying only the edges of the initial spanning tree T and predicting all remaining edges (i, j) via the parity of PathT (i, j). [sent-227, score-0.883]

81 The spanning tree T used by treeCutter is a shortest-path spanning tree generated by a breadth-first visit of the graph (assuming all edges have unit length). [sent-228, score-0.907]

82 Each path has value 0 if it contains at least one test edge, otherwise its value equals the product of queried labels on the path edges. [sent-238, score-0.348]
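The path-value convention just stated has a direct rendering; a sketch, under the assumption that Y_train holds exactly the queried labels keyed by edge:

```python
def path_value(path_edges, Y_train):
    # 0 if any edge on the path is an unqueried (test) edge, otherwise the
    # product of the queried labels along the path.
    prod = 1
    for e in path_edges:
        if e not in Y_train:
            return 0
        prod *= Y_train[e]
    return prod
```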

83 We performed a first set of experiments on synthetic signed graphs created from a subset of the USPS digit recognition dataset. [sent-241, score-0.231]

84 The edges were labeled as follows: all edges incident to nodes with the same USPS label were labeled +1; all edges incident to nodes with different USPS labels were labeled −1. [sent-244, score-1.105]

85 Finally, we randomly pruned the positive edges so as to achieve an unbalance of about 20% between the two classes. [sent-245, score-0.27]

86 Starting from this edge label assignment, which is consistent with the two-clustering associated with the USPS labels, we generated a p-stochastic label assignment by flipping the labels of a random subset of the edges. [sent-246, score-0.347]

87 Specifically, we used the three following synthetic datasets: DELTA0: no flips (p = 0), 1,000 nodes and 9,138 edges; DELTA100: 100 randomly chosen labels of DELTA0 are flipped; DELTA250: 250 randomly chosen labels of DELTA0 are flipped. [sent-247, score-0.282]
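A hedged sketch of this synthetic construction (all helper names are ours): sign each edge by USPS class agreement, prune positive edges until roughly 20% of the labels are negative (our reading of the stated 20% unbalance), then flip a fixed number of randomly chosen labels (0, 100, or 250 for DELTA0, DELTA100, DELTA250).

```python
import random

def usps_signed_graph(edges, digit, n_flips, neg_fraction=0.20, seed=0):
    rng = random.Random(seed)
    # +1 between nodes with the same USPS label, -1 otherwise.
    signs = {(i, j): +1 if digit[i] == digit[j] else -1 for (i, j) in edges}
    pos = [e for e in edges if signs[e] > 0]
    neg = [e for e in edges if signs[e] < 0]
    rng.shuffle(pos)
    # Keep just enough positive edges for negatives to reach ~neg_fraction.
    pos = pos[: int(len(neg) * (1 - neg_fraction) / neg_fraction)]
    Y = {e: +1 for e in pos}
    Y.update({e: -1 for e in neg})
    # p-stochastic perturbation: flip a random subset of the labels.
    for e in rng.sample(list(Y), n_flips):
        Y[e] = -Y[e]
    return Y
```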

88 We also used three real-world datasets: MOVIELENS: A signed graph we created using Movielens ratings. [sent-248, score-0.333]

89 This matrix was sparsified. (The class unbalance here is similar to that of real-world signed networks; see below.) [sent-251, score-0.274]

90 So on DELTA0 treeCutter does not make mistakes whenever the training set contains at least one spanning tree. [sent-269, score-0.326]

91 The resulting graph has 6,040 nodes and 824,818 edges (12. [sent-275, score-0.383]

92 This graph has 26,996 nodes and 290,509 edges (24. [sent-278, score-0.383]

93 EPINIONS: The biggest strongly connected component of a snapshot of the Epinions signed network, similar to the one used in [13, 12]. [sent-280, score-0.233]

94 This graph has 41,441 nodes and 565,900 edges (26. [sent-281, score-0.383]

95 We removed the reciprocal edges with mismatching labels (which turned out to be only a few), and considered the remaining edges as undirected. [sent-284, score-0.587]

96 is the fraction of negative edges, |V |/|E| is the fraction of edges queried by treeCutter(|V |), and Avgdeg is the average degree of the nodes of the network. [sent-286, score-0.442]

97 treeCutter(|V |) uses a single spanning tree, and thus we only have a single query set size value. [sent-309, score-0.322]

98 A correlation clustering approach to link classification in signed networks. [sent-337, score-0.279]

99 Exploiting longer cycles for link prediction in signed networks. [sent-344, score-0.279]

100 Computing global structural balance in large-scale signed social networks. [sent-358, score-0.322]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('asymexp', 0.509), ('treecutter', 0.434), ('edges', 0.225), ('signed', 0.203), ('eg', 0.163), ('query', 0.161), ('spanning', 0.161), ('patht', 0.15), ('mistakes', 0.137), ('vt', 0.124), ('epinions', 0.12), ('treeletstar', 0.12), ('ytrain', 0.12), ('edge', 0.114), ('labels', 0.113), ('slashdot', 0.105), ('ipped', 0.103), ('graph', 0.102), ('tree', 0.096), ('extracttreelet', 0.09), ('starmaker', 0.09), ('social', 0.088), ('querying', 0.087), ('queried', 0.083), ('link', 0.076), ('path', 0.076), ('optimality', 0.074), ('active', 0.068), ('vg', 0.066), ('visit', 0.066), ('dg', 0.064), ('di', 0.063), ('cccc', 0.06), ('dipartimento', 0.06), ('nodes', 0.056), ('movielens', 0.055), ('stretch', 0.053), ('uz', 0.053), ('ht', 0.052), ('heuristics', 0.048), ('erent', 0.047), ('childrent', 0.045), ('huttenlocher', 0.045), ('twoclustering', 0.045), ('unbalance', 0.045), ('trees', 0.043), ('learner', 0.043), ('incident', 0.042), ('subtrees', 0.042), ('queries', 0.041), ('label', 0.04), ('assignment', 0.04), ('degli', 0.04), ('milano', 0.04), ('parity', 0.04), ('studi', 0.04), ('italy', 0.039), ('fraction', 0.039), ('node', 0.038), ('factor', 0.037), ('usps', 0.036), ('labeling', 0.036), ('ip', 0.036), ('rooted', 0.033), ('kleinberg', 0.033), ('mistake', 0.033), ('stars', 0.033), ('signs', 0.032), ('diameter', 0.032), ('su', 0.031), ('balance', 0.031), ('users', 0.03), ('adiacency', 0.03), ('avgdeg', 0.03), ('distrust', 0.03), ('harary', 0.03), ('informatica', 0.03), ('zappella', 0.03), ('connected', 0.03), ('spectral', 0.029), ('leskovec', 0.029), ('universit', 0.029), ('mt', 0.028), ('training', 0.028), ('created', 0.028), ('labeled', 0.027), ('undirected', 0.027), ('sign', 0.027), ('subtree', 0.027), ('ecai', 0.026), ('enemy', 0.026), ('heider', 0.026), ('vitale', 0.026), ('networks', 0.026), ('predicted', 0.025), ('predicting', 0.025), ('amortized', 0.024), ('remaining', 0.024), ('integer', 0.024), ('connecting', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V| log |V|. 1

2 0.10451657 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^{−δη(η+1)−2} log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

3 0.10277354 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-productlike algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1

4 0.091279767 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain

Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efficient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show significant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1

5 0.086495563 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.085088119 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

7 0.082082726 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

8 0.078952439 180 nips-2012-Learning Mixtures of Tree Graphical Models

9 0.068651535 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

10 0.064084865 32 nips-2012-Active Comparison of Prediction Models

11 0.062955528 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

12 0.062047582 215 nips-2012-Minimizing Uncertainty in Pipelines

13 0.058318365 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

14 0.057667099 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

15 0.057433948 280 nips-2012-Proper losses for learning from partial labels

16 0.056638673 346 nips-2012-Topology Constraints in Graphical Models

17 0.055333767 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

18 0.053031787 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

19 0.052460242 194 nips-2012-Learning to Discover Social Circles in Ego Networks

20 0.052000657 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.139), (1, 0.026), (2, 0.005), (3, -0.036), (4, -0.025), (5, -0.044), (6, -0.008), (7, -0.029), (8, -0.161), (9, 0.094), (10, -0.009), (11, 0.015), (12, 0.002), (13, 0.005), (14, -0.024), (15, 0.009), (16, -0.009), (17, 0.031), (18, -0.001), (19, 0.059), (20, -0.079), (21, 0.053), (22, 0.039), (23, -0.018), (24, 0.003), (25, 0.018), (26, -0.025), (27, -0.026), (28, 0.036), (29, -0.09), (30, -0.053), (31, -0.036), (32, -0.039), (33, -0.0), (34, -0.039), (35, -0.128), (36, -0.016), (37, -0.006), (38, 0.029), (39, -0.017), (40, -0.004), (41, -0.017), (42, -0.038), (43, 0.041), (44, 0.05), (45, 0.022), (46, 0.01), (47, 0.065), (48, -0.003), (49, -0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94703293 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V| log |V|. 1

2 0.69385093 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

Author: Anima Anandkumar, Ragupathyraj Valluvan

Abstract: Graphical model selection refers to the problem of estimating the unknown graph structure given observations at the nodes in the model. We consider a challenging instance of this problem when some of the nodes are latent or hidden. We characterize conditions for tractable graph estimation and develop efficient methods with provable guarantees. We consider the class of Ising models Markov on locally tree-like graphs, which are in the regime of correlation decay. We propose an efficient method for graph estimation, and establish its structural consistency when the number of samples n scales as n = Ω(θmin^{−δη(η+1)−2} log p), where θmin is the minimum edge potential, δ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and η is a parameter which depends on the minimum and maximum node and edge potentials in the Ising model. The proposed method is practical to implement and provides flexibility to control the number of latent variables and the cycle lengths in the output graph. We also present necessary conditions for graph estimation by any method and show that our method nearly matches the lower bound on sample requirements. Keywords: Graphical model selection, latent variables, quartet methods, locally tree-like graphs. 1

3 0.66257715 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

Author: Antonino Freno, Mikaela Keller, Marc Tommasi

Abstract: Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.

4 0.66256958 180 nips-2012-Learning Mixtures of Tree Graphical Models

Author: Anima Anandkumar, Furong Huang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable is hidden and each mixture component can have a potentially different Markov graph structure and parameters over the observed variables. We propose a novel method for estimating the mixture components with provable guarantees. Our output is a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. The sample and computational requirements for our method scale as poly(p, r), for an r-component mixture of p-variate graphical models, for a wide class of models which includes tree mixtures and mixtures over bounded degree graphs. Keywords: Graphical models, mixture models, spectral methods, tree approximation.

5 0.62525505 215 nips-2012-Minimizing Uncertainty in Pipelines

Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi

Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general dag, the problem is intractable. 1

6 0.62465632 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

7 0.62092674 260 nips-2012-Online Sum-Product Computation Over Trees

8 0.56945121 346 nips-2012-Topology Constraints in Graphical Models

9 0.56812042 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

10 0.53489852 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

11 0.51061851 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

12 0.48688003 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

13 0.47782266 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

14 0.47475103 32 nips-2012-Active Comparison of Prediction Models

15 0.47254771 194 nips-2012-Learning to Discover Social Circles in Ego Networks

16 0.47245377 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

17 0.47021493 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

18 0.46705747 253 nips-2012-On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks

19 0.46425736 298 nips-2012-Scalable Inference of Overlapping Communities

20 0.4598814 196 nips-2012-Learning with Partially Absorbing Random Walks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.044), (21, 0.021), (27, 0.012), (38, 0.089), (39, 0.013), (42, 0.018), (53, 0.385), (54, 0.022), (55, 0.012), (74, 0.08), (76, 0.082), (80, 0.092), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.75710338 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász-Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

same-paper 2 0.72870022 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most of order |E| + |V| log |V|. 1

3 0.72426593 359 nips-2012-Variational Inference for Crowdsourcing

Author: Qiang Liu, Jian Peng, Alex Ihler

Abstract: Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers’ reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions. 1

4 0.65614921 12 nips-2012-A Neural Autoregressive Topic Model

Author: Hugo Larochelle, Stanislas Lauly

Abstract: We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-field recursive equations of the Replicated Softmax in order to define a neural network architecture that estimates the probability of observing a new word in a given document given the previously observed words. This paradigm also allows us to replace the expensive softmax distribution over words with a hierarchical distribution over paths in a binary tree of words. The end result is a model whose training complexity scales logarithmically with the vocabulary size instead of linearly as in the Replicated Softmax. Our experiments show that our model is competitive both as a generative model of documents and as a document representation learning algorithm. 1

5 0.65196121 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

6 0.57565767 30 nips-2012-Accuracy at the Top

7 0.54976088 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

8 0.44591072 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

9 0.44181043 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

10 0.43193451 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

11 0.43086678 160 nips-2012-Imitation Learning by Coaching

12 0.42892542 293 nips-2012-Relax and Randomize : From Value to Algorithms

13 0.42801109 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

14 0.42643809 260 nips-2012-Online Sum-Product Computation Over Trees

15 0.42405984 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

16 0.42013299 298 nips-2012-Scalable Inference of Overlapping Communities

17 0.41995388 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

18 0.41872832 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

19 0.41607279 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family

20 0.41363916 168 nips-2012-Kernel Latent SVM for Visual Recognition