nips2013-316: Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
Source: pdf
Author: Edoardo M. Airoldi, Thiago B. Costa, Stanley H. Chan (Harvard University)
Abstract: Non-parametric approaches for analyzing network data based on exchangeable graph models (ExGM) have recently gained interest. The key object that defines an ExGM is often referred to as a graphon. This non-parametric perspective on network modeling poses challenging questions on how to make inference on the graphon underlying observed network data. In this paper, we propose a computationally efficient procedure to estimate a graphon from a set of observed networks generated from it. This procedure is based on a stochastic blockmodel approximation (SBA) of the graphon. We show that, by approximating the graphon with a stochastic block model, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity.
Reference: text
1 Introduction

Revealing hidden structures of a graph is at the heart of many data analysis problems. From the well-known small-world network to the recent large-scale data collected from online service providers such as Wikipedia, Twitter and Facebook, there is sustained momentum toward better and more informative representations of graphs [1, 14, 29, 3, 26, 12]. The root of the non-parametric model discussed in this paper lies in the theory of exchangeable random arrays [2, 15, 16], and it is presented in [11] as a link connecting de Finetti's work on partial exchangeability and graph limits [20, 6]. In a nutshell, the theory predicts that every convergent sequence of graphs $(G_n)$ has a limit object that preserves many local and global properties of the graphs in the sequence.
This limit object, called a graphon, is a measurable function $w : [0,1]^2 \to [0,1]$. To construct an $n$-vertex random graph $G(n, w)$ for a given $w$, we first assign a random label $u_i \sim \mathrm{Uniform}[0, 1]$ to each vertex $i \in \{1, \ldots, n\}$, and connect any two vertices $i$ and $j$ with probability $w(u_i, u_j)$, i.e.,
$$\Pr\left(G[i, j] = 1 \mid u_i, u_j\right) = w(u_i, u_j), \qquad i, j = 1, \ldots, n.$$
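To make this generative process concrete, here is a minimal sketch that samples $2T$ directed graphs sharing one draw of labels, as in Figure 1 below; the function names and the example blockmodel graphon are illustrative, not from the paper.

```python
import numpy as np

def sample_graphs(w, n, num_graphs, rng=None):
    """Sample `num_graphs` directed n-vertex graphs G(n, w) from a graphon
    w: [0,1]^2 -> [0,1], all sharing one draw of labels u_i ~ Uniform[0,1]."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(0.0, 1.0, size=n)            # latent labels, shared across graphs
    P = w(u[:, None], u[None, :])                # P[i, j] = w(u_i, u_j)
    Gs = [(rng.uniform(size=(n, n)) < P).astype(int) for _ in range(num_graphs)]
    return Gs, u

# A 2-block stochastic blockmodel is the piecewise-constant special case.
w_sbm = lambda x, y: np.where((x < 0.5) == (y < 0.5), 0.8, 0.2)
Gs, u = sample_graphs(w_sbm, n=200, num_graphs=4, rng=0)   # 2T = 4 graphs
```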
As an example, we note that the stochastic blockmodel is the case where $w(x, y)$ is a piecewise constant function. The problem of interest is defined as follows: given a sequence of $2T$ observed directed graphs $G_1, \ldots, G_{2T}$ generated from an unknown graphon $w$, estimate $w$. [19] proposed a Bayesian estimator without a consistency proof; Choi and Wolfe [9] studied the consistency properties, but did not provide algorithms to estimate the graphon.

Figure 1: [Left] Given a graphon $w : [0, 1]^2 \to [0, 1]$, we draw i.i.d. samples $u_i, u_j$ from Uniform[0,1] and assign $G_t[i, j] = 1$ with probability $w(u_i, u_j)$, for $t = 1, \ldots, 2T$. [Right] A random graph generated by the graphon shown in the middle. Rows and columns of the graph are ordered by increasing $u_i$, instead of $i$, for better visualization.
To the best of our knowledge, the only method that estimates graphons consistently, besides ours, is USVT [8]. The proposed approximation procedure requires $w$ to be piecewise Lipschitz. The proposed method is called the stochastic blockmodel approximation (SBA) algorithm, as the idea of using a two-dimensional step function for approximation is equivalent to using stochastic blockmodels [10, 22, 13, 7, 25]. The SBA algorithm is defined up to permutations of the nodes, so the estimated graphon is not canonical. However, this does not affect the consistency properties of the SBA algorithm, as consistency is measured at the sampled labels and is therefore invariant to node permutations.
2 Stochastic blockmodel approximation: Procedure

In this section we present the proposed SBA algorithm and discuss its basic properties.

2.1 Assumptions on graphons

We assume that $w$ is piecewise Lipschitz. We do not require $w(u, v) = w(v, u)$, so that symmetric graphons can be considered as a special case. Consequently, a random graph $G(n, w)$ generated by $w$ is directed, i.e., $G[i, j]$ and $G[j, i]$ may differ.
2.2 Similarity of graphon slices

The intuition behind the proposed SBA algorithm is that if the graphon is smooth, neighboring cross-sections of the graphon should be similar. In other words, if two labels $u_i$ and $u_j$ are close, i.e., $|u_i - u_j| \approx 0$, then the differences between the row slices $|w(u_i, \cdot) - w(u_j, \cdot)|$ and the column slices $|w(\cdot, u_i) - w(\cdot, u_j)|$ should also be small. To measure the similarity between two labels using the graphon slices, we define the following distance:
$$d_{ij} = \frac{1}{2}\left(\int_0^1 \left[w(x, u_i) - w(x, u_j)\right]^2 dx + \int_0^1 \left[w(u_i, y) - w(u_j, y)\right]^2 dy\right). \qquad (2)$$
Thus, $d_{ij}$ is small only if both the row and column slices of the graphon are similar. The use of $d_{ij}$ for graphon estimation will be discussed in the next subsection.
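As a sanity check on (2), $d_{ij}$ can be evaluated for a known graphon by numerical integration. A minimal sketch, assuming a vectorized graphon and using a midpoint Riemann sum:

```python
import numpy as np

def graphon_distance(w, ui, uj, m=2000):
    """Approximate d_ij in eq. (2) with an m-point midpoint rule."""
    x = (np.arange(m) + 0.5) / m                    # grid on [0, 1]
    col = np.mean((w(x, ui) - w(x, uj)) ** 2)       # column-slice term (integral in x)
    row = np.mean((w(ui, x) - w(uj, x)) ** 2)       # row-slice term (integral in y)
    return 0.5 * (col + row)

w_smooth = lambda x, y: x * y
print(graphon_distance(w_smooth, 0.40, 0.41))       # close labels -> small d_ij
print(graphon_distance(w_smooth, 0.10, 0.90))       # far labels   -> large d_ij
```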
But before we proceed, it should be noted that in practice $d_{ij}$ has to be estimated from the observed graphs $G_1, \ldots, G_{2T}$. To derive an estimator $\hat{d}_{ij}$ of $d_{ij}$, it is helpful to express $d_{ij}$ in a form whose components can be estimated easily. To this end, we let
$$c_{ij} = \int_0^1 w(x, u_i)\, w(x, u_j)\, dx \quad \text{and} \quad r_{ij} = \int_0^1 w(u_i, y)\, w(u_j, y)\, dy,$$
and express $d_{ij}$ as $d_{ij} = \frac{1}{2}\left[(c_{ii} - c_{ij} - c_{ji} + c_{jj}) + (r_{ii} - r_{ij} - r_{ji} + r_{jj})\right]$. Inspecting this expression, we consider the following estimator for $c_{ij}$ (and analogously for $r_{ij}$): for each vertex $k$ in a neighborhood set $S$,
$$\hat{c}^{\,k}_{ij} = \frac{1}{T^2} \sum_{1 \le t_1 \le T}\ \sum_{T < t_2 \le 2T} G_{t_1}[k, i]\, G_{t_2}[k, j], \qquad (3)$$
and $\hat{c}_{ij}$ is obtained by averaging $\hat{c}^{\,k}_{ij}$ over $k \in S$; substituting the resulting estimates into the expression above yields $\hat{d}_{ij}$.
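A sketch of the resulting plug-in estimator. Splitting the $2T$ graphs into two averaged halves mirrors the independence structure used in (3); the helper names are illustrative, not from the paper.

```python
import numpy as np

def halves(Gs):
    """Average the first and second halves of the 2T observed graphs."""
    T = len(Gs) // 2
    A = np.mean(Gs[:T], axis=0)    # A[k, i] estimates w(u_k, u_i) from graphs 1..T
    B = np.mean(Gs[T:], axis=0)    # independent of A (graphs T+1..2T)
    return A, B

def d_hat(A, B, i, j, S):
    """Estimate d_ij from the averaged halves, over a neighborhood S of vertices."""
    S = np.asarray(S)
    c = lambda a, b: np.mean(A[S, a] * B[S, b])    # estimates c_ab (column slices)
    r = lambda a, b: np.mean(A[a, S] * B[b, S])    # estimates r_ab (row slices)
    return 0.5 * ((c(i, i) - c(i, j) - c(j, i) + c(j, j))
                  + (r(i, i) - r(i, j) - r(j, i) + r(j, j)))
```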
If $\hat{d}_{i_p, i_v} < \Delta^2$ for some precision parameter $\Delta > 0$, then we assign $i_v$ to the same block as the pivot $i_p$. However, as will be shown in Section 3, although the clustering algorithm is not globally optimal, the estimated graphon $\hat{w}$ is still guaranteed to be a consistent estimate of the true graphon $w$ as $n \to \infty$.
2.4 Main algorithm

Algorithm 1: Stochastic blockmodel approximation.
Input: a set of observed graphs $G_1, \ldots, G_{2T}$; precision parameter $\Delta$. Initialize the set of unassigned vertices $\Omega \leftarrow \{1, \ldots, n\}$.
while $\Omega \neq \emptyset$ do
    Randomly choose a vertex $i_p$ from $\Omega$ and assign it as the pivot for $B_k$: $B_k \leftarrow \{i_p\}$.
    for every other vertex $i_v \in \Omega \setminus \{i_p\}$ do
        Compute the distance estimate $\hat{d}_{i_p, i_v}$.
        If $\hat{d}_{i_p, i_v} \le \Delta^2$, assign $i_v$ as a member of $B_k$: $B_k \leftarrow B_k \cup \{i_v\}$.
    end for
    Remove $B_k$ from $\Omega$ and proceed to the next block.
end while

Algorithm 1 shows the pseudo-code for the proposed stochastic blockmodel approximation. The complexity of this algorithm is $O(TSKn)$, where $T$ is half the number of observations, $S$ is the size of the neighborhood, $K$ is the number of blocks, and $n$ is the number of vertices of the graph.
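A minimal sketch of Algorithm 1's greedy clustering, reusing `halves` and `d_hat` from the previous sketch; for simplicity the neighborhood $S$ is taken to be all vertices other than the pair being compared, whereas the paper allows a smaller $S$.

```python
import numpy as np

def sba_blocks(Gs, delta, rng=None):
    """Greedy pivot clustering of vertices into blocks (sketch of Algorithm 1)."""
    rng = np.random.default_rng(rng)
    n = Gs[0].shape[0]
    A, B = halves(Gs)
    omega, blocks = set(range(n)), []
    while omega:
        pivot = int(rng.choice(sorted(omega)))                 # random pivot i_p
        block = [pivot]
        for v in omega - {pivot}:
            S = [k for k in range(n) if k not in (pivot, v)]   # exclude the pair
            if d_hat(A, B, pivot, v, S) <= delta**2:           # same block as pivot
                block.append(v)
        blocks.append(sorted(block))
        omega -= set(block)                                    # remove clustered vertices
    return blocks
```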
3 Stochastic blockmodel approximation: Theory of estimation

In this section we present the theoretical aspects of the proposed SBA algorithm. We first discuss the properties of the estimator $\hat{d}_{ij}$, and then show the consistency of the estimated graphon $\hat{w}$.

3.1 Concentration analysis of $\hat{d}_{ij}$

Our first theorem below shows that the proposed estimator $\hat{d}_{ij}$ is unbiased and concentrated around its expected value $d_{ij}$. Further, for any $\epsilon > 0$,
$$\Pr\left(\left|\hat{d}_{ij} - d_{ij}\right| > \epsilon\right) \le 8\, e^{-\frac{S\epsilon^2}{32/T + 8\epsilon/3}}, \qquad (7)$$
where $S$ is the size of the neighborhood $S$, and $2T$ is the number of observations.
The basic idea of the proof is to zoom in on a single term $\hat{r}^{\,k}_{ij}$ and show that it is unbiased. The concentration inequality follows from a similar idea: bound the variance of $\hat{r}^{\,k}_{ij}$ and apply Bernstein's inequality. That $G_{t_1}[i, k]$ and $G_{t_2}[j, k]$ are conditionally independent given $u_k$ is a critical fact for the success of the proposed algorithm. It also explains why at least two independently observed graphs are necessary, for otherwise we cannot separate the probability in the second equality above marked with (a).
3.2 Choosing the number of blocks

The performance of Algorithm 1 is sensitive to the number of blocks it defines. On the one hand, it is desirable to have more blocks so that the graphon can be finely approximated. On the other hand, if the number of blocks is too large, then each block will contain only a few vertices. This is problematic because estimating the value on each block requires a sufficient number of vertices in that block. A precise relationship between $\Delta$ and $K$, the number of blocks generated by the algorithm, is given in Theorem 2: let $\Delta$ be the accuracy parameter and $K$ be the number of blocks estimated by Algorithm 1; then
$$\Pr\left(K > \frac{\sqrt{2}\, QL}{\Delta}\right) \le 8 n^2\, e^{-\frac{S\Delta^4}{128/T + 16\Delta^2/3}}, \qquad (8)$$
where $L$ is the Lipschitz constant and $Q$ is the number of Lipschitz blocks in $w$.
Define the mean squared error and mean absolute error of the estimate at the sampled labels as
$$\mathrm{MSE}(\hat{w}) = \frac{1}{n^2} \sum_{i_v=1}^{n} \sum_{j_v=1}^{n} \left(\hat{w}(u_{i_v}, u_{j_v}) - w(u_{i_v}, u_{j_v})\right)^2, \qquad \mathrm{MAE}(\hat{w}) = \frac{1}{n^2} \sum_{i_v=1}^{n} \sum_{j_v=1}^{n} \left|\hat{w}(u_{i_v}, u_{j_v}) - w(u_{i_v}, u_{j_v})\right|.$$
Theorem 3 states that if $S \in \Theta(n)$ and $\Delta \in \omega\big((\log(n)/n)^{1/4}\big)$, then $\lim_{n \to \infty} \mathbb{E}[\mathrm{MAE}(\hat{w})] = 0$. The goal of Theorem 3 is to show convergence of $|\hat{w}(u_i, u_j) - w(u_i, u_j)|$.
The idea is to consider the following two quantities:
$$\bar{w}(u_i, u_j) = \frac{1}{|B_i|\,|B_j|} \sum_{i_x \in B_i} \sum_{j_y \in B_j} w(u_{i_x}, u_{j_y}),$$
$$\hat{w}(u_i, u_j) = \frac{1}{|B_i|\,|B_j|} \sum_{i_x \in B_i} \sum_{j_y \in B_j} \frac{1}{2T}\left(G_1[i_x, j_y] + G_2[i_x, j_y] + \cdots + G_{2T}[i_x, j_y]\right),$$
so that if we can bound $|\bar{w}(u_i, u_j) - w(u_i, u_j)|$ and $|\hat{w}(u_i, u_j) - \bar{w}(u_i, u_j)|$, then $|\hat{w}(u_i, u_j) - w(u_i, u_j)|$ can also be bounded.
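The second quantity above is exactly the SBA output: the empirical edge frequency on each pair of blocks. A sketch of this final step, together with the MAE of Theorem 3; `blocks` is the output of the clustering sketch, and keeping self-pairs in the average is a simplification.

```python
import numpy as np

def estimate_graphon(Gs, blocks):
    """Estimate w on each pair of blocks by the empirical edge frequency."""
    Gbar = np.mean(Gs, axis=0)                     # (G_1 + ... + G_2T) / 2T
    K = len(blocks)
    W = np.zeros((K, K))
    for a, Ba in enumerate(blocks):
        for b, Bb in enumerate(blocks):
            W[a, b] = Gbar[np.ix_(Ba, Bb)].mean()  # self-pairs kept for simplicity
    return W

def mae(W, blocks, w, u):
    """Mean absolute error of the blockwise estimate at the sampled labels."""
    lab = np.empty(len(u), dtype=int)
    for a, Ba in enumerate(blocks):
        lab[Ba] = a                                # block index of each vertex
    W_full = W[lab[:, None], lab[None, :]]         # w_hat(u_i, u_j) for all pairs
    return np.abs(W_full - w(u[:, None], u[None, :])).mean()
```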
The bound for the first term $|\bar{w}(u_i, u_j) - w(u_i, u_j)|$ is shown in Lemma 1: by Algorithm 1, any vertex $i_v \in B_i$ is guaranteed to be within a distance $\Delta$ of the pivot of $B_i$. Since $\bar{w}(u_i, u_j)$ is an average over $B_i$ and $B_j$, by Theorem 1 a probability bound involving $\Delta$ can be obtained. The bound for the second term $|\hat{w}(u_i, u_j) - \bar{w}(u_i, u_j)|$ is shown in Lemma 2. Unlike Lemma 1, here we need to consider two possible situations: either the intermediate quantity $\bar{w}(u_i, u_j)$ is close to the ground truth $w(u_i, u_j)$, or it is far from the ground truth.
Lemma 1. For any $i_v \in B_i$ and $j_v \in B_j$,
$$\Pr\left(\left|\bar{w}(u_i, u_j) - w(u_{i_v}, u_{j_v})\right| > 8\Delta^{1/2} L^{1/4}\right) \le 32\,|B_i|\,|B_j|\, e^{-\frac{S\Delta^4}{32/T + 8\Delta^2/3}}.$$

Lemma 2. For any $i_v \in B_i$ and $j_v \in B_j$,
$$\Pr\left(\left|\hat{w}_{ij} - \bar{w}_{ij}\right| > 8\Delta^{1/2} L^{1/4}\right) \le 2\, e^{-256\, T\, |B_i|\,|B_j| \sqrt{L}\,\Delta} + 32\,|B_i|^2 |B_j|^2\, e^{-\frac{S\Delta^4}{32/T + 8\Delta^2/3}}.$$
4 Experiments

4.1 Estimating stochastic blockmodels

Accuracy as a function of growing graph size. Our first experiment evaluates the proposed SBA algorithm for estimating stochastic blockmodels. For this purpose, we generate (arbitrarily) a piecewise-constant graphon $w$ (equation (12)). Since USVT and LG use only one observed graph whereas the proposed SBA requires at least 2 observations, in order to make the comparison fair we use half of the nodes for SBA by generating two independent $n \times n$ observed graphs. Figure 2(b) shows the estimation error of the SBA algorithm as $T$ grows, for graphs with 200 vertices.
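For illustration, an end-to-end run tying the earlier sketches together; the parameter values below are illustrative, not the paper's experimental settings.

```python
# Sample 2T = 4 graphs from the example blockmodel, cluster, estimate, score.
Gs, u = sample_graphs(w_sbm, n=200, num_graphs=4, rng=1)
blocks = sba_blocks(Gs, delta=0.2, rng=1)
W = estimate_graphon(Gs, blocks)
print("blocks found:", len(blocks), " MAE:", mae(W, blocks, w_sbm, u))
```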
Figure 2: Estimation error. (a) Growing graph size, $n$. (b) Growing number of observations, $2T$.
Accuracy as a function of the number of blocks. To this end, we consider a sequence of values of $K$, and for each $K$ we generate a graphon $w$ of $K \times K$ blocks. The experiment is repeated over 100 trials, so that in every trial a different graphon is generated.
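A sketch of one trial's graphon generation under this protocol; drawing the block values i.i.d. Uniform[0,1] is an assumption, since the text states only that each trial uses a different randomly generated $K \times K$ blockwise graphon.

```python
import numpy as np

def random_block_graphon(K, rng=None):
    """Return a piecewise-constant graphon with K x K equally sized blocks."""
    rng = np.random.default_rng(rng)
    P = rng.uniform(0.0, 1.0, size=(K, K))   # assumed: i.i.d. uniform block values
    def w(x, y):
        ix = np.minimum((np.asarray(x) * K).astype(int), K - 1)  # block index of x
        iy = np.minimum((np.asarray(y) * K).astype(int), K - 1)  # block index of y
        return P[ix, iy]
    return w
```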
The results shown in Figure 3(a) indicate that while the estimation error increases as $K$ grows, the proposed SBA algorithm still attains the lowest MAE for all $K$.

Figure 3: (a) As $K$ increases, the MAE of all three algorithms increases, but SBA still attains the lowest MAE. (b) Estimation of the graphon in the presence of missing links: as the amount of missing links increases, the estimation error also increases.
4.2 Estimation with missing edges

Our next experiment evaluates the performance of the proposed SBA algorithm when there are missing edges in the observed graph. To model missing edges, we construct an $n \times n$ binary matrix $M$ with $\Pr[M[i, j] = 0] = \xi$, where $0 \le \xi \le 1$ defines the percentage of missing edges. Given $\xi$, $2T$ such matrices are generated, and the observed graphs are defined as $M_1 \odot G_1, \ldots, M_{2T} \odot G_{2T}$, where $\odot$ denotes elementwise multiplication. The goal is to study how well SBA can reconstruct the graphon $w$ in the presence of missing links. The modification of the proposed SBA algorithm for the case of missing links is minimal: when computing (6), instead of averaging over all $i_x \in B_i$ and $j_y \in B_j$, we average only over those $i_x \in B_i$ and $j_y \in B_j$ that are not masked out by the $M$'s.
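A sketch of the masked variant of the block average; `Ms` are the binary observation masks, and leaving never-observed block pairs as NaN is an assumption about how to handle empty cells.

```python
import numpy as np

def estimate_graphon_masked(Gs, Ms, blocks):
    """Blockwise edge-frequency estimate using only unmasked entries."""
    Gs, Ms = np.asarray(Gs), np.asarray(Ms)
    num = (Gs * Ms).sum(axis=0)          # observed edges, summed over the 2T graphs
    den = Ms.sum(axis=0)                 # how often each pair was observed
    K = len(blocks)
    W = np.full((K, K), np.nan)
    for a, Ba in enumerate(blocks):
        for b, Bb in enumerate(blocks):
            n_obs = den[np.ix_(Ba, Bb)].sum()
            if n_obs > 0:                # average only over observed entries
                W[a, b] = num[np.ix_(Ba, Bb)].sum() / n_obs
    return W
```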
Here, we consider the graphon given in (12), with $n = 200$ and $T = 1$. It is evident that SBA outperforms its counterparts at lower rates of missing links.
4.3 Estimating continuous graphons

Our final experiment evaluates the proposed SBA algorithm in estimating continuous graphons. Here, we consider two of the graphons reported in [8]:
$$w_1(u, v) = \frac{1}{1 + \exp\{-50(u^2 + v^2)\}} \quad \text{and} \quad w_2(u, v) = uv,$$
where $u, v \in [0, 1]$.
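For completeness, the two test graphons as vectorized functions, transcribed directly from the formulas above:

```python
import numpy as np

w1 = lambda u, v: 1.0 / (1.0 + np.exp(-50.0 * (u**2 + v**2)))   # logistic-type graphon
w2 = lambda u, v: u * v                                          # product graphon
```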
This suggests that in practice the choice of algorithm should depend on the expected structure of the graphon to be estimated: if the graph generated by the graphon exhibits low-rank structure, then USVT is likely to be the better option. For more structured or complex graphons, the proposed procedure is recommended.
Figure 4: Comparison between SBA and USVT in estimating two continuous graphons: (a) graphon $w_1$; (b) graphon $w_2$.
5 Conclusion

The algorithm was evaluated experimentally, and we found that it is effective in estimating block-structured graphons.
References (partial):
- Bickel, P. J. and Chen, A. A nonparametric view of network models and Newman-Girvan and other modularities. PNAS, 2009.
- Hoff, P. D. Modeling homophily and stochastic equivalence in symmetric relational data. NIPS, 2007.
- Lloyd, J. R., Orbanz, P., Ghahramani, Z. and Roy, D. M. Random function priors for exchangeable arrays with applications to graphs and relational data. NIPS, 2012.
- Orbanz, P. and Roy, D. M. Bayesian models of graphs, arrays and other exchangeable random structures. 2013.
- Latouche, P. and Robin, S. Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. 2013.