nips nips2013 nips2013-104 knowledge-graph by maker-knowledge-mining

104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models


Source: pdf

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 (Abstract) Stochastic block models characterize observed network relationships via latent community memberships. [sent-7, score-0.244]

2 In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. [sent-8, score-0.241]

3 We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. [sent-9, score-0.29]

4 To allow scalable learning, we derive an online stochastic variational inference algorithm. [sent-10, score-0.219]

5 Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. [sent-11, score-0.486]

6 Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. [sent-12, score-0.387]

7 (Section 1: Introduction) A wide range of statistical models have been proposed for the discovery of hidden communities within observed networks. [sent-14, score-0.231]

8 The simplest stochastic block models [20] create communities by clustering nodes, aiming to identify demographic similarities in social networks, or proteins with related functional interactions. [sent-15, score-0.235]

9 We propose a novel hierarchical Dirichlet process relational (HDPR) model, which allows mixed membership in an unbounded collection of latent communities. [sent-18, score-0.261]

10 By adapting the HDP [18], we allow data-driven inference of the number of communities underlying a given network, and growth in the community structure as additional nodes are observed. [sent-19, score-0.483]

11 The infinite relational model (IRM) [10] previously adapted the Dirichlet process to define a nonparametric relational model, but restrictively associates each node with only one community. [sent-20, score-0.286]

12 The more flexible nonparametric latent feature model (NLFM) [14] uses an Indian buffet process (IBP) [7] to associate nodes with a subset of latent communities. [sent-21, score-0.21]

13 The infinite multiple membership relational model (IMRM) [15] also uses an IBP to allow multiple memberships, but uses a non-conjugate observation model to allow more scalable inference for sparse networks. [sent-22, score-0.229]

14 The nonparametric metadata dependent relational (NMDR) model [11] employs a logistic stick-breaking prior on the node-specific community frequencies, and thereby models relationships between communities and metadata. [sent-23, score-0.534]

15 In contrast, the conditionally conjugate structure of our HDPR model allows us to easily develop a stochastic variational inference algorithm [17, 2, 9]. [sent-25, score-0.18]

16 Its online structure, which incrementally updates global community parameters based on random subsets of the full graph, is highly scalable; our experiments consider social networks with tens of thousands of nodes. [sent-26, score-0.264]

17 While the HDPR is more broadly applicable, our focus in this paper is on assortative models for undirected networks, which assume that the probability of linking distinct communities is small. [sent-27, score-0.321]

18 Our work builds on stochastic variational inference methods developed for the assortative MMSB (aMMSB) [6], but makes three key technical innovations. [sent-29, score-0.266]

19 First, adapting work on HDP topic models [19], we develop a nested family of variational bounds which assign positive probability to dynamically varying subsets of the unbounded collection of global communities. [sent-30, score-0.22]

20 Finally, we derive a structured mean field variational bound which models dependence among the pair of community assignments associated with each edge. [sent-32, score-0.336]

21 In this paper, we use our assortative HDPR (aHDPR) model to recover latent communities in social networks previously examined with the aMMSB [6], and demonstrate substantially improved perplexity scores and link prediction accuracy. [sent-34, score-0.633]

22 We also use our learned community structure to visualize business and governmental relationships extracted from the LittleSis database [13]. [sent-35, score-0.16]

23 (Section 2: Assortative Hierarchical Dirichlet Process Relational Models) We introduce the assortative HDP relational (aHDPR) model, a nonparametric generalization of the aMMSB for discovering shared memberships in an unbounded collection of latent communities. [sent-36, score-0.319]

24 We focus on undirected binary graphs with N nodes and E = N (N − 1)/2 possible edges, and let yij = yji = 1 if there is an edge between nodes i and j. [sent-37, score-0.407]

25 For some experiments, we assume the yij variables are only partially observed to compare the predictive performance of different models. [sent-38, score-0.227]

26 Letting βk denote the expected frequency of community k, and γ > 0 the concentration, we define a stick-breaking representation of β: βk = vk ∏_{ℓ=1}^{k−1} (1 − vℓ), vk ∼ Beta(1, γ), k = 1, 2, …   (1) [sent-41, score-0.282]

27 Adapting a two-layer hierarchical DP [18], the mixed community memberships for each node i are then drawn from a DP with base measure β, πi ∼ DP(αβ). [sent-44, score-0.253]
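
As a concrete illustration of sentences 26-27, the following minimal Python sketch draws β by truncated stick-breaking and then samples node memberships; the truncation level K, the handling of leftover mass, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def stick_breaking_beta(gamma, K, rng):
    """Truncated draw of Eq. (1): beta_k = v_k * prod_{l<k}(1 - v_l), v_k ~ Beta(1, gamma)."""
    v = rng.beta(1.0, gamma, size=K)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    beta = v * leftover
    # Fold all remaining mass into a (K+1)-th component so the vector sums to one,
    # mirroring the (K+1)-dimensional representation mentioned in sentence 41.
    return np.append(beta, 1.0 - beta.sum())

def node_memberships(alpha, beta, N, rng):
    """pi_i ~ DP(alpha * beta); under truncation this is a Dirichlet draw."""
    return rng.dirichlet(alpha * beta, size=N)

rng = np.random.default_rng(0)
beta = stick_breaking_beta(gamma=1.0, K=20, rng=rng)
pi = node_memberships(alpha=0.5, beta=beta, N=100, rng=rng)  # shape (100, 21)
```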

28 To generate a possible edge yij between nodes i and j, we first sample a pair of indicator variables from their corresponding community membership distributions, sij ∼ Cat(πi ), rij ∼ Cat(πj ). [sent-46, score-0.705]

29 We then determine edge presence as follows: p(yij = 1 | sij = rij = k) = wk, p(yij = 1 | sij ≠ rij) = ε.   (2) [sent-47, score-0.446]

30 For our assortative aHDPR model, each community has its own self-connection probability wk ∼ Beta(τa, τb). [sent-48, score-0.332]
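
A small sketch of the edge model in Eq. (2) under the same caveat: per-edge indicators are drawn from each node's membership distribution, and the link fires with probability wk on agreement and with a background rate ε otherwise; all parameter values here are arbitrary.

```python
import numpy as np

def sample_edge(pi_i, pi_j, w, eps, rng):
    """Eq. (2): draw s_ij ~ Cat(pi_i), r_ij ~ Cat(pi_j); link probability is
    w_k when both indicators pick community k, else the small rate eps."""
    s = rng.choice(len(pi_i), p=pi_i)
    r = rng.choice(len(pi_j), p=pi_j)
    p_link = w[s] if s == r else eps
    return int(rng.random() < p_link), s, r

rng = np.random.default_rng(1)
K = 5
w = rng.beta(5.0, 1.0, size=K)                       # w_k ~ Beta(tau_a, tau_b), here (5, 1)
pi_i, pi_j = rng.dirichlet(np.ones(K)), rng.dirichlet(np.ones(K))
y_ij, s, r = sample_edge(pi_i, pi_j, w, eps=1e-3, rng=rng)
```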

31 Our HDPR model could easily be generalized to more flexible likelihoods in which each pair of communities k, ℓ has its own interaction probability [1], but motivated by work on the aMMSB [6], we do not pursue this generalization here. [sent-50, score-0.214]

32 (Section 3: Scalable Variational Inference) Previous applications of the MMSB associate a pair of community assignments, sij and rij, with each potential edge yij. [sent-51, score-0.593]

33 In assortative models these variables are strongly dependent, since present edges only have non-negligible probability for consistent community assignments. [sent-52, score-0.302]

34 To improve accuracy and reduce local optima, we thus develop a structured variational method based on joint configurations of these assignment pairs, which we denote by eij = (sij , rij ). [sent-53, score-0.339]

35 Left: Conventional representation, in which source sij and receiver rij community assignments are independently sampled. [sent-56, score-0.354]

36 Right: Blocked representation in which eij = (sij , rij ) denotes the pair of community assignments underlying yij . [sent-57, score-0.57]

37 For the nonparametric aHDPR model, the number of latent community parameters wk, βk, and the dimensions of the community membership distributions πi, are both infinite. [sent-60, score-0.55]

38 (Section 3.1: Variational Bounds via Nested Truncations) We begin by defining categorical edge assignment distributions q(eij | φij) = Cat(eij | φij), where φijkℓ = q(eij = (k, ℓ)) = q(sij = k, rij = ℓ). [sent-63, score-0.149]

39 For the global community parameters, we define an untruncated factorized variational distribution: q(β, w | v*, λ) = ∏_{k=1}^{∞} δ_{v*k}(vk) Beta(wk | λka, λkb), βk(v*) = v*k ∏_{ℓ=1}^{k−1} (1 − v*ℓ).   (4) [sent-68, score-0.366]

40 Our later derivations show that for communities k > K above the truncation level, the optimal variational parameters equal the prior: λka = τa, λkb = τb. [sent-69, score-0.37]

41 Matched to this, we associate a (K + 1)-dimensional community membership distribution πi to each node, where the final component contains the sum of all mass not assigned to the first K. [sent-72, score-0.27]

42 The overall variational objective is then L(q) = Σ_k (Eq[log p(wk | τa, τb)] − Eq[log q(wk | λka, λkb)] + Eq[log p(vk | γ)]) + Σ_i (Eq[log p(πi | α, β(v*))] − Eq[log q(πi | θi)]) + Σ_ij (Eq[log p(yij | w, eij)] + Eq[log p(eij | πi, πj)] − Eq[log q(eij | φij)]).   (5) [sent-74, score-0.236]

43 Unlike truncations of the global stick-breaking process [4], our variational bounds are nested, so that lower-order approximations are special cases of higher-order ones with some zeroed parameters. [sent-75, score-0.164]

44 (Section 3.2: Structured Variational Inference with Linear Time and Storage Complexity) Conventional coordinate ascent variational inference algorithms iteratively optimize each parameter given fixed values for all others. [sent-77, score-0.159]

45 Community membership and interaction parameters are updated as λka = τa + Σ_{ij∈E} φijkk yij, λkb = τb + Σ_{ij∈E} φijkk (1 − yij),   (6) and θik = αβk + Σ_{(i,j)∈E} Σ_{ℓ=1}^{K} φijkℓ.   (7) [sent-78, score-0.341]

46 Here, the final summation is over all potential edges (i, j) linked to node i. [sent-79, score-0.313]

47 Updates for assignment distributions depend on expectations of log community assignment probabilities: Eq[log(wk)] = ψ(λka) − ψ(λka + λkb), Eq[log(1 − wk)] = ψ(λkb) − ψ(λka + λkb),   (8) and π̃ik = exp{Eq[log(πik)]} = exp{ψ(θik) − ψ(Σ_{ℓ=1}^{K+1} θiℓ)}, with π̃i = Σ_{k=1}^{K} π̃ik.   (9) [sent-80, score-0.321]

48 Given these sufficient statistics, the assignment distributions can be updated as follows: φijkk ∝ π̃ik π̃jk f(wk, yij),   (10) and φijkℓ ∝ π̃ik π̃jℓ f(ε, yij) for ℓ ≠ k.   (11) [sent-81, score-0.449]

49 Here, f(wk, yij) = exp{yij Eq[log(wk)] + (1 − yij) Eq[log(1 − wk)]}. [sent-82, score-0.506]

50 Eq. (7) can then be expanded as follows: θik = αβk + Σ_{(i,j)∈E} (φijkk + (1/Zij) Σ_{ℓ≠k} π̃ik π̃jℓ f(ε, yij)) = αβk + Σ_{(i,j)∈E} (φijkk + (1/Zij) π̃ik f(ε, yij)(π̃j − π̃jk)). [sent-89, score-0.42]

51 The normalization constant Zij, which is defined so that φij is a valid categorical distribution, can also be computed in linear time: Zij = π̃i π̃j f(ε, yij) + Σ_{k=1}^{K} π̃ik π̃jk (f(wk, yij) − f(ε, yij)). [sent-91, score-0.63]
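
The linear-time bookkeeping in sentences 50-51 can be sketched as follows, assuming the π̃ vectors hold the K unnormalized weights exp{Eq[log πik]} from Eq. (9); helper names are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def per_community_factors(lam_a, lam_b, y):
    """f(w_k, y) = exp{y E[log w_k] + (1 - y) E[log(1 - w_k)]}, using Eq. (8)."""
    e_log_w = digamma(lam_a) - digamma(lam_a + lam_b)
    e_log_1mw = digamma(lam_b) - digamma(lam_a + lam_b)
    return np.exp(y * e_log_w + (1 - y) * e_log_1mw)

def edge_update(pi_t_i, pi_t_j, lam_a, lam_b, eps, y):
    """O(K) computation of Z_ij and the diagonal phi_ijkk of Eq. (10),
    following the expansion in sentence 51."""
    f_w = per_community_factors(lam_a, lam_b, y)
    f_eps = eps if y == 1 else 1.0 - eps          # f(eps, y) for a binary edge
    Z = pi_t_i.sum() * pi_t_j.sum() * f_eps + np.sum(pi_t_i * pi_t_j * (f_w - f_eps))
    phi_diag = pi_t_i * pi_t_j * f_w / Z          # off-diagonal phi need never be stored
    return Z, phi_diag
```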

52 (Section 3.3: Stochastic Variational Inference) Standard variational batch updates become computationally intractable when N becomes very large. [sent-97, score-0.146]

53 Recent advances in applying stochastic optimization within variational inference [8] showed that if our variational mean-field family of distributions is in the exponential family, we can derive a simple stochastic natural gradient update for our global parameters λ, θ, v. [sent-98, score-0.346]

54 By taking natural gradients with respect to our new variational objective for our global variables λ, θ, we have λ*ka = (1/g(i,j)) φijkk yij + τa − λka   (14) and θ*ik = (1/g(i,j)) Σ_{(i,j)∈E} Σ_{ℓ=1}^{K} φijkℓ + αβk − θik,   (15) where the natural gradient for λ*kb is symmetric to λ*ka and where yij in Eq. [sent-102, score-0.676]
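
A sketch of one stochastic natural-gradient step for Eqs. (14)-(15); the Robbins-Monro step size ρt = (τ0 + t)^(−κ) is a standard SVI schedule [8] assumed here for illustration, and all names are placeholders.

```python
def natural_gradient_step(lam_a_k, theta_ik, stats_lam, stats_theta,
                          tau_a, alpha_beta_k, g_ij, t, tau0=1.0, kappa=0.5):
    """One stochastic step: rescale the sampled edge's statistics by the inverse
    sampling probability g(i,j), form the natural gradients of Eqs. (14)-(15),
    and blend them in with step size rho_t = (tau0 + t)**(-kappa)."""
    rho = (tau0 + t) ** (-kappa)
    grad_lam = stats_lam / g_ij + tau_a - lam_a_k              # Eq. (14)
    grad_theta = stats_theta / g_ij + alpha_beta_k - theta_ik  # Eq. (15)
    return lam_a_k + rho * grad_lam, theta_ik + rho * grad_theta
```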

55 (Section 3.4: Restricted Stratified Node Sampling) Stochastic variational inference provides us with the ability to choose a sampling scheme that allows us to better exploit the sparsity of real-world networks. [sent-119, score-0.159]

56 We show how performing this simple modification results in significant improvements in both perplexity and link prediction scores. [sent-129, score-0.235]

57 (Section 3.5: Pruning Moves) Our nested truncation requires setting an initial number of communities K. [sent-131, score-0.277]

58 To remedy this, we define a set of pruning moves aimed at improving inference by removing communities that have very small posterior mass. [sent-134, score-0.51]

59 Figure 2 provides an example illustrating how pruning occurs in our model. [sent-136, score-0.194]

60 To determine communities which are good candidates for pruning, for each community k we first compute Θk = (Σ_{i=1}^{N} θik) / (Σ_{i=1}^{N} Σ_{k=1}^{K} θik). [sent-137, score-0.374]

61 Any community for which Θk < (log K)/N for t* = N/2 consecutive iterations is then evaluated in more depth. [sent-138, score-0.16]
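
The candidate test of sentences 60-61 amounts to a few array operations; this sketch assumes θ is stored as an N × K matrix and that a counter of consecutive low-mass rounds is carried between checks.

```python
import numpy as np

def prune_candidates(theta, low_counts, t_star):
    """Flag community k when Theta_k = (sum_i theta_ik) / (sum_{i,k} theta_ik)
    stays below (log K)/N for t_star consecutive checks."""
    N, K = theta.shape
    Theta = theta.sum(axis=0) / theta.sum()
    low = Theta < np.log(K) / N
    low_counts = np.where(low, low_counts + 1, 0)   # reset the tally when mass recovers
    return np.flatnonzero(low_counts >= t_star), low_counts

theta = np.random.gamma(1.0, 1.0, size=(200, 30))
counts = np.zeros(30, dtype=int)
candidates, counts = prune_candidates(theta, counts, t_star=100)  # t* = N/2
```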

62 To estimate an approximate but still informative ELBO for the pruned model, we must associate a set of relevant observations to each pruning candidate. [sent-140, score-0.255]

63 In particular, we approximate the pruned ELBO L(q prune ) by considering observations yij among pairs of nodes with significant mass in the pruned community. [sent-141, score-0.42]

64 We then compare these two values to accept or reject the pruning of the low-weight community. [sent-143, score-0.234]

65 We show significant gains in AUC and perplexity scores by using the restricted form of … [Figure 2 residue: panels depicting the new model and adjacency matrix Y over N nodes and K communities; pruning of community k = 3; the threshold test Θk = (Σ_{i=1}^{N} θik)/(Σ_{i=1}^{N} Σ_{k=1}^{K} θik) < (log K)/N; the comparison of L(q_old) against L(q_prune); and the instruction to uniformly redistribute mass.] [sent-145, score-0.54]

66 Select nodes relevant to the pruned topic and its corresponding subgraph (red box) to generate a new ELBO L(q_prune) over ij ∈ S; if L(q_prune) > L(q_old), accept, or else reject and continue inference with the old model. (Figure 2 caption: Pruning extraneous communities.) [sent-147, score-0.225]

67 Suppose that community k = 3 is considered for removal. [sent-148, score-0.16]

68 We specify a new model by redistributing its mass Σ_{i=1}^{N} θi3 uniformly across the remaining communities θiℓ, ℓ ≠ 3. [sent-149, score-0.233]

69 To accurately estimate the true change in ELBO for this pruning, we select the n* = 10 nodes with greatest participation θi3 in community 3. [sent-151, score-0.229]

70 We compute an ELBO L(q_prune) over ij ∈ S for a model in which community k = 3 is pruned; using the data from the same sub-graph, but the old un-pruned model parameters, we estimate an alternative ELBO L(q_old). [sent-154, score-0.238]
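
Putting the Figure 2 procedure together, a minimal sketch of the redistribute-then-compare step; elbo_on_subgraph is a hypothetical stand-in for evaluating the restricted ELBO on the selected sub-graph S.

```python
import numpy as np

def prune_if_better(theta, k, elbo_on_subgraph):
    """Build a pruned model by spreading theta_ik's mass uniformly over the
    remaining communities (sentence 68), score both models on the candidate's
    sub-graph, and keep whichever has the higher restricted ELBO."""
    pruned = np.delete(theta, k, axis=1)
    pruned += theta[:, [k]] / pruned.shape[1]   # uniform redistribution of mass
    if elbo_on_subgraph(pruned) > elbo_on_subgraph(theta):
        return pruned, True                     # accept the prune
    return theta, False                         # reject; keep the old model
```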

71 … stratified node sampling, a quick K-means initialization (footnote 1) for θ, and our efficient structured mean-field approach combined with pruning moves. [sent-157, score-0.265]

72 We perform a detailed comparison on a synthetic toy dataset, as well as the real-world relativity collaboration network, using a variety of metrics to show the benefits of each contribution. [sent-158, score-0.168]

73 We then show significant improvements over the baseline aMMSB model in both AUC and perplexity metrics on several real-world datasets previously analyzed by [6]. [sent-159, score-0.214]

74 Finally, we perform a qualitative analysis on the LittleSis network and demonstrate the usefulness of using our learned latent community structure to create visualizations of large networks. [sent-160, score-0.227]

75 (Section 4.1: Synthetic and Collaboration Networks) The synthetic network we use for testing is generated from the standards and software outlined in [12] to produce realistic synthetic networks with overlapping communities and power-law degree distributions. [sent-164, score-0.338]

76 On this network the true number of latent communities was found to be K = 56. [sent-166, score-0.281]

77 To focus on the more challenging problem of identifying overlapping community structure, we take the largest connected component of each graph for analysis. [sent-170, score-0.178]

78 Fig. 3 compares different aHDPR inference algorithms, and the perplexity scores achieved on various networks. [sent-173, score-0.279]

79 Using a combination of both modifications, we achieve the best perplexity scores on these datasets. [sent-176, score-0.239]

80 The naive mean-field approach is the aHDPR model where the community indicator assignments are split into sij and rij . [sent-180, score-0.411]

81 The aMMSB in some … (Footnote 1) Our K-means initialization views the rows of the adjacency matrix as distinct data points and produces a single community assignment zi for each node. [sent-183, score-0.208]

82 To initialize community membership distributions based on these assignments, we set θi,zi = N − 1 and θi,ℓ = α for ℓ ≠ zi. [sent-184, score-0.227]
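
A sketch of this footnote-1 initialization, assuming scikit-learn's KMeans as the clustering routine; the paper does not specify an implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_init(Y, K, alpha):
    """Cluster the rows of the adjacency matrix Y into K groups, then set
    theta_{i, z_i} = N - 1 and theta_{i, l} = alpha for l != z_i."""
    N = Y.shape[0]
    z = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(Y)
    theta = np.full((N, K), float(alpha))
    theta[np.arange(N), z] = N - 1
    return theta
```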

83 The lower left shows various perplexity scores for the synthetic and relativity networks with the best performing model (aHDPR-Pruning) scoring an average AUC of 0. [sent-194, score-0.37]

84 The lower right shows the pruning process for the toy data and the final K communities discovered on our real-world networks. [sent-199, score-0.442]

85 Pruning moves were applied every N/2 iterations with a maximum of K/10 communities removed per move. [sent-203, score-0.255]

86 If the number of prune candidates was greater than K/10, then K/10 communities with the lowest mass were chosen. [sent-204, score-0.281]

87 Fig. 3 shows that our pruning moves can learn close to the true underlying number of clusters (K = 56) on a synthetic network even when significantly altering its initial K. [sent-206, score-0.285]

88 Across several real world networks, there was low variance between runs with respect to the final K communities discovered, suggesting a degree of robustness. [sent-207, score-0.214]

89 Furthermore, pruning moves improved perplexity and AUC scores across every dataset as well as reducing computational costs during inference. [sent-208, score-0.474]

90 The figures above show that the aHDPR with pruning moves has the best performance, in terms of both perplexity (top) and AUC (bottom) scores. [sent-247, score-0.449]

91 For this analysis, we ran the aHDPR with pruning on the entire dataset using the same settings from our previous experiments. [sent-261, score-0.194]

92 We then took the top 200 highest-degree nodes and generated weighted edges based on a variational distance between their learned expected posteriors, dij = 1 − |Eq[πi] − Eq[πj]| / 2. [sent-262, score-0.363]
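
Reading |·| in sentence 92 as an L1 norm, dij is one minus a total-variation distance between expected memberships; a short sketch:

```python
import numpy as np

def visualization_edge_weight(e_pi_i, e_pi_j):
    """d_ij = 1 - |E_q[pi_i] - E_q[pi_j]| / 2 with |.| taken as the L1 norm,
    so similar membership vectors yield weights near one."""
    return 1.0 - 0.5 * np.abs(e_pi_i - e_pi_j).sum()

pi_a = np.array([0.7, 0.2, 0.1])
pi_b = np.array([0.6, 0.3, 0.1])
print(visualization_edge_weight(pi_a, pi_b))   # 0.9: similar memberships
```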

93 Larger nodes have greater posterior bridgeness, while node colors represent their dominant community membership. [sent-266, score-0.321]

94 Our learned latent communities can drive these types of visualizations that otherwise might not have been possible given the raw subgraph (see §A. [sent-267, score-0.254]

95 (Section 5: Discussion) Our model represents the first Bayesian nonparametric relational model to use a stochastic variational approach for efficient inference. [sent-269, score-0.278]

96 Our pruning moves allow us to save computation and improve inference in a principled manner while our efficient structured mean-field inference procedure helps us escape local optima. [sent-270, score-0.339]

97 Future extensions of interest could entail advanced split-merge moves that can grow the number of communities as well as extending these scalable inference algorithms to more sophisticated relational models. [sent-271, score-0.417]

98 Truly nonparametric online variational inference for hierarchical Dirichlet processes. [sent-299, score-0.254]

99 Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. [sent-354, score-0.178]

100 Fuzzy communities and the concept of bridgeness in complex networks. [sent-381, score-0.238]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ahdpr', 0.623), ('ammsb', 0.391), ('perplexity', 0.214), ('communities', 0.214), ('yij', 0.21), ('pruning', 0.194), ('community', 0.16), ('variational', 0.119), ('ik', 0.117), ('ka', 0.111), ('relational', 0.101), ('ijkk', 0.098), ('elbo', 0.093), ('init', 0.089), ('eq', 0.089), ('wk', 0.086), ('assortative', 0.086), ('littlesis', 0.086), ('eij', 0.085), ('auc', 0.082), ('rij', 0.082), ('ijk', 0.08), ('sij', 0.079), ('relativity', 0.075), ('nodes', 0.069), ('membership', 0.067), ('kb', 0.061), ('hdpr', 0.061), ('vk', 0.061), ('naive', 0.057), ('strati', 0.056), ('edges', 0.056), ('prune', 0.048), ('node', 0.047), ('moves', 0.041), ('dirichlet', 0.04), ('latent', 0.04), ('mmsb', 0.04), ('inference', 0.04), ('old', 0.039), ('edge', 0.038), ('nonparametric', 0.037), ('truncation', 0.037), ('pruned', 0.037), ('zij', 0.037), ('qprune', 0.037), ('collaboration', 0.036), ('toy', 0.034), ('kmeans', 0.034), ('networks', 0.033), ('quantiles', 0.033), ('hdp', 0.033), ('assignments', 0.033), ('hep', 0.032), ('ij', 0.032), ('unbounded', 0.031), ('eld', 0.029), ('assignment', 0.029), ('updates', 0.027), ('network', 0.027), ('nested', 0.026), ('global', 0.026), ('scores', 0.025), ('astrophysics', 0.024), ('bridgeness', 0.024), ('bridgness', 0.024), ('condensed', 0.024), ('qold', 0.024), ('secretary', 0.024), ('treasury', 0.024), ('unused', 0.024), ('associate', 0.024), ('structured', 0.024), ('memberships', 0.024), ('synthetic', 0.023), ('mixed', 0.022), ('phys', 0.022), ('metadata', 0.022), ('stochastic', 0.021), ('undirected', 0.021), ('scalable', 0.021), ('link', 0.021), ('posterior', 0.021), ('reject', 0.021), ('blei', 0.02), ('jk', 0.02), ('ths', 0.02), ('accept', 0.019), ('mass', 0.019), ('truncations', 0.019), ('neu', 0.019), ('initialization', 0.019), ('restricted', 0.018), ('dynamically', 0.018), ('cat', 0.018), ('overlapping', 0.018), ('grif', 0.018), ('online', 0.018), ('log', 0.017), ('observed', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

2 0.20340011 196 nips-2013-Modeling Overlapping Communities with Node Popularities

Author: Prem Gopalan, Chong Wang, David Blei

Abstract: We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB). Our model integrates two basic properties of nodes in social networks: homophily and preferential connection to popular nodes. We develop a scalable algorithm for posterior inference, based on a novel nonconjugate variant of stochastic variational inference. We evaluate the link prediction accuracy of our algorithm on nine real-world networks with up to 60,000 nodes, and on simulated networks with degree distributions that follow a power law. We demonstrate that the AMP predicts significantly better than the MMSB. 1

3 0.12225272 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for largescale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

4 0.10906962 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

Author: Junming Yin, Qirong Ho, Eric Xing

Abstract: We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. 1

5 0.096043907 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unkown-abstract

6 0.095385894 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

7 0.087164842 47 nips-2013-Bayesian Hierarchical Community Discovery

8 0.077557907 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

9 0.074738689 185 nips-2013-Matrix Completion From any Given Set of Observations

10 0.059896521 193 nips-2013-Mixed Optimization for Smooth Functions

11 0.058966085 318 nips-2013-Structured Learning via Logistic Regression

12 0.05630428 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

13 0.055251207 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

14 0.054467157 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

15 0.053666759 143 nips-2013-Integrated Non-Factorized Variational Inference

16 0.049531125 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

17 0.049060144 326 nips-2013-The Power of Asymmetry in Binary Hashing

18 0.045318019 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

19 0.043642256 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

20 0.041888416 226 nips-2013-One-shot learning by inverting a compositional causal process


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.12), (1, 0.041), (2, -0.042), (3, 0.006), (4, 0.026), (5, 0.074), (6, 0.084), (7, 0.004), (8, 0.126), (9, -0.014), (10, 0.015), (11, 0.047), (12, 0.123), (13, -0.041), (14, -0.049), (15, -0.072), (16, 0.02), (17, 0.03), (18, -0.043), (19, -0.105), (20, 0.13), (21, -0.093), (22, -0.06), (23, -0.059), (24, -0.012), (25, -0.006), (26, 0.014), (27, -0.011), (28, 0.122), (29, -0.063), (30, -0.051), (31, -0.027), (32, 0.001), (33, -0.053), (34, -0.077), (35, -0.001), (36, -0.058), (37, 0.081), (38, 0.024), (39, 0.112), (40, 0.033), (41, -0.102), (42, -0.02), (43, 0.076), (44, 0.097), (45, -0.047), (46, 0.014), (47, -0.056), (48, 0.164), (49, -0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94369203 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

2 0.86709237 196 nips-2013-Modeling Overlapping Communities with Node Popularities

Author: Prem Gopalan, Chong Wang, David Blei

Abstract: We develop a probabilistic approach for accurate network modeling using node popularities within the framework of the mixed-membership stochastic blockmodel (MMSB). Our model integrates two basic properties of nodes in social networks: homophily and preferential connection to popular nodes. We develop a scalable algorithm for posterior inference, based on a novel nonconjugate variant of stochastic variational inference. We evaluate the link prediction accuracy of our algorithm on nine real-world networks with up to 60,000 nodes, and on simulated networks with degree distributions that follow a power law. We demonstrate that the AMP predicts significantly better than the MMSB. 1

3 0.75983125 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

Author: Junming Yin, Qirong Ho, Eric Xing

Abstract: We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. 1

4 0.65757054 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unkown-abstract

5 0.6032328 187 nips-2013-Memoized Online Variational Inference for Dirichlet Process Mixture Models

Author: Michael Hughes, Erik Sudderth

Abstract: Variational inference algorithms provide the most effective framework for largescale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.

6 0.53754401 47 nips-2013-Bayesian Hierarchical Community Discovery

7 0.50360388 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization

8 0.47976214 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

9 0.45509332 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

10 0.44007868 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars

11 0.43594909 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

12 0.4348211 294 nips-2013-Similarity Component Analysis

13 0.42560574 317 nips-2013-Streaming Variational Bayes

14 0.41613951 189 nips-2013-Message Passing Inference with Chemical Reaction Networks

15 0.41366565 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

16 0.41357899 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

17 0.38241041 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing

18 0.37567502 277 nips-2013-Restricting exchangeable nonparametric distributions

19 0.37208298 61 nips-2013-Capacity of strong attractor patterns to model behavioural and cognitive prototypes

20 0.3693051 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.054), (33, 0.086), (34, 0.095), (36, 0.011), (41, 0.04), (49, 0.039), (56, 0.099), (70, 0.03), (77, 0.259), (85, 0.104), (89, 0.015), (91, 0.013), (93, 0.031), (95, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75012082 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

2 0.72952503 165 nips-2013-Learning from Limited Demonstrations

Author: Beomjoon Kim, Amir massoud Farahmand, Joelle Pineau, Doina Precup

Abstract: We propose a Learning from Demonstration (LfD) algorithm which leverages expert data, even if they are very few or inaccurate. We achieve this by using both expert data, as well as reinforcement signals gathered through trial-and-error interactions with the environment. The key idea of our approach, Approximate Policy Iteration with Demonstration (APID), is that expert’s suggestions are used to define linear constraints which guide the optimization performed by Approximate Policy Iteration. We prove an upper bound on the Bellman error of the estimate computed by APID at each iteration. Moreover, we show empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available. Our experiments include simulations as well as a real robot path-finding task. 1

3 0.68358731 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

Author: Shunan Zhang, Angela J. Yu

Abstract: How humans achieve long-term goals in an uncertain environment, via repeated trials and noisy observations, is an important problem in cognitive science. We investigate this behavior in the context of a multi-armed bandit task. We compare human behavior to a variety of models that vary in their representational and computational complexity. Our result shows that subjects’ choices, on a trial-totrial basis, are best captured by a “forgetful” Bayesian iterative learning model [21] in combination with a partially myopic decision policy known as Knowledge Gradient [7]. This model accounts for subjects’ trial-by-trial choice better than a number of other previously proposed models, including optimal Bayesian learning and risk minimization, ε-greedy and win-stay-lose-shift. It has the added benefit of being closest in performance to the optimal Bayesian model than all the other heuristic models that have the same computational complexity (all are significantly less complex than the optimal model). These results constitute an advancement in the theoretical understanding of how humans negotiate the tension between exploration and exploitation in a noisy, imperfectly known environment. 1

4 0.67222744 326 nips-2013-The Power of Asymmetry in Binary Hashing

Author: Behnam Neyshabur, Nati Srebro, Ruslan Salakhutdinov, Yury Makarychev, Payman Yadollahpour

Abstract: When approximating binary similarity using the hamming distance between short binary hashes, we show that even if the similarity is symmetric, we can have shorter and more accurate hashes by using two distinct code maps. I.e. by approximating the similarity between x and x as the hamming distance between f (x) and g(x ), for two distinct binary codes f, g, rather than as the hamming distance between f (x) and f (x ). 1

5 0.59346497 332 nips-2013-Tracking Time-varying Graphical Structure

Author: Erich Kummerfeld, David Danks

Abstract: Structure learning algorithms for graphical models have focused almost exclusively on stable environments in which the underlying generative process does not change; that is, they assume that the generating model is globally stationary. In real-world environments, however, such changes often occur without warning or signal. Real-world data often come from generating models that are only locally stationary. In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. We show by simulation that the algorithm performs comparably to batch-mode learning when the generating graphical structure is globally stationary, and significantly better when it is only locally stationary. 1

6 0.59031987 196 nips-2013-Modeling Overlapping Communities with Node Popularities

7 0.59008259 7 nips-2013-A Gang of Bandits

8 0.58798498 168 nips-2013-Learning to Pass Expectation Propagation Messages

9 0.58200473 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

10 0.5819754 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

11 0.57573175 232 nips-2013-Online PCA for Contaminated Data

12 0.57238579 79 nips-2013-DESPOT: Online POMDP Planning with Regularization

13 0.57011932 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

14 0.5661186 25 nips-2013-Adaptive Anonymity via $b$-Matching

15 0.56571394 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

16 0.56499022 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries

17 0.56484419 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

18 0.56288016 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

19 0.56232095 357 nips-2013-k-Prototype Learning for 3D Rigid Structures

20 0.56229287 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices