nips nips2012 nips2012-287 knowledge-graph by maker-knowledge-mining

287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data


Source: pdf

Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy

Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. [sent-2, score-0.207]

2 Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. [sent-3, score-0.392]

3 Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. [sent-4, score-0.752]

4 We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. [sent-5, score-0.168]

5 Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. [sent-6, score-0.224]

6 1 Introduction. Structured relational data arises in a variety of contexts, including graph-valued data [e.g., …]. [sent-8, score-0.139]

7 This data is typified by expressing relations between two or more objects (e.g., …). [sent-15, score-0.112]

8 Pairwise relations can be represented by a 2-dimensional array (a matrix); more generally, relations between d-tuples are recorded as d-dimensional arrays (d-arrays). [sent-18, score-0.705]

9 Latent variable models for such data explain observations by means of an underlying structure or summary, such as a low-rank approximation to an observed array or an embedding into a Euclidean space. [sent-27, score-0.232]

10 This structure is formalized as a latent (unobserved) variable. [sent-28, score-0.108]

11 Hoff [4] first noted that a number of parametric latent variable models for relational data are exchangeable—an applicable assumption whenever the objects in the data have no natural ordering (e.g. [sent-40, score-0.291]

12 , users in a social network or products in ratings data)—and can be cast into the common functional form guaranteed to exist by results in probability theory. [sent-42, score-0.107]

13 Building on this connection, … Figure 1, left: The distribution of any exchangeable random graph with vertex set N and edges E = (Xij)i,j∈N can be characterised by a random function Θ : [0, 1]² → [0, 1]. [sent-43, score-0.48]

14 Given Θ, a graph can be sampled by generating a uniform random variable Ui for each vertex i, and sampling edges as Xij ∼ Bernoulli(Θ(Ui , Uj )). [sent-44, score-0.232]

15 Right: A 100 × 100 symmetric adjacency matrix sampled from Θ. [sent-46, score-0.079]
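In code, the sampling scheme of Figure 1 is a few lines. A minimal Python sketch, with a hypothetical smooth Θ standing in for the random function (the paper later places a Gaussian process prior on it; here it is fixed purely for illustration):

```python
import numpy as np

def sample_graph(theta, n, seed=0):
    """Sample an n-vertex exchangeable graph: Ui ~ Uniform[0,1] per vertex,
    then Xij = 1 iff Uij < theta(Ui, Uj), symmetrised as in Figure 1."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                    # latent Ui for each vertex i
    p = theta(u[:, None], u[None, :])          # edge probabilities Θ(Ui, Uj)
    x = rng.uniform(size=(n, n)) < p           # Uij < Θ(Ui, Uj), i.e. Bernoulli(Θ)
    x = np.triu(x, 1)                          # upper triangle only, no self-loops
    return (x | x.T).astype(int)               # symmetric adjacency matrix

# Hypothetical graphon, chosen for illustration (not from the paper):
theta = lambda u, v: 0.05 + 0.9 * np.exp(-5.0 * (u - v) ** 2)
adj = sample_graph(theta, 100)                 # 100 x 100 matrix, cf. Figure 1, right
```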

16 Results of Aldous [2], Hoover [6] and Kallenberg [7] show that random arrays that satisfy an exchangeability property can be represented in terms of a random function. [sent-50, score-0.537]

17 These representations have been further developed in discrete analysis for the special case of graphs [13]; this case is illustrated in Fig. 1. [sent-51, score-0.129]

18 Their implication for Bayesian modeling is that we can specify a prior for an exchangeable random array model by specifying a prior on (measurable) functions. [sent-54, score-0.646]

19 A prior must therefore be nonparametric to have reasonably large support since a parametric prior concentrates on a finite-dimensional subset. [sent-56, score-0.144]

20 In the following, we model the representing function explicitly using a nonparametric prior. [sent-57, score-0.093]

21 2 Background: Exchangeable graphs and arrays. A fundamental component of every Bayesian model is a random variable Θ, the parameter of the model, which decouples the data. [sent-58, score-0.504]

22 A sequence is called exchangeable if its joint distribution is invariant under arbitrary permutation of the indices, i.e., (X1, X2, …) and (Xπ(1), Xπ(2), …) have the same distribution for every finite permutation π. [sent-63, score-0.275]

23 De Finetti’s theorem states that (Xi)i∈N is exchangeable if and only if there exists a random probability measure Θ on X such that X1, X2, … are conditionally i.i.d. given Θ, each with distribution Θ. [sent-74, score-0.347]
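As a toy instance of the theorem (our example, not the paper's): take X = {0, 1} and let Θ be a Beta-distributed coin bias, so that the Xi are conditionally Bernoulli:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.beta(2.0, 2.0)              # random "measure" Θ: here a Bernoulli bias
x = rng.uniform(size=1000) < theta      # X1, X2, ... are i.i.d. given Θ
# Permuting the entries of x leaves its distribution unchanged: exchangeability.
```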

24 2.1 De Finetti-type representations for random arrays. To specify Bayesian models for graph- or array-valued data, we need a suitable counterpart to de Finetti’s theorem that is applicable when the random sequences in (2.1) [sent-82, score-0.472]

25 are substituted by random arrays X = (Xij)i,j∈N. [sent-83, score-0.403]

26 Condition (2.1) applied to all elements of X is typically too restrictive: in the graph case Xij ∈ {0, 1}, for example, the probability of X would then depend only on the proportion of edges present in the graph, but not on the graph structure. [sent-85, score-0.157]

27 Instead, we define exchangeability of random 2-arrays in terms of the simultaneous application of a permutation to rows and columns. [sent-86, score-0.207]

28 An array X = (Xij)i,j∈N is called an exchangeable array if (Xij) and (Xπ(i),π(j)) have the same distribution for every π ∈ S∞. [sent-89, score-0.739]

29 Because (2.1) is weakened by demanding invariance only under a subset of all permutations of N²—those of the form (i, j) → (π(i), π(j))—we can no longer expect de Finetti’s theorem to hold. [sent-92, score-0.076]

30 A random 2-array (Xij) is exchangeable if and only if there is a random (measurable) function F : [0, 1]³ → X such that (Xij) and (F(Ui, Uj, Uij)) have the same distribution. [sent-95, score-0.351]
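Operationally, the representation says an exchangeable 2-array can be simulated from row variables Ui and per-entry noise Uij, all i.i.d. Uniform[0, 1]. A sketch with a hypothetical F:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical measurable F : [0,1]^3 -> R, for illustration only:
F = lambda ui, uj, uij: np.sin(np.pi * ui) * np.sin(np.pi * uj) + (uij - 0.5)

n = 50
U = rng.uniform(size=n)                    # one Ui per row/column index
Uij = rng.uniform(size=(n, n))             # independent noise per entry
X = F(U[:, None], U[None, :], Uij)         # (Xij) equal in distribution to (F(Ui, Uj, Uij))
```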

31 2.2 Random graphs. The graph-valued data case X = {0, 1} is of particular interest. [sent-102, score-0.139]

32 Here, the array X, interpreted as an adjacency matrix, specifies a random graph with vertex set N. [sent-103, score-0.414]

33 We call a random graph exchangeable if X satisfies (2.…). [sent-105, score-0.377]

34 In this case (2.3) simplifies further: there is a random function Θ : [0, 1]² → [0, 1], symmetric in its arguments, such that F(Ui, Uj, Uij) := 1 if Uij < Θ(Ui, Uj) and 0 otherwise. [sent-108, score-0.073]

35 Recent work in discrete analysis shows that any symmetric measurable function [0, 1]² → [0, 1] can be regarded as a (suitably defined) limit of adjacency matrices of graphs of increasing size [13]—intuitively speaking, as the number of rows and columns increases, the array in Fig. 1 … [sent-119, score-0.567]

36 Theorem 2.2 can in fact be stated in a more general setting than 2-arrays, namely for random d-arrays, which are collections of random variables of the form (Xi1…id). [sent-124, score-0.076]

37 For a discussion of convergence properties of general arrays similar to those sketched above for random graphs, see [3]. [sent-150, score-0.375]

38 3 Model. To define a Bayesian model for exchangeable graphs or arrays, we start with Theorem 2.2: [sent-153, score-0.404]

39 A distribution on exchangeable arrays can be specified by a distribution on measurable functions [0, 1]³ → X. [sent-154, score-0.689]

40 We initially sample a random function Θ—the model parameter in terms of Bayesian statistics—which captures the structure of the underlying graph or array. [sent-159, score-0.127]

41 The (Ui ) then represent attributes of nodes or objects and H and the array (Uij ) model the remaining noise in the observed relations. [sent-160, score-0.301]

42 More precisely, we take W = R and consider a zero-mean Gaussian process prior on CW := C([0, 1]², W), the space of continuous functions from [0, 1]² to W, with kernel function κ : [0, 1]² × [0, 1]² → W. [sent-163, score-0.127]

43 Graphs and real-valued arrays require different choices of P . [sent-172, score-0.337]

44 In either case, the model first generates the latent array W = (Wij ). [sent-173, score-0.365]

45 P[Xij ∈ · | Wij] is chosen as follows: for a graph, X = {0, 1} and Xij | Wij ∼ Bernoulli(φ(Wij)); for a real array, X = R and Xij | Wij ∼ Normal(Wij, σX²), where φ is the logistic function and σX² is a noise variance parameter. [sent-175, score-0.232]
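Putting the pieces together, a sketch of the generative process under these choices; the squared-exponential kernel, lengthscale and noise level are our illustrative assumptions, and symmetry of Θ is not enforced here:

```python
import numpy as np

def sq_exp(A, B, ell=0.3):
    """Squared-exponential kernel between row-stacked points in [0,1]^2."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def sample_model(n, sigma_x=0.1, seed=3):
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                                        # node attributes Ui
    pairs = np.array([[u[i], u[j]] for i in range(n) for j in range(n)])
    K = sq_exp(pairs, pairs) + 1e-8 * np.eye(n * n)                # GP covariance over pairs
    w = rng.multivariate_normal(np.zeros(n * n), K).reshape(n, n)  # latent array Wij
    p = 1.0 / (1.0 + np.exp(-w))                                   # logistic link φ(Wij)
    x_graph = (rng.uniform(size=(n, n)) < p).astype(int)           # Bernoulli(φ(Wij))
    x_real = w + sigma_x * rng.normal(size=(n, n))                 # Normal(Wij, σX²)
    return x_graph, x_real
```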

46 The Gaussian process prior favors smooth functions, which will in general result in more interpretable latent space embeddings. [sent-176, score-0.209]

47 Exchangeable, undirected graphs are always representable using a Bernoulli distribution for P[Xij ∈ · | Wij]. [sent-181, score-0.137]

48 Another rather subtle assumption arises implicitly when the array X is not symmetric, i.e., when Xij and Xji may differ. [sent-187, score-0.232]

49 In Theorem 2.2, the array (Uij) is symmetric even if X is not. [sent-190, score-0.267]

50 However, it can be shown that any exchangeable array can be arbitrarily well approximated by arrays which treat Xij | Wij and Xji | Wji as independent [8, Thm. …]. [sent-193, score-0.844]

51 The methods described here address random arrays that are dense, i.e. [sent-198, score-0.139]

52 as the size of an n × n array increases, the number of non-zero entries grows as O(n²). [sent-200, score-0.232]

53 This follows from Theorem 2.2: for graph data the asymptotic proportion of present edges is p := ∫ Θ(x, y) dx dy, and the graph is hence either empty (for p = 0) or dense (since O(pn²) = O(n²)). [sent-203, score-0.194]
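A quick Monte Carlo check of the density claim, reusing the hypothetical Θ from the earlier sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = lambda u, v: 0.05 + 0.9 * np.exp(-5.0 * (u - v) ** 2)
u, v = rng.uniform(size=(2, 1_000_000))
p_hat = theta(u, v).mean()     # estimates p = ∫∫ Θ(x, y) dx dy
# Expected edge count in an n-vertex graph is ~ p * n^2, i.e. O(n^2): dense.
print(p_hat)
```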

54 Analogous representation theorems for sparse random graphs are to date an open problem in probability. [sent-204, score-0.171]

55 4 Related work. Our model has some noteworthy relations to the Gaussian process latent variable model (GPLVM), a dimensionality-reduction technique [e.g., …]. [sent-205, score-0.263]

56 GPLVMs can be applied to 2-arrays, but doing so makes the assumption that either the rows or the columns of the random array are independent [12]. [sent-208, score-0.345]

57 In terms of our model, this corresponds to choosing kernels of the form κU ⊗ δ, where ⊗ represents a tensor product¹ and δ represents an ‘identity’ kernel (i.e., a Kronecker delta). [sent-209, score-0.098]

58 From this perspective, the application of our model to exchangeable real-valued arrays can be interpreted as a form of co-dimensionality reduction. [sent-212, score-0.637]

59 For graph data, a related parametric model is the eigenmodel of Hoff [4]. [sent-213, score-0.251]

60 This model, also justified by exchangeability arguments, approximates an array with a bilinear form, followed by some link function and conditional probability distribution. [sent-214, score-0.356]

61 Available nonparametric models include the infinite relational model (IRM) [10], latent feature relational model (LFRM) [14], infinite latent attribute model (ILA) [17] and many others. [sent-215, score-0.637]

62 The model uses kernels of the form κ1 ⊗ κ2 ; our work suggests that it may not be necessary to impose tensor product structure, which allows for inference with improved scaling. [sent-219, score-0.097]

63 Roy and Teh [20] present a nonparametric Bayesian model of relational data that approximates Θ by a piecewise constant function with a specific hierarchical structure, called a Mondrian process in [20]. [sent-220, score-0.269]

64 Most importantly, we describe a random subset-of-regressors approximation that scales to graphs with hundreds of nodes. [sent-231, score-0.142]

65 ¹ We define the tensor product of kernel functions as follows: (κU ⊗ κV)((u1, v1), (u2, v2)) = κU(u1, u2) × κV(v1, v2). [sent-244, score-0.098]
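The footnote translates directly to code; reading the ‘identity’ kernel δ as a Kronecker delta (our interpretation):

```python
import numpy as np

def tensor_product(kU, kV):
    """(κU ⊗ κV)((u1, v1), (u2, v2)) = κU(u1, u2) * κV(v1, v2)."""
    return lambda p1, p2: kU(p1[0], p2[0]) * kV(p1[1], p2[1])

rbf = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
delta = lambda a, b: 1.0 if a == b else 0.0     # 'identity' (Kronecker delta) kernel

k_joint = tensor_product(rbf, rbf)      # κ1 ⊗ κ2: structure over rows and columns
k_gplvm = tensor_product(rbf, delta)    # κU ⊗ δ: treats columns as independent
```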

66 5.2 Sampling without approximating the model. In the simpler case of a real-valued array X, we construct an MCMC algorithm over the variables (U, ψ, σX) by repeatedly slice sampling [16] from the conditional distributions ψi | ψ−i, σX, U, X; σX | ψ, U, X; and Uj | U−j, ψ, σX, X. [sent-250, score-0.369]
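Each conditional update can be carried out with a univariate slice sampler in the style of [16]; a generic stepping-out/shrinkage step might look as follows (names and step width are ours):

```python
import numpy as np

def slice_sample_step(x, log_density, w=1.0, rng=np.random.default_rng(5)):
    """One slice sampling update of a scalar x with target exp(log_density)."""
    log_y = log_density(x) + np.log(rng.uniform())   # auxiliary slice height
    left = x - w * rng.uniform()                     # random initial bracket
    right = left + w
    while log_density(left) > log_y:                 # step out to the left
        left -= w
    while log_density(right) > log_y:                # step out to the right
        right += w
    while True:                                      # shrink until a point is accepted
        x_new = rng.uniform(left, right)
        if log_density(x_new) > log_y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new
```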

67 Let N = |{Ui}| denote the number of rows in the observed array, let ξ be the set of all pairs (Ui, Uj) for all observed relations Xij, let O = |ξ| denote the number of observed relations, and let K represent the O × O kernel matrix between all points in ξ. [sent-252, score-0.165]

68 Our approach is to treat both the inputs and outputs of the GP as latent variables. [sent-259, score-0.108]

69 In particular, we introduce k Gaussian distributed pseudoinputs η = (η1, …, ηk). [sent-260, score-0.093]

70 Writing Kηη for the kernel matrix formed from the pseudoinputs η, we have (ηi ) ∼iid N (0, I2r ) and T | η ∼ N (0, Kηη ). [sent-264, score-0.145]

71 Here Kξη is the kernel matrix between the latent embeddings ξ and the pseudoinputs η. [sent-267, score-0.253]
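A sketch of the subset-of-regressors construction as we read it: the GP over all O embedding pairs is replaced by its conditional mean given values T at the k pseudoinputs; the jitter term is ours, for numerical stability:

```python
import numpy as np

def sor_mean(xi, eta, T, kernel):
    """Subset-of-regressors mean: E[W | T] = K_{ξη} K_{ηη}^{-1} T."""
    K_xe = kernel(xi, eta)                               # O x k cross-covariance
    K_ee = kernel(eta, eta) + 1e-8 * np.eye(len(eta))    # k x k pseudoinput gram matrix
    return K_xe @ np.linalg.solve(K_ee, T)

# kernel(A, B) is any function returning the |A| x |B| kernel matrix, e.g.:
kernel = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
```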

72 The conditional distribution T | U, η, ψ, (σX ), X is amenable to elliptical slice sampling [15]. [sent-269, score-0.16]
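Elliptical slice sampling [15] requires only prior draws and likelihood evaluations; a compact, standard rendering of one update (not the authors' code):

```python
import numpy as np

def elliptical_slice(f, sample_prior, log_lik, rng=np.random.default_rng(6)):
    """One ESS update of f ~ N(0, K), targeting a posterior ∝ N(f; 0, K) L(f)."""
    nu = sample_prior()                                  # auxiliary draw from N(0, K)
    log_y = log_lik(f) + np.log(rng.uniform())           # slice level
    angle = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = angle - 2.0 * np.pi, angle
    while True:
        f_new = f * np.cos(angle) + nu * np.sin(angle)   # propose on the ellipse
        if log_lik(f_new) > log_y:
            return f_new                                 # accept
        if angle < 0.0:                                  # otherwise shrink the bracket
            lo = angle
        else:
            hi = angle
        angle = rng.uniform(lo, hi)
```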

73 All other random parameters, including the (Ui ), can again be sampled from their full conditional distributions using slice sampling. [sent-270, score-0.117]

74 The third data set, a protein interactome, was previously noted by Hoff [4] to be of interest since it exhibits both block structure and transitivity. [sent-276, score-0.113]

75 Data set (recorded data, vertices): High school (high school social network, 90); NIPS (densely connected subset of coauthorship network, 234); Protein (protein interactome, 230). [sent-277, score-0.464]

76 The models are chosen for comparability, since they all embed nodes into a Euclidean latent space. [sent-285, score-0.108]

77 The network exhibits stochastic equivalence (visible as block structure in the matrix) and homophily (concentration of points around the diagonal). [sent-301, score-0.083]

78 Parameters are chosen to favor slice sampling acceptance after a reasonable number of iterations, as evaluated over a range of data sets, summarized in the table on the right. [sent-305, score-0.112]

79 [Table: algorithm parameters. Competing methods use author defaults; for our sampler, the settings are the log mean length scale, scale factor, target noise, and slice step widths for U and η.] [sent-307, score-0.3]

80 We also note that in all experiments, a single latent dimension suffices to achieve better performance, even when the other models use additional latent dimensions. [sent-356, score-0.216]

81 The posterior distribution of Θ favors functions defining random array distributions that explain the data well. [sent-357, score-0.296]

82 The standard inference methods for GPLVM and PMF applied to relational data, in contrast, are designed to fit mean squared error, and should therefore be expected to show stronger performance under a mean squared error metric. [sent-359, score-0.165]

83 Figure 2 provides a visualisation of Θ when modeling the protein interactome data using 1 latent dimension. [sent-404, score-0.398]

84 The likelihood of the smooth peak is sensitive to the lengthscale of the Gaussian process representation of Θ. [sent-405, score-0.119]

85 A Gaussian process prior introduces the assumption that Θ is continuous. [sent-406, score-0.075]

86 Continuous functions are dense in the space of measurable functions, i.e. [sent-407, score-0.114]

87 any measurable function can be arbitrarily well approximated by a continuous one. [sent-409, score-0.077]

88 The assumption of continuity is therefore not restrictive; rather, the lengthscale of the Gaussian process determines the complexity of the model a priori. [sent-410, score-0.115]

89 The nonparametric prior placed on Θ allows the posterior to approximate any function if supported by the data, but by sampling the lengthscale we allow the model to quickly select an appropriate level of complexity. [sent-411, score-0.217]

90 7 Discussion and conclusions. There has been a tremendous amount of research into modelling matrices, arrays, graphs and relational data, but nonparametric Bayesian modeling of such data is essentially uncharted territory. [sent-412, score-0.355]

91 In most modelling circumstances, the assumption of exchangeability amongst data objects is natural and fundamental to the model. [sent-413, score-0.212]

92 In this case, the representation results [2, 6, 7] precisely map out the scope of possible Bayesian models for exchangeable arrays: Any such model can be interpreted as a prior on random measurable functions on a suitable space. [sent-414, score-0.482]

93 Nonparametric Bayesian statistics provides a number of possible priors on random functions, but the Gaussian process and its modifications are the only well-studied model for almost surely continuous functions. [sent-415, score-0.126]

94 The model both yields interpretable representations for networks, such as a visualisation of a protein interactome, and achieves competitive predictive performance on benchmark data. [sent-417, score-0.201]

95 Modeling homophily and stochastic equivalence in symmetric relational data. [sent-447, score-0.212]

96 Relations on probability spaces and arrays of random variables. [sent-461, score-0.375]

97 Learning systems of concepts with an infinite relational model. [sent-481, score-0.139]

98 Probabilistic non-linear principal component analysis with Gaussian process latent variable models. [sent-486, score-0.145]

99 A unifying view of sparse approximate Gaussian process regression. [sent-535, score-0.09]

100 Efficient sampling for Gaussian process inference using control variables. [sent-574, score-0.096]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('arrays', 0.337), ('exchangeable', 0.275), ('xij', 0.269), ('wij', 0.237), ('array', 0.232), ('ui', 0.232), ('uij', 0.231), ('uj', 0.185), ('gplvm', 0.17), ('eigenmodel', 0.162), ('finetti', 0.162), ('interactome', 0.139), ('relational', 0.139), ('exchangeability', 0.124), ('xji', 0.123), ('pmf', 0.116), ('aldous', 0.116), ('kallenberg', 0.116), ('hoff', 0.113), ('protein', 0.113), ('latent', 0.108), ('graphs', 0.104), ('pseudoinputs', 0.093), ('gp', 0.088), ('hoover', 0.082), ('slice', 0.079), ('measurable', 0.077), ('wji', 0.075), ('defaults', 0.07), ('sor', 0.07), ('nonparametric', 0.068), ('relations', 0.068), ('iid', 0.065), ('graph', 0.064), ('rfm', 0.061), ('mondrian', 0.061), ('gaussian', 0.053), ('lengthscale', 0.053), ('kernel', 0.052), ('ij', 0.049), ('elliptical', 0.048), ('bayesian', 0.048), ('tensor', 0.046), ('ila', 0.046), ('lfrm', 0.046), ('mui', 0.046), ('smgb', 0.046), ('rows', 0.045), ('network', 0.045), ('objects', 0.044), ('school', 0.044), ('adjacency', 0.044), ('modelling', 0.044), ('invariance', 0.042), ('uik', 0.041), ('irm', 0.041), ('roy', 0.04), ('mcmc', 0.039), ('yan', 0.038), ('random', 0.038), ('prior', 0.038), ('blockmodels', 0.038), ('visualisation', 0.038), ('ujk', 0.038), ('homophily', 0.038), ('process', 0.037), ('dense', 0.037), ('vertex', 0.036), ('epsrc', 0.035), ('cw', 0.035), ('symmetric', 0.035), ('held', 0.034), ('social', 0.034), ('theorem', 0.034), ('knowles', 0.034), ('palla', 0.034), ('sampling', 0.033), ('undirected', 0.033), ('bernoulli', 0.033), ('heat', 0.032), ('symmetries', 0.032), ('uniform', 0.032), ('lawrence', 0.031), ('author', 0.03), ('nips', 0.03), ('columns', 0.03), ('edges', 0.029), ('representation', 0.029), ('substituted', 0.028), ('users', 0.028), ('middle', 0.028), ('klein', 0.028), ('normal', 0.028), ('jmlr', 0.027), ('priors', 0.026), ('auc', 0.026), ('favors', 0.026), ('inference', 0.026), ('representations', 0.025), ('model', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy

Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1

2 0.13335387 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

3 0.11650282 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

4 0.10286197 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving a SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM − ϑ graph. As a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state of the art methods. 1

5 0.093337037 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

Author: Jasper Snoek, Hugo Larochelle, Ryan P. Adams

Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a “black art” requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm’s generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks. 1

6 0.084110536 22 nips-2012-A latent factor model for highly multi-relational data

7 0.080616996 187 nips-2012-Learning curves for multi-task Gaussian process regression

8 0.078726441 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

9 0.078376807 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

10 0.077722445 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

11 0.075199343 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

12 0.074908756 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

13 0.071311198 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

14 0.070796102 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

15 0.070559695 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

16 0.07033658 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

17 0.068937957 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

18 0.066249833 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

19 0.066239305 364 nips-2012-Weighted Likelihood Policy Search with Model Selection

20 0.066133119 74 nips-2012-Collaborative Gaussian Processes for Preference Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, 0.06), (2, 0.008), (3, -0.03), (4, -0.121), (5, -0.044), (6, -0.012), (7, 0.006), (8, -0.027), (9, -0.042), (10, -0.071), (11, 0.011), (12, -0.034), (13, -0.014), (14, 0.031), (15, 0.043), (16, 0.07), (17, 0.002), (18, 0.02), (19, -0.033), (20, 0.008), (21, -0.068), (22, 0.044), (23, -0.061), (24, -0.026), (25, 0.025), (26, -0.071), (27, -0.03), (28, -0.016), (29, -0.002), (30, -0.014), (31, 0.102), (32, -0.065), (33, 0.129), (34, 0.042), (35, -0.115), (36, 0.051), (37, -0.032), (38, 0.023), (39, -0.006), (40, -0.073), (41, -0.041), (42, 0.048), (43, -0.026), (44, -0.045), (45, 0.006), (46, -0.086), (47, 0.039), (48, -0.06), (49, 0.143)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91111803 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy

Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1

2 0.63593721 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

3 0.63549477 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

4 0.62709481 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

Author: Lei Shi

Abstract: For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which distribute in the row-wise and column-wise latent subspaces respectively. In consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. By formulating the posterior updating into the task of solving Sylvester equations, we propose an efficient variational inference algorithm. Furthermore, PCSA is extended to tackling and filling missing values, to adapting model sparseness, and to modelling tensor data. In comparison with several state-of-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.

5 0.6191085 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving a SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM − ϑ graph. As a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state of the art methods. 1

6 0.60470247 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

7 0.58849609 22 nips-2012-A latent factor model for highly multi-relational data

8 0.54339439 192 nips-2012-Learning the Dependency Structure of Latent Factors

9 0.53422391 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

10 0.52899325 55 nips-2012-Bayesian Warped Gaussian Processes

11 0.50588584 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

12 0.49080887 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

13 0.48825115 233 nips-2012-Multiresolution Gaussian Processes

14 0.48293275 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

15 0.47795746 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

16 0.47418725 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.46888715 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

18 0.46853295 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies

19 0.46378249 128 nips-2012-Fast Resampling Weighted v-Statistics

20 0.46346587 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.04), (9, 0.017), (21, 0.023), (38, 0.09), (39, 0.014), (42, 0.038), (54, 0.36), (55, 0.017), (74, 0.06), (76, 0.139), (80, 0.077), (92, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94270569 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1

2 0.8456862 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

3 0.81792468 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

4 0.80058497 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

5 0.79354674 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

same-paper 6 0.7788285 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

7 0.67896777 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

8 0.67674923 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

9 0.63968635 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

10 0.63963556 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

11 0.62520248 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

12 0.62313253 38 nips-2012-Algorithms for Learning Markov Field Policies

13 0.62292457 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

14 0.61588639 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

15 0.61325264 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

16 0.60939318 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

17 0.60708988 348 nips-2012-Tractable Objectives for Robust Policy Optimization

18 0.60475188 160 nips-2012-Imitation Learning by Coaching

19 0.60202664 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

20 0.60144484 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress