nips nips2006 nips2006-183 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected in the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existence, strength, or type of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. [sent-2, score-0.413] [sent-4, score-0.145]
3 These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. [sent-5, score-0.235]
4 By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected in the adapted GP kernels. [sent-6, score-0.915]
5 The framework offers a discriminative approach to link prediction, namely, predicting the existence, strength, or type of relationships based on the partially observed linkage network as well as the attributes of entities (if given). [sent-7, score-0.588]
6 Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. [sent-9, score-0.24]
7 In the end we discuss extensions of SRM to general relational learning tasks. [sent-10, score-0.382]
8 1 Introduction Relational learning concerns the modeling of physical, social, or other phenomena, where rich types of entities interact via complex relational structures. [sent-11, score-0.605]
9 Compared to traditional machine learning settings, the entity relationships provide additional structural information. [sent-12, score-0.378]
10 A simple example of a relational setting is the user-movie rating database, which contains user entities with user attributes (e.g., age, gender, education) and movie entities with movie attributes (e.g., ...). [sent-13, score-0.85] [sent-15, score-0.42]
12 A typical relational learning problem is entity classification, for example, segmenting users into groups. [sent-20, score-0.704]
13 One may apply usual clustering or classification methods based on a flat structure of data, where each user is associated with a vector of user attributes. [sent-21, score-0.127]
14 However, a sound model should additionally explore the user-movie relations as well as the movie attributes, since like-minded users tend to give similar ratings on the same movie, and may like (or dislike) movies with similar genres. [sent-22, score-0.233]
15 Entity classification in a relational setting has gained considerable attention, e.g., webpage classification using both textual content and hyperlinks. [sent-24, score-0.362]
16 For example, one may want to predict protein-protein interactions, or in another application, user ratings for products. [sent-26, score-0.127]
17 The family of these problems has been called link prediction, which is the primary topic of this paper. [sent-27, score-0.188]
18 In general, link prediction includes link existence prediction (i.e., predicting whether a link exists between a pair of entities). [sent-28, score-0.446]
19 In this paper we propose a family of stochastic relational models (SRM) for link prediction and other relational learning tasks. [sent-38, score-1.04]
20 The key idea of SRM is a stochastic link-wise process induced by a tensor interplay of multiple entity-wise Gaussian processes (GPs). [sent-39, score-0.232]
21 These models in fact define a set of nonparametric priors on an infinite dimensional tensor matrix, where each element represents a relationship between a tuple of entities. [sent-40, score-0.235]
22 By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected by the learned entity-wise GP kernels (i.e., ...). [sent-41, score-0.947]
23 We present various models of SRM and address the computational issue, which is crucial in link prediction because the number of potential links grows quadratically with the number of entities. [sent-45, score-0.692]
24 2 Stochastic Relational Models We first consider pairwise asymmetric links r between entities u ∈ U and v ∈ V. [sent-57, score-0.403]
25 We use u or v to denote the attribute vector of an entity, or simply its identity when entity attributes are unavailable. [sent-59, score-0.603]
26 Note that $r_{i,n} \equiv r(u_i, v_n)$ does not have to be identical to $r_{n,i}$ when U = V, i.e., links may be asymmetric. [sent-60, score-0.181]
27 Extensions to more than two entity sets, multi-way relations (i.e., links connecting more than two entities), and symmetric links are straightforward and will be briefly discussed in Sec. 8. [sent-63, score-0.302] [sent-65, score-0.356]
29 We assume that the observable links r are derived as local measurements of a real-valued latent relational function $t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$, and each link $r_{i,n}$ depends solely on its latent value $t_{i,n}$, modeled by the likelihood $p(r_{i,n}|t_{i,n})$. [sent-67, score-0.829]
30 The focus of this paper is a family of stochastic relational processes acting on U × V, the space of entity pairs, to generate the latent relational function t, via a tensor interaction of two independent entity-specific GPs, one acting on U and the other on V. [sent-68, score-1.362]
31 Let the relational processes be characterized by hyperparameters $\theta = \{\theta_\Sigma, \theta_\Omega\}$, with $\theta_\Sigma$ for the GP kernel function on U and $\theta_\Omega$ for the GP kernel function on V; an SRM thus defines a Bayesian prior $p(t|\theta)$ for the latent variables t. [sent-70, score-0.588]
32 Let I be the index set of entity pairs having observed links; the marginal likelihood (also called evidence) under such a prior is
$$p(R_I \mid \theta) = \int \prod_{(i,n) \in I} p(r_{i,n} \mid t_{i,n})\, p(t \mid \theta)\, dt, \qquad \theta = \{\theta_\Sigma, \theta_\Omega\}, \qquad (1)$$
where $R_I = \{r_{i,n}\}_{(i,n) \in I}$. [sent-71, score-0.323]
33 We estimate the hyperparameters θ by maximizing the evidence, which is an empirical Bayesian approach to learning the relational structure of data. [sent-72, score-0.424]
34 Once θ is determined, we can predict the link for a new pair of entities via marginalization over the posterior $p(t|R_I, \theta)$. [sent-73, score-0.422]
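To make Eq. (1) concrete, here is a minimal sketch of evidence evaluation for one special case where the integral is analytic: a fully observed real-valued link matrix R with a Gaussian likelihood under the tensor GP prior introduced below, so that vec(T) is Gaussian with the Kronecker-structured covariance Ω⊗Σ. The function name, the noise parameterization, and the fully-observed assumption are ours; the paper's probit likelihood with partially observed links instead calls for the Laplacian approximation discussed later.

```python
import numpy as np

def tgp_log_evidence(R, Sigma, Omega, noise_var):
    """Log marginal likelihood log p(R | theta) for a fully observed M x N
    link matrix R, assuming t ~ TGP(0, Sigma, Omega) and Gaussian noise:
    vec(R) ~ N(0, kron(Omega, Sigma) + noise_var * I). The Kronecker
    eigenstructure reduces the cost to O(M^3 + N^3) instead of O(M^3 N^3)."""
    M, N = R.shape
    lam_s, U = np.linalg.eigh(Sigma)            # Sigma = U diag(lam_s) U'
    lam_o, V = np.linalg.eigh(Omega)            # Omega = V diag(lam_o) V'
    D = np.outer(lam_s, lam_o) + noise_var      # eigenvalues of the full covariance
    A = U.T @ R @ V                             # rotate R into the joint eigenbasis
    quad = np.sum(A ** 2 / D)                   # vec(R)' C^{-1} vec(R)
    logdet = np.sum(np.log(D))                  # log det C
    return -0.5 * (quad + logdet + M * N * np.log(2.0 * np.pi))
```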
35 In order to represent a rich class of link structures, p(t|θ) should be sufficiently expressive. [sent-75, score-0.16]
36 2 Hierarchical Gaussian Processes By observing the relational data collectively, one may notice that two entities $u_i$ and $u_j$ in U demonstrate correlated relationships to entities in V. [sent-87, score-1.205]
37 For example, two users often show similar or opposite opinions on movies, and two hub web pages tend to link to the same set of authority web pages. [sent-88, score-0.155]
38 In this case, the dependency structure of links can be reduced to a dependency structure of entities in U. [sent-89, score-0.659]
39 The model assumes that, for every v ∈ V, its relational function $t(\cdot, v) : \mathcal{U} \to \mathbb{R}$ is an i.i.d. sample from GP(0, Σ). [sent-91, score-0.362]
40 Optimizing the GP kernel Σ via evidence maximization amounts to learning the dependency of entities in U, summarized over all the entities v ∈ V. [sent-96, score-0.612]
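As a minimal sketch of this generative view (the function name and the jitter constant are our choices, not the paper's), each column of the latent relational matrix is an independent draw from GP(0, Σ), so entities in V are treated as exchangeable:

```python
import numpy as np

def sample_hgp(Sigma, num_v, rng=None):
    """Draw an M x num_v latent relational matrix T under the HGP model:
    every column t(., v) is an i.i.d. sample from GP(0, Sigma), so the
    dependency structure lives entirely on the entities in U."""
    rng = np.random.default_rng(rng)
    M = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M))  # jitter for stability
    return L @ rng.standard_normal((M, num_v))        # each column ~ N(0, Sigma)
```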
41 There is a drawback in applying HGP to link prediction. [sent-97, score-0.16]
42 In particular, the attributes of entities v ∈ V cannot be incorporated, even when they are available. [sent-99, score-0.708]
43 However, for the applications mentioned above, it also makes sense to explore the dependency between movies, or the dependency between authority web pages. [sent-100, score-0.274]
44 The model explains the relational dependency through the entity dependencies of both V and U. [sent-104, score-0.744]
45 Let θ = {θΣ, θΩ}; we describe a stochastic relational process p(t|θ) as follows. Definition: given two sets U and V, a collection of random variables $\{t(u, v) \mid t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}\}$ follows a tensor Gaussian process (TGP), if for every finite sets $\{u_1, \ldots, u_M\} \subset \mathcal{U}$ and $\{v_1, \ldots, v_N\} \subset \mathcal{V}$, the matrix $T = \{t(u_i, v_n)\}$ follows a matrix-variate normal distribution $\mathcal{N}_{M \times N}(0, \Sigma, \Omega)$, where $\Sigma_{i,j} = \Sigma(u_i, u_j)$ and $\Omega_{n,m} = \Omega(v_n, v_m)$. [sent-105, score-0.409] [sent-107, score-0.185]
47 As a key difference, the new model treats the relational function t as a whole sample from a TGP, instead of being formed by i.i.d. GP samples as in HGP. [sent-117, score-0.362]
48 In other words, TGP is in fact a GP on the relational function t, where the kernel function $\Upsilon : (\mathcal{U} \times \mathcal{V}) \times (\mathcal{U} \times \mathcal{V}) \to \mathbb{R}$ is defined via a tensor product of two GP kernels:
$$\mathrm{Cov}(t_{i,n}, t_{j,m}) = \Upsilon[(u_i, v_n), (u_j, v_m)] = \Sigma(u_i, u_j)\, \Omega(v_n, v_m). \qquad \text{[sent-132, score-1.179]}$$
49 The model explains the dependence structure of links by the dependence structure of participating entities. [sent-133, score-0.26]
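The tensor-product kernel is easy to exercise numerically. The sketch below (our function names; a small jitter is added for numerical stability) forms the implied Kronecker covariance of vec(T), and draws a TGP sample without ever building the MN x MN matrix:

```python
import numpy as np

def tgp_covariance(Sigma, Omega):
    """Covariance of vec(T) for T ~ TGP(0, Sigma, Omega): under column-major
    vectorization, Cov(t_{i,n}, t_{j,m}) = Sigma[i,j] * Omega[n,m] packs
    into the Kronecker product kron(Omega, Sigma)."""
    return np.kron(Omega, Sigma)

def sample_tgp(Sigma, Omega, rng=None):
    """Draw one M x N matrix T from TGP(0, Sigma, Omega) as T = Ls Z Lo',
    which gives Cov(t_{i,n}, t_{j,m}) = (Ls Ls')_{ij} (Lo Lo')_{nm}."""
    rng = np.random.default_rng(rng)
    M, N = Sigma.shape[0], Omega.shape[0]
    Ls = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M))
    Lo = np.linalg.cholesky(Omega + 1e-8 * np.eye(N))
    return Ls @ rng.standard_normal((M, N)) @ Lo.T
```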
50 Let $\mathcal{U} \subseteq \mathbb{R}^P$, $\mathcal{V} \subseteq \mathbb{R}^Q$, and $W \sim \mathcal{N}_{P \times Q}(0, I_P, I_Q)$, where $I_P$ denotes a $P \times P$ identity matrix and $\langle \cdot, \cdot \rangle$ denotes the inner product; then the bilinear function $t(u, v) = u^\top W v$ follows $TGP(0, \Sigma, \Omega)$ with $\Sigma(u_i, u_j) = \langle u_i, u_j \rangle$ and $\Omega(v_n, v_m) = \langle v_n, v_m \rangle$. [sent-138, score-0.924]
51 The proof is straightforward: $\mathrm{Cov}[t(u_i, v_n), t(u_j, v_m)] = \langle u_i, u_j \rangle \langle v_n, v_m \rangle$ and $\mathrm{E}[t(u_i, v_n)] = 0$, where $\mathrm{E}[\cdot]$ denotes expectation. [sent-140, score-1.142]
52 In practice, the linear model helps provide an efficient discriminative approach to link prediction. [sent-141, score-0.184]
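A quick Monte Carlo check of the bilinear example (the variable names and the sample size of 200,000 are arbitrary choices of ours) compares the empirical covariance against the analytic value $\langle u_i, u_j \rangle \langle v_n, v_m \rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 5, 4
u_i, u_j = rng.standard_normal(P), rng.standard_normal(P)
v_n, v_m = rng.standard_normal(Q), rng.standard_normal(Q)

t1, t2 = [], []
for _ in range(200000):
    W = rng.standard_normal((P, Q))   # W ~ N(0, I_P, I_Q): i.i.d. standard normal entries
    t1.append(u_i @ W @ v_n)          # t(u_i, v_n) = u_i' W v_n
    t2.append(u_j @ W @ v_m)          # t(u_j, v_m) = u_j' W v_m
t1, t2 = np.array(t1), np.array(t2)

print(np.mean(t1 * t2))               # Monte Carlo covariance (both means are 0)
print((u_i @ u_j) * (v_n @ v_m))      # analytic value <u_i, u_j><v_n, v_m>
```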
53 It appears that link prediction using TGP is almost the same as a normal GP regression or classification, except that the hyperparameters θ now have two parts, θΣ for Σ and θΩ for Ω. [sent-142, score-0.247]
54 4 A Family of Stochastic Processes for Entity Relationships To improve the scalability of SRM, and to describe the relational dependency in a way similar to TGP, we propose a family of stochastic processes p(t|θ) for entity relationships. [sent-148, score-0.877]
55 A relational function $t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is said to follow a stochastic relational process (SRP) if
$$t(u, v) = \frac{1}{\sqrt{d}} \sum_{k=1}^{d} f_k(u)\, g_k(v), \qquad f_k \overset{\mathrm{iid}}{\sim} GP(0, \Sigma), \quad g_k \overset{\mathrm{iid}}{\sim} GP(0, \Omega). \qquad \text{[sent-151, score-1.119]}$$
56 By the central limit theorem, for every $(u_i, v_n)$, $t_{i,n} \equiv t(u_i, v_n)$ converges in distribution to a Gaussian random variable as $d \to \infty$. [sent-157, score-0.362]
57 In the next steps, we prove $\mathrm{E}[t_{i,n}] = 0$ and
$$\mathrm{Cov}(t_{i,n}, t_{j,m}) = \mathrm{E}[t_{i,n} t_{j,m}] = \frac{1}{d}\Big\{ \sum_{k=1}^{d} \mathrm{E}[f_k(u_i) f_k(u_j) g_k(v_n) g_k(v_m)] + \sum_{k \neq \kappa} \mathrm{E}[f_k(u_i) f_\kappa(u_j) g_k(v_n) g_\kappa(v_m)] \Big\} = \frac{1}{d} \sum_{k=1}^{d} \mathrm{E}[f_k(u_i) f_k(u_j) g_k(v_n) g_k(v_m)] = \Sigma(u_i, u_j)\, \Omega(v_n, v_m),$$
where the cross terms vanish because the zero-mean processes are independent for $k \neq \kappa$. [sent-158, score-0.284]
58 SRP is a general family of priors for modeling the relationships between entities, of which HGP and TGP are special cases. [sent-161, score-0.129]
59 The generalization offers several advantages: (1) SRP can model symmetric links between the same set of entities. [sent-162, score-0.178]
60 If we build a generative process where U = V, Σ = Ω and $f_k = g_k$, then on every finite set $\{u_i\}_{i=1}^{N}$, $T = \{t(u_i, u_j)\}$ is always a symmetric matrix; (2) given a fixed d, inference with SRP has a much reduced complexity. [sent-163, score-0.444]
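A sketch of SRP sampling over finite index sets (function names and jitter are ours) makes the complexity point tangible: only d(M + N) latent function values are drawn instead of the M·N values of a full TGP sample, and the symmetric variant of advantage (1) follows by tying f_k = g_k:

```python
import numpy as np

def sample_srp(Sigma, Omega, d, rng=None):
    """t(u_i, v_n) = (1/sqrt(d)) * sum_k f_k(u_i) g_k(v_n), with
    f_k ~iid GP(0, Sigma) and g_k ~iid GP(0, Omega); converges to
    TGP(0, Sigma, Omega) as d grows."""
    rng = np.random.default_rng(rng)
    M, N = Sigma.shape[0], Omega.shape[0]
    F = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M)) @ rng.standard_normal((M, d))
    G = np.linalg.cholesky(Omega + 1e-8 * np.eye(N)) @ rng.standard_normal((N, d))
    return F @ G.T / np.sqrt(d)       # M x N latent relational matrix

def sample_srp_symmetric(Sigma, d, rng=None):
    """Advantage (1): with U = V, Sigma = Omega and f_k = g_k,
    T = {t(u_i, u_j)} is symmetric by construction."""
    rng = np.random.default_rng(rng)
    N = Sigma.shape[0]
    F = np.linalg.cholesky(Sigma + 1e-8 * np.eye(N)) @ rng.standard_normal((N, d))
    return F @ F.T / np.sqrt(d)
```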
61 2 Choices for the Likelihood $p(r_{i,n}|t_{i,n})$ The likelihood term describes the dependency of observable relationships on the latent variables. [sent-167, score-0.261]
62 For example, one may want to predict the rating of user u for movie v. [sent-176, score-0.168]
63 • One-class Problem: Sometimes one observes only the presence of links between entities, for example, the hyperlinks between web pages. [sent-180, score-0.208]
64 Under the open-world assumption, the fact that a web page does not link to another does not mean that the two are unrelated. [sent-181, score-0.19]
65 Therefore, we employ the likelihood $p(r_{i,n}|t_{i,n}) = \Phi(r_{i,n} t_{i,n} - b)$ for the observed links $r_{i,n} = 1$, where b is an offset and Φ is the cumulative distribution function of the standard normal. [sent-182, score-0.205]
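A one-line sketch of this probit-style one-class likelihood (the function name is ours; SciPy is assumed available):

```python
import numpy as np
from scipy.stats import norm

def one_class_log_likelihood(t, b):
    """log p(r = 1 | t) = log Phi(t - b) for observed positive links, where
    Phi is the standard normal CDF and the offset b controls how large the
    latent value must be before a link becomes likely."""
    return norm.logcdf(np.asarray(t) - b)
```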
66 3 Inference with Laplacian Approximation We have described the relational model under an SRP prior, of which HGP and TGP are special cases. [sent-183, score-0.385]
67 Collect the latent function values into vectors $f_k = [f_{1,k}, f_{2,k}, \ldots]^\top$ and $g_k = [g_{1,k}, \ldots, g_{M,k}]^\top$, where $f_{i,k} = f_k(u_i)$ and $g_{n,k} = g_k(v_n)$. [sent-191, score-0.277]
68 We factorize the approximating distribution as $q(F, G|\beta) = \prod_{k=1}^{d} q(f_k \mid f_k^*, \Sigma_k)\, q(g_k \mid g_k^*, \Omega_k)$, where $f_k^*$ and $g_k^*$ are the solutions from Eq. [sent-201, score-0.206]
69 Finally, we note that the posterior distribution of {F, G} has many modes (simply shuffling the order of latent dimensions, or changing the signs of both $f_k$ and $g_k$, does not change the probability). [sent-206, score-0.257]
70 However, each mode is equally good for constructing the relational function t. [sent-208, score-0.391]
71 In principle, this allows one to learn parametric forms of the kernel functions $\Sigma(u_i, u_j; \theta_\Sigma)$ and $\Omega(v_n, v_m; \theta_\Omega)$ that generalize to new entities. [sent-210, score-0.319]
72 In this paper we particularly consider the situation where entity attributes are not fully informative or are even absent. [sent-211, score-0.429]
73 Due to the conjugacy of the hyperprior, the maximization has an analytical solution:
$$\Sigma = \frac{\lambda \Sigma_0 + \frac{1}{d} \sum_{k=1}^{d} \big( f_k^* {f_k^*}^\top + \Sigma_k \big)}{\lambda + 1}, \qquad \Omega = \frac{\lambda \Omega_0 + \frac{1}{d} \sum_{k=1}^{d} \big( g_k^* {g_k^*}^\top + \Omega_k \big)}{\lambda + 1}. \qquad \text{[sent-217, score-0.239]}$$
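The analytical update is a few lines of code; the sketch below (our naming and array layout: the columns of F_star and G_star hold the mode solutions f_k* and g_k*) implements it directly:

```python
import numpy as np

def update_kernels(F_star, G_star, Sigma_ks, Omega_ks, Sigma0, Omega0, lam):
    """M-step for the entity-wise kernels: a convex combination of the prior
    kernels (Sigma0, Omega0) and the averaged second moments of the d latent
    dimensions, with Sigma_ks / Omega_ks the per-dimension posterior
    covariances from the Laplacian approximation."""
    d = F_star.shape[1]
    S = sum(np.outer(F_star[:, k], F_star[:, k]) + Sigma_ks[k] for k in range(d))
    O = sum(np.outer(G_star[:, k], G_star[:, k]) + Omega_ks[k] for k in range(d))
    Sigma = (lam * Sigma0 + S / d) / (lam + 1.0)
    Omega = (lam * Omega0 + O / d) / (lam + 1.0)
    return Sigma, Omega
```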
74 We use a and η to penalize the effects of γ(·, ·) and ξ(·, ·), respectively, when entity attributes are deficient. [sent-220, score-0.378]
75 6 Related Work There is a history of probabilistic relational models (PRM) [8] in machine learning. [sent-229, score-0.38]
76 [5] introduced link uncertainty and defined a generative model for both entity attributes and links. [sent-231, score-0.538]
77 Recently, [12] and [7] independently introduced an infinite (hidden) relational model to avoid the difficulty of structural learning in PRM by explaining links via a potentially infinite number of hidden states of entities. [sent-232, score-0.589]
78 Taskar et al. proposed relational Markov networks (RMNs) for link prediction [11], describing a conditional distribution of links given entity attributes and other links. [sent-234, score-1.141]
79 RMN has to define a class of potential functions on cliques of random variables based on the observed relational structure. [sent-235, score-0.38]
80 In another work [1], an SVM using a tensor kernel based on user and item attributes was used to predict user ratings on items, which is similar to our TGP case and suffers a scalability problem. [sent-240, score-0.466]
81 When attributes are deficient or unavailable, that model does not work well, while SRM can learn informative kernels purely from links (see Sec. 7). [sent-241, score-0.366]
Figure 1 caption (partial): ... (classification rate 0.906); (c) prediction of SRM with noninformative prior (classification rate 0.942); (d) prediction of SRM with informative prior (classification rate 0.965); (e-f) informative Σ0 and Ω0; (g-h) learned Σ and Ω with noninformative prior; (i-j) learned Σ and Ω with informative prior. [sent-247, score-0.137] [sent-248, score-0.137] [sent-249, score-0.153]
85 In contrast, SRM performs structural learning by adapting the kernels and marginalizing out the latent relational function, while MMMF only estimates the mode of the latent relational function with fixed Dirac kernels. [sent-250, score-0.918]
86 7 Experiments Synthetic Data: We generated two sets of entities $\mathcal{U} = \{u_i\}_{i=1}^{20}$ and $\mathcal{V} = \{v_n\}_{n=1}^{30}$ on a real line, such that $u_i = 0.$ ... [sent-251, score-0.4]
87 The positions of entities were used to compute two RBF kernels that serve as informative Σ0 and Ω0 . [sent-254, score-0.308]
88 Binary links between U and V are obtained by taking the sign of a latent function sampled from TGP(0, Σ, Ω). [sent-257, score-0.178]
89 We randomly selected 50% of the links for training, and left the remaining for test (see Fig. 1). [sent-258, score-0.208]
90 In Fig. 1, the block structures of the learned kernels indicate that both SRMs can learn the cluster structure of entities from links. [sent-268, score-0.276]
91 8 Conclusions and Future Extensions In this paper we proposed a stochastic relational model (SRM) for learning relational data. [sent-283, score-0.771]
92 Entity relationships are modeled by a tensor interaction of multiple Gaussian processes (GPs). [sent-284, score-0.259]
93 We proposed a family of relational processes and showed its convergence to a tensor Gaussian process as the degrees of freedom d goes to infinity. [sent-285, score-0.575]
94 The process imposes an effective prior on the entity relationships and leads to a discriminative link prediction model. [sent-286, score-0.409] [sent-336, score-0.247]
Table 1: User-movie rating prediction error measured by RMSE (rows: repeats; columns: MMMF, SRM); the numeric entries are not recoverable here.
96 Though the current work focused on the application of link prediction, the model can be used for general relational learning purposes. [sent-338, score-0.522]
97 There are several directions to extend the current model: (1) SRM can describe a joint distribution of entity links and entity classes conditioned on entity-wise GP kernels. [sent-339, score-0.724]
98 In a recent work [2] a tensor GP for multi-task learning was independently suggested; (5) Finally, it is extremely important to make the algorithm scalable to very large relational data, like the Netflix problem, containing about 480,000 users and 17,000 movies. [sent-344, score-0.558]
99 Probabilistic models of text and link structure for hypertext classification. [sent-383, score-0.197]
100 Learning systems of concepts with an infinite relational model. [sent-398, score-0.362]
wordName wordTfidf (topN-words)
[('srm', 0.449), ('relational', 0.362), ('gp', 0.279), ('entity', 0.273), ('entities', 0.225), ('gk', 0.206), ('tgp', 0.192), ('vn', 0.181), ('links', 0.178), ('ui', 0.175), ('mmmf', 0.162), ('link', 0.16), ('uj', 0.144), ('vm', 0.14), ('srp', 0.133), ('tensor', 0.127), ('dependency', 0.109), ('attributes', 0.105), ('hgp', 0.103), ('relationships', 0.074), ('fk', 0.071), ('users', 0.069), ('prediction', 0.063), ('etr', 0.059), ('processes', 0.058), ('ri', 0.056), ('gps', 0.055), ('user', 0.054), ('ratings', 0.054), ('informative', 0.051), ('noninformative', 0.051), ('latent', 0.051), ('rating', 0.05), ('stochastic', 0.047), ('movie', 0.045), ('participating', 0.044), ('dirac', 0.036), ('movies', 0.036), ('kernel', 0.035), ('hyper', 0.033), ('rmse', 0.033), ('repeats', 0.033), ('kernels', 0.032), ('structural', 0.031), ('tresp', 0.031), ('web', 0.03), ('eachmovie', 0.03), ('messaged', 0.03), ('srms', 0.03), ('withdrew', 0.03), ('relations', 0.029), ('mode', 0.029), ('cov', 0.028), ('collaborative', 0.028), ('family', 0.028), ('likelihood', 0.027), ('priors', 0.027), ('classi', 0.026), ('yu', 0.026), ('exchanged', 0.026), ('unavailable', 0.026), ('authority', 0.026), ('taskar', 0.025), ('hyperparameters', 0.024), ('discriminative', 0.024), ('prm', 0.023), ('iw', 0.023), ('mae', 0.023), ('rmn', 0.023), ('relationship', 0.023), ('prior', 0.023), ('xu', 0.023), ('nonparametric', 0.023), ('inference', 0.023), ('covariance', 0.022), ('chu', 0.022), ('wei', 0.022), ('gaussian', 0.021), ('vec', 0.021), ('getoor', 0.021), ('extensions', 0.02), ('nite', 0.02), ('predict', 0.019), ('maximizing', 0.019), ('structure', 0.019), ('predictive', 0.019), ('marginalized', 0.019), ('det', 0.019), ('ip', 0.019), ('laplacian', 0.019), ('acting', 0.018), ('cliques', 0.018), ('item', 0.018), ('models', 0.018), ('via', 0.018), ('proceedings', 0.017), ('conference', 0.017), ('eq', 0.017), ('encouraging', 0.017), ('tuple', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.13577385 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
3 0.12896444 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
Author: Mark Girolami, Mingjun Zhong
Abstract: By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination. 1
4 0.10292853 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS-0524761.
5 0.099010713 138 nips-2006-Multi-Task Feature Learning
Author: Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil
Abstract: We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select, not learn, a few common features across the tasks.
6 0.09759213 115 nips-2006-Learning annotated hierarchies from relational data
7 0.096859001 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
8 0.088655733 166 nips-2006-Recursive Attribute Factoring
9 0.08758796 65 nips-2006-Denoising and Dimension Reduction in Feature Space
10 0.084866092 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
11 0.077242635 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
12 0.059899066 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
13 0.059471179 163 nips-2006-Prediction on a Graph with a Perceptron
14 0.057797275 132 nips-2006-Modeling Dyadic Data with Binary Latent Factors
15 0.056796238 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
16 0.053586598 185 nips-2006-Subordinate class recognition using relational object models
17 0.053041078 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds
18 0.050548974 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
19 0.05052587 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
20 0.048134513 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
topicId topicWeight
[(0, -0.153), (1, 0.048), (2, 0.011), (3, -0.032), (4, 0.035), (5, 0.066), (6, 0.083), (7, 0.036), (8, -0.006), (9, -0.065), (10, -0.064), (11, -0.017), (12, 0.05), (13, -0.13), (14, -0.091), (15, 0.038), (16, 0.049), (17, -0.039), (18, -0.12), (19, -0.015), (20, -0.141), (21, -0.046), (22, 0.19), (23, 0.131), (24, 0.065), (25, 0.033), (26, -0.072), (27, -0.013), (28, 0.231), (29, 0.13), (30, -0.007), (31, 0.125), (32, 0.016), (33, 0.136), (34, -0.065), (35, -0.005), (36, -0.034), (37, -0.092), (38, 0.26), (39, 0.131), (40, 0.194), (41, 0.02), (42, 0.037), (43, -0.021), (44, 0.089), (45, 0.167), (46, 0.039), (47, 0.034), (48, -0.018), (49, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.9524073 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.58421999 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
3 0.48850474 138 nips-2006-Multi-Task Feature Learning
Author: Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil
Abstract: We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select, not learn, a few common features across the tasks.
4 0.43768665 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS-0524761.
5 0.4305926 166 nips-2006-Recursive Attribute Factoring
Author: Deepak Verma, Karl Pfleger, David Tax
Abstract: Clustering, or factoring of a document collection attempts to “explain” each observed document in terms of one or a small number of inferred prototypes. Prior work demonstrated that when links exist between documents in the corpus (as is the case with a collection of web pages or scientific papers), building a joint model of document contents and connections produces a better model than that built from contents or connections alone. Many problems arise when trying to apply these joint models to corpus at the scale of the World Wide Web, however; one of these is that the sheer overhead of representing a feature space on the order of billions of dimensions becomes impractical. We address this problem with a simple representational shift inspired by probabilistic relational models: instead of representing document linkage in terms of the identities of linking documents, we represent it by the explicit and inferred attributes of the linking documents. Several surprising results come with this shift: in addition to being computationally more tractable, the new model produces factors that more cleanly decompose the document collection. We discuss several variations on this model and show how some can be seen as exact generalizations of the PageRank algorithm. 1
6 0.39336994 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
7 0.39113835 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
8 0.34878385 115 nips-2006-Learning annotated hierarchies from relational data
9 0.28377351 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
10 0.26993507 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
11 0.26938346 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
12 0.2658498 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
13 0.2540651 40 nips-2006-Bayesian Detection of Infrequent Differences in Sets of Time Series with Shared Structure
14 0.25243661 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
15 0.2480295 185 nips-2006-Subordinate class recognition using relational object models
16 0.24612588 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
17 0.24565691 6 nips-2006-A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems
18 0.24172749 5 nips-2006-A Kernel Method for the Two-Sample-Problem
19 0.23171006 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds
20 0.23015027 113 nips-2006-Learning Structural Equation Models for fMRI
topicId topicWeight
[(1, 0.097), (3, 0.024), (7, 0.067), (9, 0.05), (20, 0.036), (22, 0.053), (44, 0.089), (54, 0.03), (57, 0.049), (65, 0.026), (69, 0.026), (71, 0.016), (72, 0.014), (73, 0.319)]
simIndex simValue paperId paperTitle
same-paper 1 0.75858575 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.48245776 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
Author: Yevgeny Seldin, Noam Slonim, Naftali Tishby
Abstract: We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y , but rather as a sample of values of an unknown (stochastic) function Z(X, Y ). For example, in gene expression data, the expression level Z is a function of gene X and condition Y ; or in movie ratings data the rating Z is a function of viewer X and movie Y . The approach represents a consistent extension of the Information Bottleneck method that has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need in the sample of joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves the currently best known results. 1
3 0.47137368 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
4 0.47098461 109 nips-2006-Learnability and the doubling dimension
Author: Yi Li, Philip M. Long
Abstract: Given a set of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of the class, and strengthens the best known bound in terms of the VC-dimension alone. We show that there is no bound on the doubling dimension in terms of the VC-dimension of the class (in contrast with the metric dimension).
5 0.47044426 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.46768683 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.46711272 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
8 0.46708214 175 nips-2006-Simplifying Mixture Models through Function Approximation
9 0.46535212 121 nips-2006-Learning to be Bayesian without Supervision
10 0.46489996 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
11 0.46361724 20 nips-2006-Active learning for misspecified generalized linear models
12 0.46347371 193 nips-2006-Tighter PAC-Bayes Bounds
13 0.46333399 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
14 0.4628365 159 nips-2006-Parameter Expanded Variational Bayesian Methods
15 0.46270967 117 nips-2006-Learning on Graph with Laplacian Regularization
16 0.46257064 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
17 0.46148083 167 nips-2006-Recursive ICA
18 0.46122357 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.458893 169 nips-2006-Relational Learning with Gaussian Processes
20 0.45812345 35 nips-2006-Approximate inference using planar graph decomposition