nips nips2006 nips2006-183 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected in the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existence, strength, or type of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. [sent-2, score-0.413] [sent-4, score-0.145]
3 These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. [sent-5, score-0.235]
4 By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected in the adapted GP kernels. [sent-6, score-0.915]
5 The framework offers a discriminative approach to link prediction, namely, predicting the existence, strength, or type of relationships based on the partially observed linkage network as well as the attributes of entities (if given). [sent-7, score-0.588]
6 Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. [sent-9, score-0.24]
7 In the end we discuss extensions of SRM to general relational learning tasks. [sent-10, score-0.382]
8 1 Introduction Relational learning concerns the modeling of physical, social, or other phenomena, where rich types of entities interact via complex relational structures. [sent-11, score-0.605]
9 Compared to traditional machine learning settings, the entity relationships provide additional structural information. [sent-12, score-0.378]
10 A simple example of a relational setting is the user-movie rating database, which contains user entities with user attributes (e.g., age, gender, education) and movie entities with movie attributes (e.g., ...). [sent-13, score-0.85] [sent-15, score-0.42]
12 A typical relational learning problem is entity classification, for example, segmenting users into groups. [sent-20, score-0.704]
13 One may apply usual clustering or classification methods based on a flat structure of data, where each user is associated with a vector of user attributes. [sent-21, score-0.127]
14 However, a sound model should additionally explore the user-movie relations as well as the movie attributes, since like-minded users tend to give similar ratings on the same movie, and may like (or dislike) movies with similar genres. [sent-22, score-0.233]
15 Entity classification in a relational setting has gained considerable attention, e.g., webpage classification using both textual content and hyperlinks. [sent-24, score-0.362]
16 For example, one may want to predict protein-protein interactions, or in another application, user ratings for products. [sent-26, score-0.127]
17 The family of these problems has been called link prediction, which is the primary topic of this paper. [sent-27, score-0.188]
18 In general, link prediction includes link existence prediction (i.e., predicting whether a link exists between a pair of entities). [sent-28, score-0.446]
19 In this paper we propose a family of stochastic relational models (SRM) for link prediction and other relational learning tasks. [sent-38, score-1.04]
20 The key idea of SRM is a stochastic link-wise process induced by a tensor interplay of multiple entity-wise Gaussian processes (GPs). [sent-39, score-0.232]
21 These models in fact define a set of nonparametric priors on an infinite dimensional tensor matrix, where each element represents a relationship between a tuple of entities. [sent-40, score-0.235]
22 By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is propagated to the dependency structure of entities, reflected by the learned entity-wise GP kernels (i.e., ...). [sent-41, score-0.947]
23 We present various models of SRM and address the computational issue, which is crucial in link prediction because the number of potential links grows quadratically with the number of entities. [sent-45, score-0.692]
24 2 Stochastic Relational Models We first consider pairwise asymmetric links r between entities u ∈ U and v ∈ V. [sent-57, score-0.403]
25 We use u or v to denote the attribute vector of an entity, or simply its identity when entity attributes are unavailable. [sent-59, score-0.603]
26 Note that $r_{i,n} \equiv r(u_i, v_n)$ does not have to be identical to $r_{n,i}$ when U = V, i.e., links may be asymmetric. [sent-60, score-0.181]
27 Extensions to more than two entity sets, multi-way relations (i.e., links connecting more than two entities), and symmetric links are straightforward and will be briefly discussed in Sec. 8. [sent-63, score-0.302] [sent-65, score-0.356]
29 We assume that the observable links r are derived as local measurements of a real-valued latent relational function $t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$, and each link $r_{i,n}$ depends solely on its latent value $t_{i,n}$, modeled by the likelihood $p(r_{i,n}|t_{i,n})$. [sent-67, score-0.829]
30 The focus of this paper is a family of stochastic relational processes acting on U × V, the space of entity pairs, to generate the latent relational function t, via a tensor interaction of two independent entity-specific GPs, one acting on U and the other on V. [sent-68, score-1.362]
31 Let the relational processes be characterized by hyperparameters $\theta = \{\theta_\Sigma, \theta_\Omega\}$, with $\theta_\Sigma$ for the GP kernel function on U and $\theta_\Omega$ for the GP kernel function on V; an SRM thus defines a Bayesian prior $p(t|\theta)$ for the latent variables t. [sent-70, score-0.588]
32 Let I be the index set of entity pairs having observed links; the marginal likelihood (also called evidence) under such a prior is
$$p(R_I \mid \theta) = \int \prod_{(i,n) \in I} p(r_{i,n} \mid t_{i,n})\, p(t \mid \theta)\, dt, \qquad \theta = \{\theta_\Sigma, \theta_\Omega\}, \qquad (1)$$
where $R_I = \{r_{i,n}\}_{(i,n) \in I}$. [sent-71, score-0.323]
33 We estimate the hyperparameters θ by maximizing the evidence, which is an empirical Bayesian approach to learning the relational structure of data. [sent-72, score-0.424]
34 Once θ is determined, we can predict the link for a new pair of entities via marginalization over the posterior $p(t|R_I, \theta)$. [sent-73, score-0.422]
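To make Eq. (1) concrete, here is a minimal sketch of evidence evaluation for one special case where the integral is analytic: a fully observed real-valued link matrix R with a Gaussian likelihood under the tensor GP prior introduced below, so that vec(T) is Gaussian with the Kronecker-structured covariance Ω⊗Σ. The function name, the noise parameterization, and the fully-observed assumption are ours; the paper's probit likelihood with partially observed links instead calls for the Laplacian approximation discussed later.

```python
import numpy as np

def tgp_log_evidence(R, Sigma, Omega, noise_var):
    """Log marginal likelihood log p(R | theta) for a fully observed M x N
    link matrix R, assuming t ~ TGP(0, Sigma, Omega) and Gaussian noise:
    vec(R) ~ N(0, kron(Omega, Sigma) + noise_var * I). The Kronecker
    eigenstructure reduces the cost to O(M^3 + N^3) instead of O(M^3 N^3)."""
    M, N = R.shape
    lam_s, U = np.linalg.eigh(Sigma)            # Sigma = U diag(lam_s) U'
    lam_o, V = np.linalg.eigh(Omega)            # Omega = V diag(lam_o) V'
    D = np.outer(lam_s, lam_o) + noise_var      # eigenvalues of the full covariance
    A = U.T @ R @ V                             # rotate R into the joint eigenbasis
    quad = np.sum(A ** 2 / D)                   # vec(R)' C^{-1} vec(R)
    logdet = np.sum(np.log(D))                  # log det C
    return -0.5 * (quad + logdet + M * N * np.log(2.0 * np.pi))
```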
35 In order to represent a rich class of link structures, p(t|θ) should be sufficiently expressive. [sent-75, score-0.16]
36 2 Hierarchical Gaussian Processes By observing the relational data collectively, one may notice that two entities $u_i$ and $u_j$ in U demonstrate correlated relationships to entities in V. [sent-87, score-1.205]
37 For example, two users often show similar or opposite opinions on movies, and two hub web pages tend to link to the same set of authority web pages. [sent-88, score-0.155]
38 In this case, the dependency structure of links can be reduced to a dependency structure of entities in U. [sent-89, score-0.659]
39 The model assumes that, for every v ∈ V, its relational function $t(\cdot, v) : \mathcal{U} \to \mathbb{R}$ is an i.i.d. sample from GP(0, Σ). [sent-91, score-0.362]
40 Optimizing the GP kernel Σ via evidence maximization amounts to learning the dependency of entities in U, summarized over all the entities v ∈ V. [sent-96, score-0.612]
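As a minimal sketch of this generative view (the function name and the jitter constant are our choices, not the paper's), each column of the latent relational matrix is an independent draw from GP(0, Σ), so entities in V are treated as exchangeable:

```python
import numpy as np

def sample_hgp(Sigma, num_v, rng=None):
    """Draw an M x num_v latent relational matrix T under the HGP model:
    every column t(., v) is an i.i.d. sample from GP(0, Sigma), so the
    dependency structure lives entirely on the entities in U."""
    rng = np.random.default_rng(rng)
    M = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M))  # jitter for stability
    return L @ rng.standard_normal((M, num_v))        # each column ~ N(0, Sigma)
```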
41 There is a drawback in applying HGP to link prediction. [sent-97, score-0.16]
42 In particular, the attributes of entities v ∈ V cannot be incorporated, even when they are available. [sent-99, score-0.708]
43 However, for the applications mentioned above, it also makes sense to explore the dependency between movies, or the dependency between authority web pages. [sent-100, score-0.274]
44 The model explains the relational dependency through the entity dependencies of both V and U. [sent-104, score-0.744]
45 Let θ = {θΣ, θΩ}; we describe a stochastic relational process p(t|θ) as follows. Definition: given two sets U and V, a collection of random variables $\{t(u, v) \mid t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}\}$ follows a tensor Gaussian process (TGP), if for every finite sets $\{u_1, \ldots, u_M\} \subset \mathcal{U}$ and $\{v_1, \ldots, v_N\} \subset \mathcal{V}$, the matrix $T = \{t(u_i, v_n)\}$ follows a matrix-variate normal distribution $\mathcal{N}_{M \times N}(0, \Sigma, \Omega)$, where $\Sigma_{i,j} = \Sigma(u_i, u_j)$ and $\Omega_{n,m} = \Omega(v_n, v_m)$. [sent-105, score-0.409] [sent-107, score-0.185]
47 As a key difference, the new model treats the relational function t as a whole sample from a TGP, instead of being formed by i.i.d. GP samples as in HGP. [sent-117, score-0.362]
48 In other words, TGP is in fact a GP on the relational function t, where the kernel function $\Upsilon : (\mathcal{U} \times \mathcal{V}) \times (\mathcal{U} \times \mathcal{V}) \to \mathbb{R}$ is defined via a tensor product of two GP kernels:
$$\mathrm{Cov}(t_{i,n}, t_{j,m}) = \Upsilon[(u_i, v_n), (u_j, v_m)] = \Sigma(u_i, u_j)\, \Omega(v_n, v_m). \qquad \text{[sent-132, score-1.179]}$$
49 The model explains the dependence structure of links by the dependence structure of participating entities. [sent-133, score-0.26]
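The tensor-product kernel is easy to exercise numerically. The sketch below (our function names; a small jitter is added for numerical stability) forms the implied Kronecker covariance of vec(T), and draws a TGP sample without ever building the MN x MN matrix:

```python
import numpy as np

def tgp_covariance(Sigma, Omega):
    """Covariance of vec(T) for T ~ TGP(0, Sigma, Omega): under column-major
    vectorization, Cov(t_{i,n}, t_{j,m}) = Sigma[i,j] * Omega[n,m] packs
    into the Kronecker product kron(Omega, Sigma)."""
    return np.kron(Omega, Sigma)

def sample_tgp(Sigma, Omega, rng=None):
    """Draw one M x N matrix T from TGP(0, Sigma, Omega) as T = Ls Z Lo',
    which gives Cov(t_{i,n}, t_{j,m}) = (Ls Ls')_{ij} (Lo Lo')_{nm}."""
    rng = np.random.default_rng(rng)
    M, N = Sigma.shape[0], Omega.shape[0]
    Ls = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M))
    Lo = np.linalg.cholesky(Omega + 1e-8 * np.eye(N))
    return Ls @ rng.standard_normal((M, N)) @ Lo.T
```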
50 Let $\mathcal{U} \subseteq \mathbb{R}^P$, $\mathcal{V} \subseteq \mathbb{R}^Q$, and $W \sim \mathcal{N}_{P \times Q}(0, I_P, I_Q)$, where $I_P$ denotes a $P \times P$ identity matrix and $\langle \cdot, \cdot \rangle$ denotes the inner product; then the bilinear function $t(u, v) = u^\top W v$ follows $TGP(0, \Sigma, \Omega)$ with $\Sigma(u_i, u_j) = \langle u_i, u_j \rangle$ and $\Omega(v_n, v_m) = \langle v_n, v_m \rangle$. [sent-138, score-0.924]
51 The proof is straightforward: $\mathrm{Cov}[t(u_i, v_n), t(u_j, v_m)] = \langle u_i, u_j \rangle \langle v_n, v_m \rangle$ and $\mathrm{E}[t(u_i, v_n)] = 0$, where $\mathrm{E}[\cdot]$ denotes expectation. [sent-140, score-1.142]
52 In practice, the linear model helps provide an efficient discriminative approach to link prediction. [sent-141, score-0.184]
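A quick Monte Carlo check of the bilinear example (the variable names and the sample size of 200,000 are arbitrary choices of ours) compares the empirical covariance against the analytic value $\langle u_i, u_j \rangle \langle v_n, v_m \rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 5, 4
u_i, u_j = rng.standard_normal(P), rng.standard_normal(P)
v_n, v_m = rng.standard_normal(Q), rng.standard_normal(Q)

t1, t2 = [], []
for _ in range(200000):
    W = rng.standard_normal((P, Q))   # W ~ N(0, I_P, I_Q): i.i.d. standard normal entries
    t1.append(u_i @ W @ v_n)          # t(u_i, v_n) = u_i' W v_n
    t2.append(u_j @ W @ v_m)          # t(u_j, v_m) = u_j' W v_m
t1, t2 = np.array(t1), np.array(t2)

print(np.mean(t1 * t2))               # Monte Carlo covariance (both means are 0)
print((u_i @ u_j) * (v_n @ v_m))      # analytic value <u_i, u_j><v_n, v_m>
```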
53 It appears that link prediction using TGP is almost the same as a normal GP regression or classification, except that the hyperparameters θ now have two parts, θΣ for Σ and θΩ for Ω. [sent-142, score-0.247]
54 4 A Family of Stochastic Processes for Entity Relationships To improve the scalability of SRM, and to describe the relational dependency in a way similar to TGP, we propose a family of stochastic processes p(t|θ) for entity relationships. [sent-148, score-0.877]
55 A relational function $t : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is said to follow a stochastic relational process (SRP) if
$$t(u, v) = \frac{1}{\sqrt{d}} \sum_{k=1}^{d} f_k(u)\, g_k(v), \qquad f_k \overset{\mathrm{iid}}{\sim} GP(0, \Sigma), \quad g_k \overset{\mathrm{iid}}{\sim} GP(0, \Omega). \qquad \text{[sent-151, score-1.119]}$$
56 By the central limit theorem, for every $(u_i, v_n)$, $t_{i,n} \equiv t(u_i, v_n)$ converges in distribution to a Gaussian random variable as $d \to \infty$. [sent-157, score-0.362]
57 In the next steps, we prove $\mathrm{E}[t_{i,n}] = 0$ and
$$\mathrm{Cov}(t_{i,n}, t_{j,m}) = \mathrm{E}[t_{i,n} t_{j,m}] = \frac{1}{d}\Big\{ \sum_{k=1}^{d} \mathrm{E}[f_k(u_i) f_k(u_j) g_k(v_n) g_k(v_m)] + \sum_{k \neq \kappa} \mathrm{E}[f_k(u_i) f_\kappa(u_j) g_k(v_n) g_\kappa(v_m)] \Big\} = \frac{1}{d} \sum_{k=1}^{d} \mathrm{E}[f_k(u_i) f_k(u_j) g_k(v_n) g_k(v_m)] = \Sigma(u_i, u_j)\, \Omega(v_n, v_m),$$
where the cross terms vanish because the zero-mean processes are independent for $k \neq \kappa$. [sent-158, score-0.284]
58 SRP is a general family of priors for modeling the relationships between entities, of which HGP and TGP are special cases. [sent-161, score-0.129]
59 The generalization offers several advantages: (1) SRP can model symmetric links between the same set of entities. [sent-162, score-0.178]
60 If we build a generative process where U = V, Σ = Ω and $f_k = g_k$, then on every finite set $\{u_i\}_{i=1}^{N}$, $T = \{t(u_i, u_j)\}$ is always a symmetric matrix; (2) given a fixed d, inference with SRP has a much reduced complexity. [sent-163, score-0.444]
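A sketch of SRP sampling over finite index sets (function names and jitter are ours) makes the complexity point tangible: only d(M + N) latent function values are drawn instead of the M·N values of a full TGP sample, and the symmetric variant of advantage (1) follows by tying f_k = g_k:

```python
import numpy as np

def sample_srp(Sigma, Omega, d, rng=None):
    """t(u_i, v_n) = (1/sqrt(d)) * sum_k f_k(u_i) g_k(v_n), with
    f_k ~iid GP(0, Sigma) and g_k ~iid GP(0, Omega); converges to
    TGP(0, Sigma, Omega) as d grows."""
    rng = np.random.default_rng(rng)
    M, N = Sigma.shape[0], Omega.shape[0]
    F = np.linalg.cholesky(Sigma + 1e-8 * np.eye(M)) @ rng.standard_normal((M, d))
    G = np.linalg.cholesky(Omega + 1e-8 * np.eye(N)) @ rng.standard_normal((N, d))
    return F @ G.T / np.sqrt(d)       # M x N latent relational matrix

def sample_srp_symmetric(Sigma, d, rng=None):
    """Advantage (1): with U = V, Sigma = Omega and f_k = g_k,
    T = {t(u_i, u_j)} is symmetric by construction."""
    rng = np.random.default_rng(rng)
    N = Sigma.shape[0]
    F = np.linalg.cholesky(Sigma + 1e-8 * np.eye(N)) @ rng.standard_normal((N, d))
    return F @ F.T / np.sqrt(d)
```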
61 2 Choices for the Likelihood $p(r_{i,n}|t_{i,n})$ The likelihood term describes the dependency of observable relationships on the latent variables. [sent-167, score-0.261]
62 For example, one may want to predict the rating of user u for movie v. [sent-176, score-0.168]
63 • One-class Problem: Sometimes one observes only the presence of links between entities, for example, the hyperlinks between web pages. [sent-180, score-0.208]
64 Under the open-world assumption, the fact that a web page does not link to another does not mean that the two are unrelated. [sent-181, score-0.19]
65 Therefore, we employ the likelihood $p(r_{i,n}|t_{i,n}) = \Phi(r_{i,n} t_{i,n} - b)$ for the observed links $r_{i,n} = 1$, where b is an offset and Φ is the cumulative distribution function of the standard normal. [sent-182, score-0.205]
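A one-line sketch of this probit-style one-class likelihood (the function name is ours; SciPy is assumed available):

```python
import numpy as np
from scipy.stats import norm

def one_class_log_likelihood(t, b):
    """log p(r = 1 | t) = log Phi(t - b) for observed positive links, where
    Phi is the standard normal CDF and the offset b controls how large the
    latent value must be before a link becomes likely."""
    return norm.logcdf(np.asarray(t) - b)
```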
66 3 Inference with Laplacian Approximation We have described the relational model under an SRP prior, of which HGP and TGP are special cases. [sent-183, score-0.385]
67 Collect the latent function values into vectors $f_k = [f_{1,k}, f_{2,k}, \ldots]^\top$ and $g_k = [g_{1,k}, \ldots, g_{M,k}]^\top$, where $f_{i,k} = f_k(u_i)$ and $g_{n,k} = g_k(v_n)$. [sent-191, score-0.277]
68 We factorize the approximating distribution as $q(F, G|\beta) = \prod_{k=1}^{d} q(f_k \mid f_k^*, \Sigma_k)\, q(g_k \mid g_k^*, \Omega_k)$, where $f_k^*$ and $g_k^*$ are the solutions from Eq. [sent-201, score-0.206]
69 Finally, we note that the posterior distribution of {F, G} has many modes (simply shuffling the order of latent dimensions, or changing the signs of both $f_k$ and $g_k$, does not change the probability). [sent-206, score-0.257]
70 However, each mode is equally good for constructing the relational function t. [sent-208, score-0.391]
71 In principle, this allows one to learn parametric forms of the kernel functions $\Sigma(u_i, u_j; \theta_\Sigma)$ and $\Omega(v_n, v_m; \theta_\Omega)$ that generalize to new entities. [sent-210, score-0.319]
72 In this paper we particularly consider the situation where entity attributes are not fully informative or are even absent. [sent-211, score-0.429]
73 Due to the conjugacy of the hyperprior, the maximization has an analytical solution:
$$\Sigma = \frac{\lambda \Sigma_0 + \frac{1}{d} \sum_{k=1}^{d} \big( f_k^* {f_k^*}^\top + \Sigma_k \big)}{\lambda + 1}, \qquad \Omega = \frac{\lambda \Omega_0 + \frac{1}{d} \sum_{k=1}^{d} \big( g_k^* {g_k^*}^\top + \Omega_k \big)}{\lambda + 1}. \qquad \text{[sent-217, score-0.239]}$$
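The analytical update is a few lines of code; the sketch below (our naming and array layout: the columns of F_star and G_star hold the mode solutions f_k* and g_k*) implements it directly:

```python
import numpy as np

def update_kernels(F_star, G_star, Sigma_ks, Omega_ks, Sigma0, Omega0, lam):
    """M-step for the entity-wise kernels: a convex combination of the prior
    kernels (Sigma0, Omega0) and the averaged second moments of the d latent
    dimensions, with Sigma_ks / Omega_ks the per-dimension posterior
    covariances from the Laplacian approximation."""
    d = F_star.shape[1]
    S = sum(np.outer(F_star[:, k], F_star[:, k]) + Sigma_ks[k] for k in range(d))
    O = sum(np.outer(G_star[:, k], G_star[:, k]) + Omega_ks[k] for k in range(d))
    Sigma = (lam * Sigma0 + S / d) / (lam + 1.0)
    Omega = (lam * Omega0 + O / d) / (lam + 1.0)
    return Sigma, Omega
```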
74 We use a and η to penalize the effects of γ(·, ·) and ξ(·, ·), respectively, when entity attributes are deficient. [sent-220, score-0.378]
75 6 Related Work There is a history of probabilistic relational models (PRM) [8] in machine learning. [sent-229, score-0.38]
76 [5] introduced link uncertainty and defined a generative model for both entity attributes and links. [sent-231, score-0.538]
77 Recently, [12] and [7] independently introduced an infinite (hidden) relational model to avoid the difficulty of structural learning in PRM by explaining links via a potentially infinite number of hidden states of entities. [sent-232, score-0.589]
78 Taskar et al. proposed relational Markov networks (RMNs) for link prediction [11], describing a conditional distribution of links given entity attributes and other links. [sent-234, score-1.141]
79 RMN has to define a class of potential functions on cliques of random variables based on the observed relational structure. [sent-235, score-0.38]
80 In another work [1], an SVM using a tensor kernel based on user and item attributes was used to predict user ratings on items, which is similar to our TGP case and suffers a scalability problem. [sent-240, score-0.466]
81 When attributes are deficient or unavailable, that model does not work well, while SRM can learn informative kernels purely from links (see Sec. 7). [sent-241, score-0.366]
Figure 1 caption (partial): ... (classification rate 0.906); (c) prediction of SRM with noninformative prior (classification rate 0.942); (d) prediction of SRM with informative prior (classification rate 0.965); (e-f) informative Σ0 and Ω0; (g-h) learned Σ and Ω with noninformative prior; (i-j) learned Σ and Ω with informative prior. [sent-247, score-0.137] [sent-248, score-0.137] [sent-249, score-0.153]
85 In contrast, SRM performs structural learning by adapting the kernels and marginalizing out the latent relational function, while MMMF only estimates the mode of the latent relational function with fixed Dirac kernels. [sent-250, score-0.918]
86 7 Experiments Synthetic Data: We generated two sets of entities $\mathcal{U} = \{u_i\}_{i=1}^{20}$ and $\mathcal{V} = \{v_n\}_{n=1}^{30}$ on a real line, such that $u_i = 0.$ ... [sent-251, score-0.4]
87 The positions of entities were used to compute two RBF kernels that serve as informative Σ0 and Ω0 . [sent-254, score-0.308]
88 Binary links between U and V are obtained by taking the sign of a latent function sampled from TGP(0, Σ, Ω). [sent-257, score-0.178]
89 We randomly selected 50% of the links for training, and left the remaining for test (see Fig. 1). [sent-258, score-0.208]
90 In Fig. 1, the block structures of the learned kernels indicate that both SRMs can learn the cluster structure of entities from links. [sent-268, score-0.276]
91 8 Conclusions and Future Extensions In this paper we proposed a stochastic relational model (SRM) for learning relational data. [sent-283, score-0.771]
92 Entity relationships are modeled by a tensor interaction of multiple Gaussian processes (GPs). [sent-284, score-0.259]
93 We proposed a family of relational processes and showed its convergence to a tensor Gaussian process as the degrees of freedom d goes to infinity. [sent-285, score-0.575]
94 The process imposes an effective prior on the entity relationships and leads to a discriminative link prediction model. [sent-286, score-0.409] [sent-336, score-0.247]
Table 1: User-movie rating prediction error measured by RMSE (rows: repeats; columns: MMMF, SRM); the numeric entries are not recoverable here.
96 Though the current work focused on the application of link prediction, the model can be used for general relational learning purposes. [sent-338, score-0.522]
97 There are several directions to extend the current model: (1) SRM can describe a joint distribution of entity links and entity classes conditioned on entity-wise GP kernels. [sent-339, score-0.724]
98 In a recent work [2] a tensor GP for multi-task learning was independently suggested; (5) Finally, it is extremely important to make the algorithm scalable to very large relational data, like the Netflix problem, containing about 480,000 users and 17,000 movies. [sent-344, score-0.558]
99 Probabilistic models of text and link structure for hypertext classification. [sent-383, score-0.197]
100 Learning systems of concepts with an infinite relational model. [sent-398, score-0.362]
wordName wordTfidf (topN-words)
[('srm', 0.449), ('relational', 0.362), ('gp', 0.279), ('entity', 0.273), ('entities', 0.225), ('gk', 0.206), ('tgp', 0.192), ('vn', 0.181), ('links', 0.178), ('ui', 0.175), ('mmmf', 0.162), ('link', 0.16), ('uj', 0.144), ('vm', 0.14), ('srp', 0.133), ('tensor', 0.127), ('dependency', 0.109), ('attributes', 0.105), ('hgp', 0.103), ('relationships', 0.074), ('fk', 0.071), ('users', 0.069), ('prediction', 0.063), ('etr', 0.059), ('processes', 0.058), ('ri', 0.056), ('gps', 0.055), ('user', 0.054), ('ratings', 0.054), ('informative', 0.051), ('noninformative', 0.051), ('latent', 0.051), ('rating', 0.05), ('stochastic', 0.047), ('movie', 0.045), ('participating', 0.044), ('dirac', 0.036), ('movies', 0.036), ('kernel', 0.035), ('hyper', 0.033), ('rmse', 0.033), ('repeats', 0.033), ('kernels', 0.032), ('structural', 0.031), ('tresp', 0.031), ('web', 0.03), ('eachmovie', 0.03), ('messaged', 0.03), ('srms', 0.03), ('withdrew', 0.03), ('relations', 0.029), ('mode', 0.029), ('cov', 0.028), ('collaborative', 0.028), ('family', 0.028), ('likelihood', 0.027), ('priors', 0.027), ('classi', 0.026), ('yu', 0.026), ('exchanged', 0.026), ('unavailable', 0.026), ('authority', 0.026), ('taskar', 0.025), ('hyperparameters', 0.024), ('discriminative', 0.024), ('prm', 0.023), ('iw', 0.023), ('mae', 0.023), ('rmn', 0.023), ('relationship', 0.023), ('prior', 0.023), ('xu', 0.023), ('nonparametric', 0.023), ('inference', 0.023), ('covariance', 0.022), ('chu', 0.022), ('wei', 0.022), ('gaussian', 0.021), ('vec', 0.021), ('getoor', 0.021), ('extensions', 0.02), ('nite', 0.02), ('predict', 0.019), ('maximizing', 0.019), ('structure', 0.019), ('predictive', 0.019), ('marginalized', 0.019), ('det', 0.019), ('ip', 0.019), ('laplacian', 0.019), ('acting', 0.018), ('cliques', 0.018), ('item', 0.018), ('models', 0.018), ('via', 0.018), ('proceedings', 0.017), ('conference', 0.017), ('eq', 0.017), ('encouraging', 0.017), ('tuple', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.13577385 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
3 0.12896444 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
Author: Mark Girolami, Mingjun Zhong
Abstract: By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination. 1
4 0.10292853 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS-0524761.
5 0.099010713 138 nips-2006-Multi-Task Feature Learning
Author: Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil
Abstract: We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select, not learn, a few common features across the tasks.
6 0.09759213 115 nips-2006-Learning annotated hierarchies from relational data
7 0.096859001 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
8 0.088655733 166 nips-2006-Recursive Attribute Factoring
9 0.08758796 65 nips-2006-Denoising and Dimension Reduction in Feature Space
10 0.084866092 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
11 0.077242635 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
12 0.059899066 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
13 0.059471179 163 nips-2006-Prediction on a Graph with a Perceptron
14 0.057797275 132 nips-2006-Modeling Dyadic Data with Binary Latent Factors
15 0.056796238 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
16 0.053586598 185 nips-2006-Subordinate class recognition using relational object models
17 0.053041078 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds
18 0.050548974 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
19 0.05052587 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
20 0.048134513 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
topicId topicWeight
[(0, -0.153), (1, 0.048), (2, 0.011), (3, -0.032), (4, 0.035), (5, 0.066), (6, 0.083), (7, 0.036), (8, -0.006), (9, -0.065), (10, -0.064), (11, -0.017), (12, 0.05), (13, -0.13), (14, -0.091), (15, 0.038), (16, 0.049), (17, -0.039), (18, -0.12), (19, -0.015), (20, -0.141), (21, -0.046), (22, 0.19), (23, 0.131), (24, 0.065), (25, 0.033), (26, -0.072), (27, -0.013), (28, 0.231), (29, 0.13), (30, -0.007), (31, 0.125), (32, 0.016), (33, 0.136), (34, -0.065), (35, -0.005), (36, -0.034), (37, -0.092), (38, 0.26), (39, 0.131), (40, 0.194), (41, 0.02), (42, 0.037), (43, -0.021), (44, 0.089), (45, 0.167), (46, 0.039), (47, 0.034), (48, -0.018), (49, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.9524073 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.58421999 169 nips-2006-Relational Learning with Gaussian Processes
Author: Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. S. Keerthi
Abstract: Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. 1
3 0.48850474 138 nips-2006-Multi-Task Feature Learning
Author: Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil
Abstract: We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select, not learn, a few common features across the tasks.
4 0.43768665 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS-0524761.
5 0.4305926 166 nips-2006-Recursive Attribute Factoring
Author: Deepak Verma, Karl Pfleger, David Tax
Abstract: Clustering, or factoring of a document collection attempts to “explain” each observed document in terms of one or a small number of inferred prototypes. Prior work demonstrated that when links exist between documents in the corpus (as is the case with a collection of web pages or scientific papers), building a joint model of document contents and connections produces a better model than that built from contents or connections alone. Many problems arise when trying to apply these joint models to corpus at the scale of the World Wide Web, however; one of these is that the sheer overhead of representing a feature space on the order of billions of dimensions becomes impractical. We address this problem with a simple representational shift inspired by probabilistic relational models: instead of representing document linkage in terms of the identities of linking documents, we represent it by the explicit and inferred attributes of the linking documents. Several surprising results come with this shift: in addition to being computationally more tractable, the new model produces factors that more cleanly decompose the document collection. We discuss several variations on this model and show how some can be seen as exact generalizations of the PageRank algorithm. 1
6 0.39336994 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
7 0.39113835 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
8 0.34878385 115 nips-2006-Learning annotated hierarchies from relational data
9 0.28377351 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
10 0.26993507 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
11 0.26938346 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
12 0.2658498 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
13 0.2540651 40 nips-2006-Bayesian Detection of Infrequent Differences in Sets of Time Series with Shared Structure
14 0.25243661 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
15 0.2480295 185 nips-2006-Subordinate class recognition using relational object models
16 0.24612588 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
17 0.24565691 6 nips-2006-A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems
18 0.24172749 5 nips-2006-A Kernel Method for the Two-Sample-Problem
19 0.23171006 173 nips-2006-Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds
20 0.23015027 113 nips-2006-Learning Structural Equation Models for fMRI
topicId topicWeight
[(1, 0.097), (3, 0.024), (7, 0.067), (9, 0.05), (20, 0.036), (22, 0.053), (44, 0.089), (54, 0.03), (57, 0.049), (65, 0.026), (69, 0.026), (71, 0.016), (72, 0.014), (73, 0.319)]
simIndex simValue paperId paperTitle
same-paper 1 0.75858575 183 nips-2006-Stochastic Relational Models for Discriminative Link Prediction
Author: Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, Zhao Xu
Abstract: We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks. 1
2 0.48245776 100 nips-2006-Information Bottleneck for Non Co-Occurrence Data
Author: Yevgeny Seldin, Noam Slonim, Naftali Tishby
Abstract: We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y , but rather as a sample of values of an unknown (stochastic) function Z(X, Y ). For example, in gene expression data, the expression level Z is a function of gene X and condition Y ; or in movie ratings data the rating Z is a function of viewer X and movie Y . The approach represents a consistent extension of the Information Bottleneck method that has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need in the sample of joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves the currently best known results. 1
3 0.47137368 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
4 0.47098461 109 nips-2006-Learnability and the doubling dimension
Author: Yi Li, Philip M. Long
Abstract: Given a set of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of the class, and strengthens the best known bound in terms of the VC-dimension alone. We show that there is no bound on the doubling dimension in terms of the VC-dimension of the class (in contrast with the metric dimension).
5 0.47044426 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
6 0.46768683 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
7 0.46711272 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
8 0.46708214 175 nips-2006-Simplifying Mixture Models through Function Approximation
9 0.46535212 121 nips-2006-Learning to be Bayesian without Supervision
10 0.46489996 19 nips-2006-Accelerated Variational Dirichlet Process Mixtures
11 0.46361724 20 nips-2006-Active learning for misspecified generalized linear models
12 0.46347371 193 nips-2006-Tighter PAC-Bayes Bounds
13 0.46333399 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
14 0.4628365 159 nips-2006-Parameter Expanded Variational Bayesian Methods
15 0.46270967 117 nips-2006-Learning on Graph with Laplacian Regularization
16 0.46257064 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
17 0.46148083 167 nips-2006-Recursive ICA
18 0.46122357 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.458893 169 nips-2006-Relational Learning with Gaussian Processes
20 0.45812345 35 nips-2006-Approximate inference using planar graph decomposition