nips nips2008 nips2008-225 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. [sent-3, score-0.441]
2 The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. [sent-4, score-0.713]
3 The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. [sent-5, score-0.049]
4 We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. [sent-6, score-0.308]
5 1 Introduction The problem of bipartite graph inference is to predict the presence or absence of edges between heterogeneous objects known to form the vertices of the bipartite graph, based on observations of the heterogeneous objects. [sent-7, score-1.171]
6 This problem is becoming a challenging issue in bioinformatics and computational biology, because there are many biological networks which are represented by a bipartite graph structure with vertices being heterogeneous molecules and edges being interactions between them. [sent-8, score-0.752]
7 In particular, the prediction of compound-protein interaction networks is a key issue for genomic drug discovery, because drug development depends heavily on the detection of interactions between ligand compounds and target proteins. [sent-10, score-0.908]
8 The human genome sequencing project has made available the sequences of a large number of human proteins, while the high-throughput screening of large-scale chemical compound libraries is enabling us to explore the chemical space of possible compounds [1]. [sent-11, score-0.389]
9 However, our knowledge of such compound-protein interactions is very limited. [sent-12, score-0.107]
10 It is therefore important to detect unknown compound-protein interactions in order to identify potentially useful compounds, such as imaging probes and drug leads, from huge amounts of chemical and genomic data. [sent-13, score-0.711]
11 A major traditional method for predicting compound-protein interactions is docking simulation [2]. [sent-14, score-0.148]
12 However, docking simulation requires 3D structure information for the target proteins. [sent-15, score-0.071]
13 Most pharmaceutically useful target proteins are membrane proteins, such as ion channels and G protein-coupled receptors (GPCRs), for which 3D structures are rarely available. [sent-16, score-0.56]
14 There is therefore a strong incentive to develop new useful prediction methods based on protein sequences, chemical compound structures, and the available known compound-protein interaction information simultaneously. [sent-18, score-0.439]
15 Recently, several supervised methods for inferring a simple graph structure (e.g., protein networks, enzyme networks) have been developed in the framework of kernel methods [3, 4, 5]. [sent-19, score-0.154] [sent-21, score-0.169]
17 The algorithms underlying these previous methods are based on kernel canonical correlation analysis [3], distance metric learning [4], and the EM algorithm [5], respectively. [sent-22, score-0.167]
18 In contrast, in this paper we address the problem of supervised learning of the bipartite graph rather than the simple graph. [sent-24, score-0.363]
19 In this contribution, we develop a new supervised method for inferring the bipartite graph, borrowing the idea of distance metric learning used in the framework for inferring the simple graph [4]. [sent-25, score-0.416]
20 The proposed method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. [sent-26, score-0.713]
21 The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. [sent-27, score-0.049]
22 To our knowledge, there are no statistical methods to predict bipartite graphs from observed data in a supervised context. [sent-28, score-0.291]
23 In the results, we show the usefulness of the proposed method on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. [sent-29, score-0.308]
24 Suppose that we are given an undirected bipartite graph G = (U + V, E), where U = (u1 , · · · , un1 ) and V = (v1 , · · · , vn2 ) are sets of heterogeneous vertices and E ⊂ (U × V ) ∪ (V × U ) is a set of edges. [sent-31, score-0.611]
25 The problem is, given additional sets of vertices $U' = (u'_1, \cdots, u'_{m_1})$ and $V' = (v'_1, \cdots, v'_{m_2})$, to infer a set of new edges $E' \subset (U' \times (V + V')) \cup (V' \times (U + U')) \cup ((U + U') \times V') \cup ((V + V') \times U')$ involving the additional vertices in $U'$ and $V'$. [sent-33, score-0.264]
26 The prediction of compound-protein interaction networks is a typical problem that fits this framework from a practical viewpoint. [sent-35, score-0.105]
27 In this case, U corresponds to a set of compounds (known ligands), V corresponds to a set of proteins (known targets), and E corresponds to a set of known compound-protein interactions (known ligand-target interactions). [sent-36, score-0.676]
28 U ′ corresponds to a set of additional compounds (new ligand candidates), V ′ corresponds to a set of additional proteins (new target candidates), and E ′ corresponds to a set of unknown compound-protein interactions (potential ligand-target interactions). [sent-37, score-0.826]
29 For example, compounds are represented by molecular structures and proteins are represented by amino acid sequences. [sent-40, score-0.684]
30 The question is how to predict unknown compound-protein interactions from compound structures and protein sequences using prior knowledge about known compound-protein interactions. [sent-41, score-0.383]
31 The sets U and V (X and Y) are referred to as training sets; heterogeneous objects are denoted by u and v when regarded as vertices of the bipartite graph, and by x and y when regarded as objects in the observed data below. [sent-42, score-0.787]
32 Similarly, we will assume that Y is a set endowed with a positive definite kernel $k_v$, that is, a symmetric function $k_v : \mathcal{Y}^2 \to \mathbb{R}$ satisfying $\sum_{i,j=1}^{n_2} a_i a_j k_v(y_i, y_j) \ge 0$ for any $n_2 \in \mathbb{N}$, $(a_1, a_2, \cdots, a_{n_2}) \in \mathbb{R}^{n_2}$ and $(y_1, y_2, \cdots, y_{n_2}) \in \mathcal{Y}$. [sent-44, score-1.289]
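As an aside (not part of the paper), the positive definiteness condition above is equivalent to every Gram matrix built from the kernel having no negative eigenvalues, which gives a simple numerical check; the function name and tolerance below are illustrative:

```python
# Minimal sketch: check sum_{i,j} a_i a_j k(y_i, y_j) >= 0 for all a,
# i.e. that the symmetric Gram matrix K is positive semidefinite.
import numpy as np

def is_positive_semidefinite(K, tol=1e-10):
    # eigvalsh handles symmetric matrices; all eigenvalues must be >= 0 (up to tol)
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# Example: a linear-kernel Gram matrix K = Y Y^T is always PSD.
Y = np.random.randn(5, 3)
assert is_positive_semidefinite(Y @ Y.T)
```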
33 For example, in the case of compounds and proteins, each x has a chemical graph structure and each y has a sequence structure, so the data structures completely differ between x and y. [sent-48, score-0.61]
34 Therefore, we assume that the n1 objects (x1, · · · , xn1) and n2 objects (y1, · · · , yn2) are implicitly embedded in a unified Euclidean space R^d, and a graph is inferred on those heterogeneous points by the nearest neighbor approach, i.e., by putting an edge between heterogeneous points that are close to each other. [sent-49, score-0.508] [sent-51, score-0.198]
36 We propose the following two-step procedure for supervised bipartite graph inference: [sent-52, score-0.363]
37 1. Embed the heterogeneous objects into a unified Euclidean space representing the network topology of the bipartite graph, where connected heterogeneous vertices are close to each other, through mappings f : X → R^d and g : Y → R^d. [sent-53, score-0.909]
38 2. Apply the mappings f and g to X′ and Y′ respectively, and predict new edges between the heterogeneous objects if the distance between the points {f(x), x ∈ X ∪ X′} and {g(y), y ∈ Y ∪ Y′} is smaller than a fixed threshold δ (sketched below). [sent-54, score-0.417]
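A minimal sketch of step 2, assuming the learned mappings have already produced coordinate matrices F (rows f(x) for x ∈ X ∪ X′) and G (rows g(y) for y ∈ Y ∪ Y′); the names and the threshold value are illustrative, not from the paper:

```python
# Minimal sketch of the distance-threshold edge prediction (step 2).
import numpy as np

def predict_edges(F, G, delta):
    """Edge (i, j) is predicted iff ||F[i] - G[j]|| < delta."""
    # Pairwise Euclidean distances between the two embedded point sets.
    diff = F[:, None, :] - G[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < delta

F = np.random.randn(4, 2)   # embedded compounds in R^d, d = 2
G = np.random.randn(3, 2)   # embedded proteins in R^d
edges = predict_edges(F, G, delta=1.0)   # boolean 4 x 3 adjacency
```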
39 While the second step in this procedure is fixed, the first step can be optimized by supervised learning of f and g using the known bipartite graph. [sent-55, score-0.26]
40 To do so, we require the mappings f and g to map adjacent heterogeneous vertices in the known bipartite graph onto nearby positions in a unified Euclidean space R^d, in order to ensure that the known bipartite graph can be recovered to some extent by the nearest neighbor approach. [sent-56, score-0.994]
41 $$R(f, g) = \sum_{(u_i, v_j) \in E} \left(f(x_i) - g(y_j)\right)^2 \qquad (1)$$
A small value of R(f, g) ensures that connected heterogeneous vertices tend to be closer than disconnected heterogeneous vertices in the sense of quadratic error. [sent-60, score-0.627]
42 To represent the connectivity between heterogeneous vertices on the bipartite graph G = (U + V, E), we define an adjacency matrix Auv, where element (Auv)ij is equal to 1 if ui and vj are connected, and 0 otherwise.
43 We also define degree matrices Du and Dv for the heterogeneous vertices, whose diagonal elements (Du)ii and (Dv)jj are the degrees of vertices ui and vj (the numbers of edges involving ui and vj), respectively. [sent-65, score-0.809]
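To make the roles of Auv, Du, and Dv concrete, here is a small numpy sketch (variable names assumed) verifying that the edge-wise sum in (1) equals the quadratic form f_U^T Du f_U − 2 f_U^T Auv g_V + g_V^T Dv g_V used later:

```python
# Minimal sketch: criterion (1) as a quadratic form in the adjacency
# and degree matrices of the bipartite graph.
import numpy as np

def criterion_R(fU, gV, Auv):
    Du = np.diag(Auv.sum(axis=1))   # degrees of the u-vertices (row sums)
    Dv = np.diag(Auv.sum(axis=0))   # degrees of the v-vertices (column sums)
    return fU @ Du @ fU - 2 * fU @ Auv @ gV + gV @ Dv @ gV

Auv = np.array([[1, 0], [1, 1], [0, 1]])   # toy 3 x 2 bipartite adjacency
fU, gV = np.random.randn(3), np.random.randn(2)
# Direct edge-by-edge sum agrees with the quadratic form:
direct = sum(Auv[i, j] * (fU[i] - gV[j]) ** 2
             for i in range(3) for j in range(2))
assert np.isclose(direct, criterion_R(fU, gV, Auv))
```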
44 We assume that f and g belong to the reproducing kernel Hilbert spaces (r.k.h.s.) HU and HV defined by the kernels ku on X and kv on Y, and use the norms of f and g as regularization operators. [sent-69, score-0.049] [sent-73, score-0.85]
46 In order to obtain a d-dimensional feature representation of the objects, we propose to iterate the minimization of the regularized criterion (3) under orthogonality constraints in the r.k.h.s., that is, we recursively define the p-th features $f_p$ and $g_p$ for $p = 1, \cdots, d$ as follows:
$$(f_p, g_p) = \arg\min \frac{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} D_u & -A_{uv} \\ -A_{uv}^T & D_v \end{pmatrix} \begin{pmatrix} f_U \\ g_V \end{pmatrix} + \lambda_1 \|f\|^2 + \lambda_2 \|g\|^2}{\begin{pmatrix} f_U \\ g_V \end{pmatrix}^T \begin{pmatrix} f_U \\ g_V \end{pmatrix}} \qquad (4)$$
under the orthogonality constraints: $f \perp f_1, \cdots, f_{p-1}$ and $g \perp g_1, \cdots, g_{p-1}$. [sent-89, score-0.074] [sent-93, score-0.272]
48 In the prediction process, we map any new objects x′ ∈ X ′ and y ′ ∈ Y ′ by the mappings f and g respectively, and predict new edges between the heterogeneous objects if the distance between the points {f (x), x ∈ X ∪ X ′ } and {g(y), y ∈ Y ∪ Y ′ } is smaller than a fixed threshold δ. [sent-94, score-0.53]
49 2 Algorithm Let ku and kv be the kernels on the sets X and Y, where the kernels are both centered in HU and HV . [sent-96, score-0.85]
50 By the representer theorem, for any $p = 1, \cdots, d$, the solution to equation (4) has the following expansions:
$$f_p(x) = \sum_{j=1}^{n_1} \alpha_{p,j} k_u(x_j, x), \qquad g_p(y) = \sum_{j=1}^{n_2} \beta_{p,j} k_v(y_j, y), \qquad (5)$$
for some vectors $\alpha_p = (\alpha_{p,1}, \cdots, \alpha_{p,n_1})^T \in \mathbb{R}^{n_1}$ and $\beta_p = (\beta_{p,1}, \cdots, \beta_{p,n_2})^T \in \mathbb{R}^{n_2}$. [sent-101, score-0.973]
51 Let Ku and Kv be the Gram matrices of the kernels ku and kv, such that (Ku)ij = ku(xi, xj), i, j = 1, · · · , n1 and (Kv)ij = kv(yi, yj), i, j = 1, · · · , n2. [sent-102, score-1.79]
52 The orthogonality constraints fp ⊥ fq and gp ⊥ gq (p ≠ q) can then be written as α_p^T Ku α_q = 0 and β_p^T Kv β_q = 0. [sent-105, score-0.123]
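Substituting the expansions (5) into criterion (4) reduces the minimization to a generalized eigenvalue problem in (α, β). The matrix layout below is a reconstruction under that assumption, not quoted from the paper; it is a rough sketch of one standard way to solve such dual problems with scipy:

```python
# Rough sketch (assumed dual formulation): solve for the expansion
# coefficients alpha_p, beta_p via a generalized eigenvalue problem.
import numpy as np
from scipy.linalg import eigh

def dual_coefficients(Ku, Kv, Auv, lam1, lam2, d):
    n1, n2 = Ku.shape[0], Kv.shape[0]
    Du = np.diag(Auv.sum(axis=1))
    Dv = np.diag(Auv.sum(axis=0))
    # Numerator of (4) in the dual: quadratic criterion plus regularizers.
    A = np.block([[Ku @ Du @ Ku + lam1 * Ku, -Ku @ Auv @ Kv],
                  [-Kv @ Auv.T @ Ku,          Kv @ Dv @ Kv + lam2 * Kv]])
    # Denominator of (4) in the dual: the norm of (f_U, g_V).
    B = np.block([[Ku @ Ku, np.zeros((n1, n2))],
                  [np.zeros((n2, n1)), Kv @ Kv]])
    B += 1e-9 * np.eye(n1 + n2)          # jitter so B is positive definite
    vals, vecs = eigh(A, B)              # ascending generalized eigenvalues
    return vecs[:n1, :d], vecs[n1:, :d]  # alphas (n1 x d), betas (n2 x d)
```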
53 Therefore, it is not possible to directly apply the above methods to the bipartite graph inference problem. [sent-110, score-0.312]
54 Here we consider an extension of correspondence analysis (CA) using the idea of kernel methods, so that it can work in the context of the bipartite graph inference problem. [sent-115, score-0.361]
55 The method is referred to as kernel correspondence analysis (KCA) below. [sent-116, score-0.078]
56 To formulate KCA, we propose to replace the embedding functions φ : U → R and ψ : V → R by functions f : X → R and g : Y → R, where f and g belong to the r.k.h.s. HU and HV defined by the kernels ku on X and kv on Y. [sent-117, score-0.039] [sent-121, score-0.85]
58 In order to obtain a d-dimensional feature representation and deal with the scale issue, we propose to iterate the maximization of the regularized correlation coefficient (9) under orthogonality constraints in the r.k.h.s., that is, we recursively define the p-th features fp and gp for p = 1, · · · , d as (fp, gp) = arg max corr(f, g) under the orthogonality constraints f ⊥ f1, · · · , fp−1 and g ⊥ g1, · · · , gp−1, and the normalization constraints ||f|| = ||g|| = 1. [sent-126, score-0.083] [sent-130, score-0.219]
60 The final form of KCA is similar to that of kernel canonical correlation analysis (KCCA) [12, 13], so KCA can be regarded as a variant of KCCA. [sent-134, score-0.114]
61 However, the critical differences between KCA and KCCA are as follows: i) in KCCA the objects are the same across the two data sources, while in KCA they differ, and ii) KCCA cannot deal with co-occurrence information about the objects. [sent-135, score-0.176]
62 In the experiment below, we are interested in the performance comparison between the distance learning in DML and correlation maximization in KCA. [sent-136, score-0.065]
63 1 Experiment Data In this study we focus on compound-protein interaction networks involving four pharmaceutically useful protein classes: enzymes, ion channels, G protein-coupled receptors (GPCRs), and nuclear receptors. [sent-139, score-0.31]
64 The information about compound-protein interactions was obtained from the KEGG BRITE [14], SuperTarget [15], and DrugBank [16] databases. [sent-140, score-0.107]
65 The number of known interactions involving enzymes, ion channels, GPCRs, and nuclear receptors is 5449, 3020, 1124, and 199, respectively. [sent-141, score-0.245]
66 The number of proteins involved in these interactions is 1062, 242, 134, and 38, respectively, and the number of compounds involved is 1123, 475, 417, and 115, respectively. [sent-142, score-0.839]
67 The compound set includes not only drugs but also experimentally confirmed ligand compounds. [sent-143, score-0.268]
68 These data are regarded as gold standard sets to evaluate the prediction performance below. [sent-144, score-0.051]
69 Chemical structures of the compounds and amino acid sequences of the human proteins were obtained from the KEGG database [14]. [sent-145, score-0.658]
70 We computed the sequence similarities between the proteins using Smith-Waterman scores based on the local alignment between two amino acid sequences [18]. [sent-147, score-0.297]
71 In this study we used the above similarity measures as kernel functions, but the Smith-Waterman scores are not always positive definite, so we added an appropriate identity matrix such that the corresponding kernel Gram matrix is positive definite, an approach related to [19]. [sent-148, score-0.15]
72 All the kernel matrices are normalized such that all diagonal elements are one. [sent-149, score-0.049]
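A minimal sketch of these two preprocessing steps, assuming a symmetric similarity matrix S such as the Smith-Waterman score matrix; the shift amount and function name are illustrative:

```python
# Minimal sketch: shift a similarity matrix into positive definiteness,
# then normalize so every diagonal element equals one.
import numpy as np

def make_valid_kernel(S, eps=1e-6):
    S = 0.5 * (S + S.T)                      # enforce symmetry
    min_eig = np.linalg.eigvalsh(S).min()
    if min_eig <= 0:                         # add a multiple of the identity
        S = S + (eps - min_eig) * np.eye(S.shape[0])
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)                # K_ij <- K_ij / sqrt(K_ii K_jj)

S = np.array([[2.0, 3.0], [3.0, 1.0]])       # toy non-PSD score matrix
K = make_valid_kernel(S)                     # diagonals are now exactly one
```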
73 2 Performance evaluation As a baseline method, we used the nearest neighbor (NN) method, because this idea has been used in traditional molecular screening in many public databases. [sent-151, score-0.085]
74 Given a new ligand candidate compound, we find a known ligand compound (in the training set) sharing the highest structure similarity with the new compound, and predict the new compound to interact with proteins known to interact with the nearest ligand compound. [sent-152, score-0.961]
75 Likewise, given a new target candidate protein, we find a known target protein (in the training set) sharing the highest sequence similarity with the new protein, and predict the new protein to interact with ligand compounds known to interact with the nearest target protein. [sent-153, score-0.898]
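A minimal sketch of the compound side of this NN baseline, under an assumed data layout (a similarity matrix from new to training compounds, and a 0/1 training interaction matrix); the protein side is symmetric:

```python
# Minimal sketch of the nearest-neighbor baseline for new compounds.
import numpy as np

def nn_predict_for_compounds(sim_new_to_train, train_interactions):
    """sim_new_to_train: (m, n1) structure similarities to training compounds.
    train_interactions: (n1, n2) known 0/1 compound-protein interactions."""
    nearest = sim_new_to_train.argmax(axis=1)   # most similar known ligand
    scores = sim_new_to_train.max(axis=1)       # its similarity = prediction score
    # Each new compound inherits the interaction profile of its nearest neighbor.
    return train_interactions[nearest], scores
```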
76 Newly predicted compound-protein interaction pairs are assigned, as prediction scores, the highest structure or sequence similarity values involving the new compounds or proteins, in order to draw the ROC curve below. [sent-154, score-0.754]
77 We tested three different methods, NN, KCA, and DML, on their ability to reconstruct the four compound-protein interaction networks. [sent-155, score-0.08]
78 [Table 1 caption fragment] The column abbreviations indicate training compounds, training proteins, test compounds, and test proteins, respectively. [sent-160, score-0.416]
79 We draw a receiver operating characteristic (ROC) curve, the plot of true positives as a function of false positives over varying thresholds δ, where true positives are correctly predicted interactions and false positives are predicted interactions not present in the gold standard set. [sent-290, score-0.368]
80 The regularization parameter λ and the number of features d are optimized by internal cross-validation within the training set, with the AUC score as the target criterion, in the case of KCA and DML. [sent-292, score-0.06]
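As an illustration of the evaluation protocol (the values below are made up, not the paper's results), the ROC/AUC computation over prediction scores can be sketched as:

```python
# Minimal sketch: AUC from gold-standard labels and prediction scores.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])               # gold-standard edge labels
y_score = np.array([0.9, 0.4, 0.8, 0.3, 0.2, 0.5])  # e.g. -distance or similarity
print(roc_auc_score(y_true, y_score))               # area under the ROC curve
```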
81 Table 1 shows the resulting AUC scores for different sets of predictions depending on whether the compound and/or the protein were in the initial training set or not. [sent-294, score-0.24]
82 Compounds and proteins in the training set are called training compounds and proteins whereas those not in the training set are called test compounds and proteins. [sent-295, score-1.168]
83 Four different classes are then possible: i) test compounds vs. training proteins, ii) training compounds vs. test proteins, iii) test compounds vs. test proteins, and iv) all possible predictions (the average of the above three classes). [sent-296, score-1.581]
84 Comparing the three methods, DML shows the best performance for all four types of compound-protein interaction networks and outperforms KCA and NN at a significant level. [sent-297, score-0.08]
85 The poor performance of NN implies that raw compound structure or protein sequence similarities do not always reflect the true interaction partners in compound-protein interaction networks. [sent-298, score-0.374]
86 Among the four prediction classes, predictions where neither the protein nor the compound is in the training set (iii) are the weakest, but even then reliable predictions are possible with DML. [sent-299, score-0.239]
87 Note that the NN method cannot make iii) test vs. test predictions, because it depends on template information about known ligand compounds and known target proteins. [sent-300, score-0.728]
88 These results suggest that the feature space learned by DML successfully represents the network topology of the bipartite graph structure of compound-protein networks, and that the correlation maximization used in KCA is not enough to reflect this topology. [sent-301, score-0.71]
89 6 Concluding remarks In this paper, we developed a new supervised method to infer the bipartite graph from the viewpoint of distance metric learning (DML). [sent-302, score-0.441]
90 The originality of the proposed method lies in the embedding of the heterogeneous objects forming the vertices of the bipartite graph into a unified Euclidean space, and in the learning of the distance between heterogeneous objects with different data structures in that unified feature space. [sent-303, score-1.081]
91 We also discussed the relationship with correspondence analysis (CA) and kernel canonical correlation analysis (KCCA). [sent-304, score-0.143]
92 In the experiments, the proposed DML method is shown to outperform the other methods on the problem of compound-protein interaction network reconstruction from chemical structure and genomic sequence data. [sent-305, score-0.308]
93 From a practical viewpoint, the proposed method is useful for the virtual screening of huge numbers of ligand candidate compounds generated by various biological assays, and of target candidate proteins, toward genomic drug discovery. [sent-306, score-0.875]
94 It should also be pointed out that the proposed method can be applied to other network prediction problems, such as metabolic network reconstruction, host-pathogen protein-protein interaction prediction, and customer-product recommendation, as long as they can be represented by bipartite graphs. [sent-307, score-0.427]
95 A fast flexible docking method using an incremental construction algorithm. [sent-318, score-0.041]
96 Protein network inference from multiple genomic data: a supervised approach. [sent-325, score-0.159]
97 Selective integration of multiple biological data for supervised network inference. [sent-337, score-0.093]
98 From genomics to chemical genomics: new developments in KEGG. [sent-398, score-0.12]
99 DrugBank: a knowledgebase for drugs, drug actions and drug targets. [sent-419, score-0.124]
100 Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. [sent-426, score-0.335]
wordName wordTfidf (topN-words)
[('ku', 0.45), ('kv', 0.4), ('compounds', 0.356), ('proteins', 0.213), ('bipartite', 0.209), ('heterogeneous', 0.198), ('auv', 0.164), ('kca', 0.151), ('gv', 0.143), ('fu', 0.136), ('vs', 0.131), ('dml', 0.123), ('compound', 0.121), ('ligand', 0.12), ('chemical', 0.12), ('interactions', 0.107), ('graph', 0.103), ('vertices', 0.101), ('protein', 0.093), ('objects', 0.088), ('interaction', 0.08), ('dv', 0.08), ('fp', 0.071), ('du', 0.07), ('hv', 0.068), ('genomic', 0.066), ('ui', 0.063), ('drug', 0.062), ('vj', 0.06), ('kcca', 0.055), ('ribute', 0.055), ('uv', 0.053), ('gp', 0.052), ('supervised', 0.051), ('kernel', 0.049), ('corr', 0.048), ('nn', 0.048), ('hu', 0.046), ('orthogonality', 0.044), ('network', 0.042), ('auc', 0.041), ('docking', 0.041), ('dui', 0.041), ('dvj', 0.041), ('enzymes', 0.041), ('gpcrs', 0.041), ('ion', 0.041), ('yj', 0.04), ('mappings', 0.04), ('correlation', 0.039), ('embedding', 0.039), ('train', 0.038), ('uni', 0.037), ('receptors', 0.036), ('eigenvalue', 0.035), ('edges', 0.034), ('euclidean', 0.034), ('iv', 0.034), ('topology', 0.033), ('nuclear', 0.033), ('vert', 0.033), ('positives', 0.032), ('iii', 0.032), ('structures', 0.031), ('nearest', 0.031), ('predict', 0.031), ('target', 0.03), ('criterion', 0.03), ('test', 0.03), ('nucleic', 0.029), ('acid', 0.029), ('disconnected', 0.029), ('amino', 0.029), ('metabolic', 0.029), ('interact', 0.029), ('correspondence', 0.029), ('screening', 0.028), ('involving', 0.028), ('addi', 0.027), ('drugbank', 0.027), ('drugs', 0.027), ('enzyme', 0.027), ('hattori', 0.027), ('mol', 0.027), ('onal', 0.027), ('pharmaceutically', 0.027), ('supertarget', 0.027), ('yamanishi', 0.027), ('roc', 0.027), ('metric', 0.027), ('acids', 0.027), ('similarity', 0.026), ('vertex', 0.026), ('canonical', 0.026), ('scores', 0.026), ('molecular', 0.026), ('gold', 0.026), ('distance', 0.026), ('viewpoint', 0.025), ('prediction', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
2 0.13582766 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
3 0.099017948 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
4 0.081887446 134 nips-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires non-standard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodel with applications to social networks and protein interaction networks. 1
5 0.078094356 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
6 0.078020878 171 nips-2008-Online Prediction on Large Diameter Graphs
7 0.073215716 194 nips-2008-Regularized Learning with Networks of Features
8 0.070777297 193 nips-2008-Regularized Co-Clustering with Dual Supervision
9 0.064132884 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss
10 0.060570613 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
11 0.059090141 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning
12 0.055256635 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
13 0.050418921 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
14 0.050244041 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
15 0.049223498 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
16 0.04891365 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
17 0.048875667 107 nips-2008-Influence of graph construction on graph-based clustering measures
18 0.048703875 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
19 0.047761727 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
20 0.046798114 69 nips-2008-Efficient Exact Inference in Planar Ising Models
topicId topicWeight
[(0, -0.137), (1, -0.038), (2, -0.03), (3, 0.058), (4, 0.064), (5, -0.077), (6, 0.096), (7, -0.021), (8, 0.068), (9, -0.075), (10, 0.043), (11, -0.009), (12, -0.054), (13, -0.079), (14, 0.077), (15, 0.004), (16, -0.0), (17, -0.012), (18, -0.093), (19, 0.005), (20, -0.021), (21, -0.045), (22, 0.041), (23, -0.06), (24, 0.032), (25, 0.048), (26, -0.076), (27, -0.02), (28, 0.002), (29, -0.02), (30, -0.031), (31, 0.045), (32, -0.037), (33, 0.043), (34, 0.021), (35, 0.097), (36, 0.032), (37, 0.031), (38, -0.057), (39, 0.037), (40, 0.052), (41, 0.102), (42, -0.034), (43, -0.163), (44, -0.039), (45, 0.024), (46, -0.063), (47, -0.033), (48, 0.006), (49, -0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.92844957 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
2 0.58827132 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
3 0.58514941 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
Author: Benjamin Yackley, Eduardo Corona, Terran Lane
Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1
4 0.55642396 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching
Author: Pavel P. Kuksa, Pai-hsi Huang, Vladimir Pavlovic
Abstract: We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatch constraints our algorithms are several orders of magnitude faster than the existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and real applications in music genre classification, protein remote homology detection and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to a state-of-the-art performance with significantly reduced running times. 1
5 0.50499743 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
6 0.49440593 194 nips-2008-Regularized Learning with Networks of Features
7 0.46887958 84 nips-2008-Fast Prediction on a Tree
8 0.45432913 134 nips-2008-Mixed Membership Stochastic Blockmodels
9 0.44658092 113 nips-2008-Kernelized Sorting
10 0.43829647 69 nips-2008-Efficient Exact Inference in Planar Ising Models
11 0.43615144 193 nips-2008-Regularized Co-Clustering with Dual Supervision
12 0.41996396 212 nips-2008-Skill Characterization Based on Betweenness
13 0.41670099 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
14 0.40841374 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
15 0.39736548 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
16 0.3969909 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
17 0.39522004 248 nips-2008-Using matrices to model symbolic relationship
18 0.38750824 171 nips-2008-Online Prediction on Large Diameter Graphs
19 0.38720024 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
20 0.378153 104 nips-2008-Improved Moves for Truncated Convex Models
topicId topicWeight
[(6, 0.035), (7, 0.052), (12, 0.018), (15, 0.018), (28, 0.101), (57, 0.04), (59, 0.015), (63, 0.02), (77, 0.037), (83, 0.565)]
simIndex simValue paperId paperTitle
1 0.93102384 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
2 0.92862701 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1
same-paper 3 0.90516591 225 nips-2008-Supervised Bipartite Graph Inference
Author: Yoshihiro Yamanishi
Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1
4 0.88449836 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
5 0.87052381 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
Author: Takafumi Kanamori, Shohei Hido, Masashi Sugiyama
Abstract: We address the problem of estimating the ratio of two probability density functions (a.k.a. the importance). The importance values can be used for various succeeding tasks such as non-stationarity adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally very efficient and numerically stable. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bound. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. 1
6 0.74521166 32 nips-2008-Bayesian Kernel Shaping for Learning Control
7 0.60627359 95 nips-2008-Grouping Contours Via a Related Image
8 0.58281648 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
9 0.57427496 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
10 0.56813741 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.5617134 194 nips-2008-Regularized Learning with Networks of Features
12 0.55618382 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
13 0.54831362 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
14 0.54596758 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
15 0.54346281 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
16 0.52890652 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
17 0.51840746 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
18 0.51803446 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
19 0.51475048 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
20 0.50644428 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features