nips nips2004 nips2004-177 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Supervised graph inference Jean-Philippe Vert Centre de G´ ostatistique e Ecole des Mines de Paris 35 rue Saint-Honor´ e 77300 Fontainebleau, France Jean-Philippe. [sent-1, score-0.25]
2 jp Abstract We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. [sent-6, score-0.521]
3 The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. [sent-7, score-0.513]
4 We report encouraging results on the problem of metabolic network reconstruction from genomic data. [sent-8, score-0.414]
5 1 Introduction The problem of graph inference, or graph reconstruction, is to predict the presence or absence of edges between a set of points known to form the vertices of a graph, the prediction being based on observations about the points. [sent-9, score-0.699]
6 Various approaches have been proposed to solve the network inference problem. [sent-12, score-0.167]
7 Bayesian [2] or Petri networks [4] are popular frameworks to model the gene regulatory or the metabolic network, and include methods to infer the network from data such as gene expression of metabolite concentrations [2]. [sent-13, score-0.95]
8 In other cases, such as inferring protein interactions from gene sequences or gene expression, these models are less relevant and more direct approaches involving the prediction of edges between “similar” nodes have been tested [5, 6]. [sent-14, score-0.905]
9 The actual situations we are confronted with, however, can often be expressed in a supervised framework: besides the data about the vertices, part of the network is already known. [sent-16, score-0.16]
10 This is obviously the case with all network examples discussed above, and the real challenge is to denoise the observed subgraph, if errors are assumed to be present, and to infer new edges involving in particular nodes outside of the observed subgraph. [sent-17, score-0.43]
11 In order to clarify this point, let us take the example of an actual network inference problem that we treat in the experiment below: the inference of the metabolic network from various genomic data. [sent-18, score-0.609]
12 The metabolic network is a graph of genes that involves only a subset of all the genes of an organisms, known as enzymes. [sent-19, score-0.693]
13 Enzymes can catalyze chemical reaction, and an edge between two enzymes indicates that they can catalyze two successive reactions. [sent-20, score-0.386]
14 For most organisms, this graph is partially known, because many enzymes have already been characterized. [sent-21, score-0.306]
15 However many enzymes are also missing, and the problem is to detect uncharacterized enzymes and place them in their correct location in the metabolic network. [sent-22, score-0.462]
16 Mathematically speaking, this means adding new edges involving new points, and eventually modifying edges in the known graph to remove mistakes from our current knowledge. [sent-23, score-0.526]
17 In this contribution we propose an algorithm for supervised graph inference, i. [sent-24, score-0.243]
18 , to infer a graph from observations about the vertices and from the knowledge of part of the graph. [sent-26, score-0.441]
19 Several attempts have already been made to formalize the network inference problem as a supervised machine learning problem [1, 7], but these attempts consist in predicting each edge independently from each others using algorithms for supervised classification. [sent-27, score-0.408]
20 We propose below a radically different setting, where the known subgraph is used to extract a new representation for the vertices, as points in a vector space, where the structure of the graph is easier to infer than from the original observations. [sent-28, score-0.377]
21 The edge inference engine in the vector space is very simple (edges are inferred between nodes with similar representations), and the learning step is limited to the construction of the mapping of the nodes onto the vector space. [sent-29, score-0.342]
22 2 The supervised graph inference problem Let us formally define the supervised graph inference problem. [sent-30, score-0.682]
23 We suppose an undirected simple graph G = (V, E) is given, where V = (v1 , . [sent-31, score-0.152]
24 , vn ) ∈ V n is a set of vertices and E ⊂ V × V is a set of edges. [sent-34, score-0.218]
25 The problem is, given an additional set of vertices V = (v1 , . [sent-35, score-0.184]
26 , vm ) ∈ V m , to infer a set of edges E ⊂ V × (V ∪ V ) ∪ (V ∪ V ) × V involving the nodes in V . [sent-38, score-0.361]
27 In many situations of interest, in particular gene networks, the additional nodes V might be known in advance, but we do not make this assumption here to ensure a level of generality as large as possible. [sent-39, score-0.28]
28 For the applications we have in mind, the vertices can be represented in V by a variety of data types, including but not limited to biological sequences, molecular structures, expression profiles or metabolite concentrations. [sent-40, score-0.32]
29 In order to allow this diversity and take advantage of recent works on positive definite kernels on general sets [8], we will assume that V is a set endowed with a positive definite kernel k, p that is, a symmetric function k : V 2 → R satisfying i,j=1 ai aj k(xi , xj ) ≥ 0 for any p ∈ N, (a1 , . [sent-41, score-0.129]
30 3 From distance learning to graph inference Suppose first that a graph must be inferred on p points (x1 , . [sent-48, score-0.479]
31 Then the simplest strategy to predict edges between the points is to put an edge between vertices that are at a distance from each other smaller than a fixed threshold δ. [sent-52, score-0.428]
32 More or less edges can be inferred by varying the threshold. [sent-53, score-0.168]
33 We now propose to cast the supervised graph inference problem in a two step procedure: • map the original points to a Euclidean space through a mapping f : V → Rd ; • apply the direct strategy to infer the network on the points {f (v), v ∈ V ∪ V } . [sent-55, score-0.719]
34 While the second part of this procedure is fixed, the first part can be optimized by supervised learning of f using the known network. [sent-56, score-0.119]
35 To do so we require the mapping f to map adjacent vertices in the known graph to nearby positions in Rd , in order to ensure that the known graph can be recovered to some extent by the direct strategy. [sent-57, score-0.627]
36 Given a function f : V → R, a possible criterion to assess whether connected (resp. [sent-59, score-0.05]
37 dissimilar) points in R is the following: 2 R(f ) = (u,v)∈E (f (u) − f (v)) − 2 (u,v)∈E (f (u) − f (v)) 2 (u,v)∈V 2 (f (u) − f (v)) . [sent-61, score-0.046]
38 (1) A small value of R(f ) ensures that connected vertices tend to be closer than disconnected vertices (in a quadratic error sense). [sent-62, score-0.464]
39 Observe that the numerator ensures an invariance of R(f ) with respect to a scaling of f by a constant, which is consistent with the fact that the direct strategy itself is invariant with respect to scaling of the points. [sent-63, score-0.15]
40 , f (vn )) ∈ Rn the values taken by f on the training set, and by L the combinatorial Laplacian of the graph G, i. [sent-67, score-0.182]
41 0) if i = j and vertices vi and vj are connected (resp. [sent-70, score-0.184]
42 If we restrict fV to have zero mean ( v∈V f (v) = 0), then the criterion (1) can be rewritten as follows: R(f ) = 4 fV LfV − 2. [sent-72, score-0.05]
43 fV fV The obvious minimum of R(f ) under the constraint v∈V f (v) = 0 is reached for any function f such that fV is equal to the second largest eigenvector of L (the largest eigenvector of L begin the constant vector). [sent-73, score-0.122]
44 However, this only defines the values of f on the points V , but leaves indeterminacy on the values of f outside of V . [sent-74, score-0.046]
45 Moreover, any arbitrary choice of f under a single constraint on fV is likely to be a mapping that overfits the known graph at the expense of the capacity to infer the unknown edges. [sent-75, score-0.333]
46 To overcome both issues, we propose to regularize the criterion (1), by a smoothness functional on f , a classical approach in statistical learning [10, 11]. [sent-76, score-0.05]
47 A convenient setting is to assume that f belongs to the reproducing kernel Hilbert space (r. [sent-77, score-0.129]
48 ) H defined by the kernel k on V, and to use the norm of f in the r. [sent-81, score-0.158]
49 The regularized criterion to be minimized becomes: min f ∈H0 fV LfV + λ||f ||2 H fV fV , (2) where H0 = {f ∈ H : v∈V f (v) = 0} is the subset of H orthogonal to the function x → v∈V k(x, v) in H and λ is a regularization parameter. [sent-86, score-0.173]
50 The regularization parameter controls the trade-off between minimizing the original criterion (1) and ensuring that the solution has a small norm in the r. [sent-88, score-0.234]
51 When λ varies, the solution to (2) varies between to extremes: • When λ is small, fV tends to the second largest eigenvector of the Laplacian L. [sent-92, score-0.093]
52 The regularization ensures that f is well defined as a function of V → R, but f is likely to overfit the known graph. [sent-93, score-0.181]
53 • When λ is large, the solution to (2) converges to the first kernel principal component (up to a scaling) [13], whatever the graph. [sent-94, score-0.161]
54 Before showing how (2) is solved in practice, we must complete the picture by explaining how the mapping f : V → Rd is obtained. [sent-96, score-0.048]
55 First note that the criterion in (2) is defined up to a scaling of the functions, and the solution is therefore a direction in the r. [sent-97, score-0.11]
56 In order to extract a function, an additional constraint must be set, such that imposing the norm ||f ||HV = 1, or imposing v∈V f (v)2 = 1. [sent-101, score-0.101]
57 The first solution correspond to an orthogonal projection onto the direction selected in the r. [sent-102, score-0.071]
58 (which would for example give the same result as kernel PCA for large λ), while the second solution would provide a sphering of the data. [sent-106, score-0.161]
59 , that is, we recursively define the i-th feature fi for i = 1, . [sent-113, score-0.054]
60 , d by: fi = fV LfV + λ||f ||2 H fV fV arg min f ∈H0 ,f ⊥{f1 ,. [sent-116, score-0.054]
61 (3) Implementation Let kV be the kernel obtained by centering k on the set V , i. [sent-120, score-0.129]
62 , d, the solution to (3) has an expansion of the form: n fi (x) = αi,j kV (xj , x), j=1 for some vector αi = (αi,1 , . [sent-135, score-0.086]
63 The corresponding vector fi,V can be written in terms of αi by fi,V = KV αi , where KV is the Gram matrix of the kernel kV on the set V , i. [sent-139, score-0.129]
64 KV is obtained from the Gram matrix K of the original kernel k by the classical formula KV = (I − U )K(I − U ), I being the n × n identity matrix and U being the constant n × n matrix [U ]i,j = 1/n for i, j = 1, . [sent-145, score-0.129]
65 Besides, the norm in HV is equal to ||fi ||2 V = αi KV αi , and the H orthogonality constraint between fi and fj in HV translates into αi KV αj = 0. [sent-149, score-0.118]
66 (4) Taking the differential of (4) with respect to α to 0 we see that the first vector α1 must solve the following generalized eigenvector problem with the smallest (non-negative) generalized eigenvalue: 2 (KV LKV + λKV ) α = µKV α. [sent-154, score-0.137]
67 Hence any solution of (5) differs from a solution of (6) by such an , which however does not change the corresponding function f ∈ HV . [sent-156, score-0.064]
68 K being positive semidefinite, the other generalized eigenvectors of (6) are conjugate with respect to KV , so it can easily be checked that the d vectors α1 , . [sent-158, score-0.108]
69 , αd solving (4) are in fact the d smallest generalized eigenvectors or (6). [sent-161, score-0.076]
70 In practice, for large n, the generalized eigenvector problem (6) can be solved by first performing an incomplete Cholesky decomposition of KV , see e. [sent-162, score-0.099]
71 cerevisiae, the network corresponding to our current knowledge of the network was extracted from the KEGG database [18]. [sent-168, score-0.138]
72 The resulting network contains 769 vertices and 7404 edges. [sent-169, score-0.253]
73 In order to infer it, various independent data about the genes can be used. [sent-170, score-0.25]
74 In each case a Gaussian RBF kernel was used to represent the data as a kernel matrix. [sent-172, score-0.258]
75 Additionally, we considered a fourth kernel obtained by summing the first three kernels. [sent-174, score-0.129]
76 For each random split of the set of genes into 80% (training set) and 20% (test set), the features are learned from the subgraph with genes from the training set as vertices. [sent-178, score-0.408]
77 The edges involving genes in the test set are then predicted among all possible interactions involving the test set. [sent-179, score-0.718]
78 The performance of the inference is estimated in term of ROC curves (the plot of the percentage of actual edges predicted as a function of the number of edges predicted although they are not present), and in terms of the area under the ROC curve normalized between 0 and 1. [sent-180, score-0.43]
79 Notice that the set of possible interactions to be predicted is made of interactions between two genes in the test set, on the one hand, and between one gene in the test set and one gene in the training set, on the other hand. [sent-181, score-0.956]
80 As it might be more challenging to infer an edge in the former case, we compute two performances: first on the edges involving two nodes in the test set, and second on the edges involving at least one vertex in the test set. [sent-182, score-0.786]
81 The algorithm contains 2 free parameters: the number d of features to be kept, and the regularization parameter λ that prevents from overfitting the known graph. [sent-183, score-0.193]
82 Figure 1 displays the performance in terms of ROC index for the graph inference with the integrated kernel, for different values of d and λ. [sent-188, score-0.37]
83 On the training set, it can be seen that the effect of increasing λ constantly decreases the performance of the graph reconstruction, which is natural since smaller values of λ are expected to overfit the training graph. [sent-189, score-0.212]
84 These results however justify that the criterion (1), although not directly related to the ROC index of the graph reconstruction procedure, is a useful criterion to be optimized. [sent-190, score-0.386]
85 As an example, for very small values of λ, the ROC index on the training set is above 96%. [sent-191, score-0.094]
86 (train + test)” show that the first one is indeed more difficult that the latter one, where some form of overfitting is likely to occur (in the mapping of the vertices in the training set). [sent-197, score-0.262]
87 Finally, and more interestingly, the inference with the integrated kernel outperforms the inference with each individual kernel. [sent-205, score-0.381]
88 A bayesian networks approach for predicting protein-protein 0. [sent-220, score-0.079]
89 5 8 −4 (a) Expression kernel 8 (b) Localization kernel) 1 0. [sent-230, score-0.129]
90 9 ROC index 1 ROC index −2 0 2 4 6 Regularization parameter (log2) 0. [sent-232, score-0.128]
91 5 8 −4 (c) Phylogenetic kernel −2 0 2 4 6 Regularization parameter (log2) 8 (d) Integrated kernel Figure 2: ROC scores for different regularization parameters when 20 features are selected. [sent-240, score-0.423]
92 In each picture, the dashed blue line, dashdot red line and continuous black line correspond respectively to the ROC index on the training vs training set, the test vs (training + test) set, and the test vs test set. [sent-242, score-0.625]
93 100 80 80 True Positives (%) True Positives (%) 100 60 40 20 0 0 20 40 60 False positives (%) Kexp Kloc Kphy Kint Krand 80 (a) Test vs. [sent-243, score-0.056]
94 (train+test) 100 60 40 Kexp Kloc Kphy Kint Krand 20 0 0 20 40 60 80 False positives (%) (b) Test vs. [sent-244, score-0.056]
95 Prediction of higher order functional networks from genomic data. [sent-257, score-0.173]
96 Detecting protein function and protein-protein interactions from genome sequences. [sent-277, score-0.247]
97 Graph-driven features extraction from microarray data using diffusion kernels and kernel CCA. [sent-363, score-0.203]
98 Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. [sent-386, score-0.282]
99 Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. [sent-407, score-0.394]
100 Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis. [sent-431, score-0.493]
wordName wordTfidf (topN-words)
[('kv', 0.529), ('fv', 0.309), ('gene', 0.205), ('vertices', 0.184), ('roc', 0.182), ('hv', 0.176), ('enzymes', 0.154), ('metabolic', 0.154), ('graph', 0.152), ('genes', 0.145), ('edges', 0.137), ('kernel', 0.129), ('protein', 0.125), ('regularization', 0.123), ('genomic', 0.121), ('infer', 0.105), ('phylogenetic', 0.101), ('inference', 0.098), ('test', 0.092), ('supervised', 0.091), ('interactions', 0.079), ('catalyze', 0.076), ('lfv', 0.076), ('vs', 0.075), ('involving', 0.072), ('reconstruction', 0.07), ('network', 0.069), ('lkv', 0.066), ('disconnected', 0.066), ('index', 0.064), ('eigenvector', 0.061), ('vert', 0.06), ('positives', 0.056), ('integrated', 0.056), ('expression', 0.056), ('fi', 0.054), ('regulatory', 0.053), ('networks', 0.052), ('budding', 0.051), ('eisen', 0.051), ('kegg', 0.051), ('kexp', 0.051), ('kint', 0.051), ('kloc', 0.051), ('kphy', 0.051), ('krand', 0.051), ('marcotte', 0.051), ('metabolite', 0.051), ('pellegrini', 0.051), ('petri', 0.051), ('yamanishi', 0.051), ('train', 0.05), ('criterion', 0.05), ('chemical', 0.048), ('localization', 0.048), ('mapping', 0.048), ('nodes', 0.047), ('subgraph', 0.046), ('points', 0.046), ('biology', 0.045), ('spellman', 0.044), ('genome', 0.043), ('features', 0.042), ('microarrays', 0.04), ('cerevisiae', 0.04), ('pnas', 0.04), ('bioinformatics', 0.039), ('onto', 0.039), ('generalized', 0.038), ('eigenvectors', 0.038), ('xp', 0.037), ('imposing', 0.036), ('orthogonality', 0.035), ('kyoto', 0.035), ('yeast', 0.035), ('organisms', 0.035), ('pro', 0.035), ('direct', 0.035), ('vn', 0.034), ('euclidean', 0.033), ('edge', 0.032), ('microarray', 0.032), ('checked', 0.032), ('solution', 0.032), ('inferred', 0.031), ('rd', 0.031), ('tting', 0.03), ('training', 0.03), ('cell', 0.03), ('ensures', 0.03), ('ng', 0.029), ('norm', 0.029), ('strategy', 0.029), ('molecular', 0.029), ('predicted', 0.029), ('rn', 0.029), ('known', 0.028), ('scaling', 0.028), ('clustering', 0.027), ('predicting', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
2 0.12663789 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
3 0.1262055 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
4 0.094281033 40 nips-2004-Common-Frame Model for Object Recognition
Author: Pierre Moreels, Pietro Perona
Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.
5 0.0902844 42 nips-2004-Computing regularization paths for learning multiple kernels
Author: Francis R. Bach, Romain Thibaux, Michael I. Jordan
Abstract: The problem of learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be achieved via the regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained by using numerical continuation techniques, and involves a running time complexity that is a constant times the complexity of solving the problem for one value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of the block 1-norm regularization differs notably from the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case. 1
6 0.088823795 165 nips-2004-Semi-supervised Learning on Directed Graphs
7 0.081002764 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
8 0.07900776 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
9 0.076380059 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
10 0.074085303 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
11 0.073436178 136 nips-2004-On Semi-Supervised Classification
12 0.073004752 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
13 0.071407884 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
14 0.070905231 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
15 0.070825249 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
16 0.070768818 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
17 0.067036726 54 nips-2004-Distributed Information Regularization on Graphs
18 0.06645079 141 nips-2004-Optimal sub-graphical models
19 0.065754458 115 nips-2004-Maximum Margin Clustering
20 0.065638609 19 nips-2004-An Application of Boosting to Graph Classification
topicId topicWeight
[(0, -0.209), (1, 0.067), (2, -0.077), (3, 0.032), (4, -0.058), (5, -0.031), (6, -0.069), (7, -0.051), (8, -0.057), (9, -0.14), (10, -0.065), (11, -0.032), (12, -0.0), (13, 0.028), (14, 0.037), (15, 0.06), (16, 0.025), (17, -0.095), (18, 0.117), (19, -0.05), (20, 0.005), (21, 0.015), (22, -0.108), (23, -0.019), (24, 0.005), (25, -0.043), (26, -0.059), (27, -0.192), (28, -0.013), (29, 0.006), (30, -0.149), (31, 0.1), (32, 0.183), (33, -0.0), (34, -0.09), (35, 0.016), (36, 0.08), (37, 0.082), (38, 0.06), (39, 0.037), (40, 0.018), (41, -0.03), (42, 0.029), (43, 0.097), (44, 0.006), (45, 0.163), (46, 0.063), (47, -0.047), (48, -0.054), (49, 0.143)]
simIndex simValue paperId paperTitle
same-paper 1 0.93783152 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
2 0.67303038 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
3 0.62973011 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
Author: Ofer Shai, Brendan J. Frey, Quaid D. Morris, Qun Pan, Christine Misquitta, Benjamin J. Blencowe
Abstract: Alternative splicing (AS) is an important and frequent step in mammalian gene expression that allows a single gene to specify multiple products, and is crucial for the regulation of fundamental biological processes. The extent of AS regulation, and the mechanisms involved, are not well understood. We have developed a custom DNA microarray platform for surveying AS levels on a large scale. We present here a generative model for the AS Array Platform (GenASAP) and demonstrate its utility for quantifying AS levels in different mouse tissues. Learning is performed using a variational expectation maximization algorithm, and the parameters are shown to correctly capture expected AS trends. A comparison of the results obtained with a well-established but low through-put experimental method demonstrate that AS levels obtained from GenASAP are highly predictive of AS levels in mammalian tissues. 1 Biological diversity through alternative splicing Current estimates place the number of genes in the human genome at approximately 30,000, which is a surprisingly small number when one considers that the genome of yeast, a singlecelled organism, has 6,000 genes. The number of genes alone cannot account for the complexity and cell specialization exhibited by higher eukaryotes (i.e. mammals, plants, etc.). Some of that added complexity can be achieved through the use of alternative splicing, whereby a single gene can be used to code for a multitude of products. Genes are segments of the double stranded DNA that contain the information required by the cell for protein synthesis. That information is coded using an alphabet of 4 (A, C, G, and T), corresponding to the four nucleotides that make up the DNA. In what is known as the central dogma of molecular biology, DNA is transcribed to RNA, which in turn is translated into proteins. Messenger RNA (mRNA) is synthesized in the nucleus of the cell and carries the genomic information to the ribosome. In eukaryotes, genes are generally comprised of both exons, which contain the information needed by the cell to synthesize proteins, and introns, sometimes referred to as spacer DNA, which are spliced out of the pre-mRNA to create mature mRNA. An estimated 35%-75% of human genes [1] can be C1 (a) C1 A C1 C2 C1 A 3’ C2 (b) C2 C1 A 3’ C1 A 5’ C2 A 5’ C2 C1 C1 C2 C1 (c) A2 A1 C1 A1 C1 A2 C1 C2 C2 C1 (d) C2 C2 A C2 C2 C2 C1 C2 Figure 1: Four types of AS. Boxes represent exons and lines represent introns, with the possible splicing alternatives indicated by the connectors. (a) Single cassette exon inclusion/exclusion. C1 and C2 are constitutive exons (exons that are included in all isoforms) and flank a single alternative exon (A). The alternative exon is included in one isoform and excluded in the other. (b) Alternative 3’ (or donor) and alternative 5’ (acceptor) splicing sites. Both exons are constitutive, but may contain alternative donor and/or acceptor splicing sites. (c) Mutually exclusive exons. One of the two alternative exons (A1 and A2 ) may be included in the isoform, but not both. (d) Intron inclusion. An intron may be included in the mature mRNA strand. spliced to yield different combinations of exons (called isoforms), a phenomenon referred to as alternative splicing (AS). There are four major types of AS as shown in Figure 1. Many multi-exon genes may undergo more than one alternative splicing event, resulting in many possible isoforms from a single gene. [2] In addition to adding to the genetic repertoire of an organism by enabling a single gene to code for more than one protein, AS has been shown to be critical for gene regulation, contributing to tissue specificity, and facilitating evolutionary processes. Despite the evident importance of AS, its regulation and impact on specific genes remains poorly understood. The work presented here is concerned with the inference of single cassette exon AS levels (Figure 1a) based on data obtained from RNA expression arrays, also known as microarrays. 1.1 An exon microarray data set that probes alternative splicing events Although it is possible to directly analyze the proteins synthesized by a cell, it is easier, and often more informative, to instead measure the abundance of mRNA present. Traditionally, gene expression (abundance of mRNA) has been studied using low throughput techniques (such as RT-PCR or Northern blots), limited to studying a few sequences at a time and making large scale analysis nearly impossible. In the early 1990s, microarray technology emerged as a method capable of measuring the expression of thousands of DNA sequences simultaneously. Sequences of interest are deposited on a substrate the size of a small microscope slide, to form probes. The mRNA is extracted from the cell and reverse-transcribed back into DNA, which is labelled with red and green fluorescent dye molecules (cy3 and cy5 respectively). When the sample of tagged DNA is washed over the slide, complementary strands of DNA from the sample hybridize to the probes on the array forming A-T and C-G pairings. The slide is then scanned and the fluorescent intensity is measured at each probe. It is generally assumed that the intensity measure at the probe is linearly related to the abundance of mRNA in the cell over a wide dynamic range. Despite significant improvements in microarray technologies in recent years, microarray data still presents some difficulties in analysis. Low measurements tend to have extremely low signal to noise ratio (SNR) [7] and probes often bind to sequences that are very similar, but not identical, to the one for which they were designed (a process referred to as cross- C1 A C1 A C1:A C1 C2 C2 3 Body probes A:C2 A C2 2 Inclusion junction probes C1:C2 C1 C2 1 Exclusion junction probe Figure 2: Each alternative splicing event is studied using six probes. Probes were chosen to measure the expression levels of each of the three exons involved in the event. Additionally, 3 probes are used that target the junctions that are formed by each of the two isoforms. The inclusion isoform would express the junctions formed by C1 and A, and A and C2 , while the exclusion isoform would express the junction formed by C1 and C2 hybridization). Additionally, probes exhibit somewhat varying hybridization efficiency, and sequences exhibit varying labelling efficiency. To design our data sets, we mined public sequence databases and identified exons that were strong candidates for exhibiting AS (the details of that analysis are provided elsewhere [4, 3]). Of the candidates, 3,126 potential AS events in 2,647 unique mouse genes were selected for the design of Agilent Custom Oligonucleotide microarray. The arrays were hybridized with unamplified mRNA samples extracted from 10 wild-type mouse tissues (brain, heart, intestine, kidney, liver, lung, salivary gland, skeletal muscle, spleen, and testis). Each AS event has six target probes on the arrays, chosen from regions of the C1 exon, C2 exon, A exon, C1 :A splice junction, A:C2 splice junction, and C1 :C2 splice junction, as shown in Figure 2. 2 Unsupervised discovery of alternative splicing With the exception of the probe measuring the alternative exon, A (Figure 2), all probes measure sequences that occur in both isoforms. For example, while the sequence of the probe measuring the junction A:C1 is designed to measure the inclusion isoform, half of it corresponds to a sequence that is found in the exclusion isoform. We can therefore safely assume that the measured intensity at each probe is a result of a certain amount of both isoforms binding to the probe. Due to the generally assumed linear relationship between the abundance of mRNA hybridized at a probe and the fluorescent intensity measured, we model the measured intensity as a weighted sum of the overall abundance of the two isoforms. A stronger assumption is that of a single, consistent hybridization profile for both isoforms across all probes and all slides. Ideally, one would prefer to estimate an individual hybridization profile for each AS event studied across all slides. However, in our current setup, the number of tissues is small (10), resulting in two difficulties. First, the number of parameters is very large when compared to the number of data point using this model, and second, a portion of the events do not exhibit tissue specific alternative splicing within our small set of tissues. While the first hurdle could be accounted for using Baysian parameter estimation, the second cannot. 2.1 GenASAP - a generative model for alternative splicing array platform Using the setup described above, the expression vector x, containing the six microarray measurements as real numbers, can be decomposed as a linear combination of the abundance of the two splice isoforms, represented by the real vector s, with some added noise: x = Λs + noise, where Λ is a 6 × 2 weight matrix containing the hybridization profiles for s1 ^ xC ^ xC 1 2 s2 ^ xC :A ^ xA ^ xA:C 2 ^ xC1:C2 2 xC1:C2 1 r xC xC 2 xA xC :A xA:C oC1 oC2 oA oC :A oA:C 1 1 1 2 oC :C 1 2 Σn 2 Figure 3: Graphical model for alternative splicing. Each measurement in the observed expression profile, x, is generated by either using a scale factor, r, on a linear combination of the isoforms, s, or drawing randomly from an outlier model. For a detailed description of the model, see text. the two isoforms across the six probes. Note that we may not have a negative amount of a given isoform, nor can the presence of an isoform deduct from the measured expression, and so both s and Λ are constrained to be positive. Expression levels measured by microarrays have previously been modelled as having expression-dependent noise [7]. To address this, we rewrite the above formulation as x = r(Λs + ε), (1) where r is a scale factor and ε is a zero-mean normally distributed random variable with a diagonal covariance matrix, Ψ, denoted as p(ε) = N (ε; 0, Ψ). The prior distribution for the abundance of the splice isoforms is given by a truncated normal distribution, denoted as p(s) ∝ N (s, 0, I)[s ≥ 0], where [·] is an indicator function such that [s ≥ 0] = 1 if ∀i, si ≥ 0, and [s ≥ 0] = 0 otherwise. Lastly, there is a need to account for aberrant observations (e.g. due to faulty probes, flakes of dust, etc.) with an outlier model. The complete GenASAP model (shown in Figure 3) accounts for the observations as the outcome of either applying equation (1) or an outlier model. To avoid degenerate cases and ensure meaningful and interpretable results, the number of faulty probes considered for each AS event may not exceed two, as indicated by the filled-in square constraint node in Figure 3. The distribution of x conditional on the latent variables, s, r, and o, is: N (xi ; rΛi s, r2 Ψi )[oi =0] N (xi ; Ei , Vi )[oi =1] , p(x|s, r, o) = (2) i where oi ∈ {0, 1} is a bernoulli random variable indicating if the measurement at probe xi is the result of the AS model or the outlier model parameterized by p(oi = 1) = γi . The parameters of the outlier model, E and V, are not optimized and are set to the mean and variance of the data. 2.2 Variational learning in the GenASAP model To infer the posterior distribution over the splice isoform abundances while at the same time learning the model parameters we use a variational expectation-maximization algorithm (EM). EM maximizes the log likelihood of the data by iteratively estimating the posterior distribution of the model given the data in the expectation (E) step, and maximizing the log likelihood with respect to the parameters, while keeping the posterior fixed, in the maximization (M) step. Variational EM is used when, as in the case of GenASAP, the exact posterior is intractable. Variational EM minimizes the free energy of the model, defined as the KL-divergence between the joint distribution of the latent and observed variables and the approximation to the posterior under the model parameters [5, 6]. We approximate the true posterior using the Q distribution given by T Q({s(t) }, {o(t) }, {r(t) }) = (t) Q(r(t) )Q(o(t) |r(t) ) t=1 (t) Q(si |oi , r(t) ) i −1 T =Z (t) (3) d d ρ(t) ω (t) N (s(t) ; µ(t) , Σ(t) )[s(t) ≥ 0], ro ro t=1 where Z is a normalization constant, the superscript d indicates that Σ is constrained to be diagonal, and there are T iid AS events. For computational efficiency, r is selected from a finite set, r ∈ {r1 , r2 , . . . , rC } with uniform probability. The variational free energy is given by Q({s(t) }, {o(t) }, {r(t) }) . P ({s(t) }, {o(t) }, {r(t) }, {x(t) }) s r o (4) Variational EM minimizes the free energy by iteratively updating the Q distribution’s vari(t)d (t)d ational parameters (ρ(t) , ω (t) , µro , and Σro ) in the E-step, and the model parameters (Λ, Ψ, {r1 , r2 , . . . , rC }, and γ) in the M-step. The resulting updates are too long to be shown in the context of this paper and are discussed in detail elsewhere [3]. A few particular points regarding the E-step are worth covering in detail here. Q({s(t) }, {o(t) }, {r(t) }) log F(Q, P ) = If the prior on s was a full normal distribution, there would be no need for a variational approach, and exact EM is possible. For a truncated normal distribution, however, the mixing proportions, Q(r)Q(o|r) cannot be calculated analytically except for the case where s is scalar, necessitating the diagonality constraint. Note that if Σ was allowed to be a full covariance matrix, equation (3) would be the true posterior, and we could find the sufficient statistics of Q(s(t) |o(t) , r(t) ): −1 µ(t) = (I + ΛT (I − O(t) )T Ψ−1 (I − O(t) )Λ)−1 ΛT (I − O(t) )T Ψ−1 x(t) r(t) ro −1 Σ(t) ro = (I + ΛT (I − O(t) )T Ψ−1 (I − O(t) )Λ) (5) (6) where O is a diagonal matrix with elements Oi,i = oi . Furthermore, it can be easily shown that the optimal settings for µd and Σd approximating a normal distribution with full covariance Σ and mean µ is µd optimal = µ −1 Σd optimal = diag(Σ (7) −1 ) (8) In the truncated case, equation (8) is still true. Equation (7) does not hold, though, and µd optimal cannot be found analytically. In our experiments, we found that using equation (7) still decreases the free energy every E-step, and it is significantly more efficient than using, for example, a gradient decent method to compute the optimal µd . Intuitive Weigh Matrix Optimal Weight Matrix 50 50 40 40 30 30 20 20 10 0 10 Inclusion Isoform 0 Exclusion Isoform Inclusion Isoform (a) Exclusion Isoform (b) Figure 4: (a) An intuitive set of weights. Based on the biological background, one would expect to see the inclusion isoform hybridize to the probes measuring C1 , C2 , A, C1 :A, and A:C2 , while the exclusion isoform hybridizes to C1 , C2 , and C1 :C2 . (b) The learned set of weights closely agrees with the intuition, and captures cross hybridization between the probes Contribution of exclusion isoform Contribution of inclusion isoform AS model Original Data (a) RT−PCR AS model measurement prediction (% exclusion) (% exclusion) 14% 72% (b) 27% 70% 8% 22% outliers (c) Figure 5: Three examples of data cases and their predictions. (a) The data does not follow our notion of single cassette exon AS, but the AS level is predicted accurately by the model.(b) The probe C1 :A is marked as outlier, allowing the model to predict the other probes accurately. (c) Two probes are marked as outliers, and the model is still successful in predicting the AS levels. 3 Making biological predictions about alternative splicing The results presented in this paper were obtained using two stages of learning. In the first step, the weight matrix, Λ, is learned on a subset of the data that is selected for quality. Two selection criteria were used: (a) sequencing data was used to select those cases for which, with high confidence, no other AS event is present (Figure 1) and (b) probe sets were selected for high expression, as determined by a set of negative controls. The second selection criterion is motivated by the common assumption that low intensity measurements are of lesser quality (see Section 1.1). In the second step, Λ is kept fixed, and we introduce the additional constraint that the noise is isotropic (Ψ = ψI) and learn on the entire data set. The constraint on the noise is introduced to prevent the model from using only a subset of the six probes for making the final set of predictions. We show a typical learned set of weights in Figure 4. The weights fit well with our intuition of what they should be to capture the presence of the two isoforms. Moreover, the learned weights account for the specific trends in the data. Examples of model prediction based on the microarray data are shown in Figure 5. Due to the nature of the microarray data, we do not expect all the inferred abundances to be equally good, and we devised a scoring criterion that ranks each AS event based on its fit to the model. Intuitively, given two input vectors that are equivalent up to a scale factor, with inferred MAP estimations that are equal up to the same scale factor, we would like their scores to be identical. The scoring criterion used, therefore is k (xk − rΛk s)2 /(xk + Rank 500 1000 2000 5000 10000 15000 20000 30000 Pearson’s correlation coefficient 0.94 0.95 0.95 0.79 0.79 0.78 0.75 0.65 False positive rate 0.11 0.08 0.05 0.2 0.25 0.29 0.32 0.42 Table 1: Model performance evaluated at various ranks. Using 180 RT-PCR measurements, we are able to predict the model’s performance at various ranks. Two evaluation criteria are used: Pearson’s correlation coefficient between the model’s predictions and the RT-PCR measurements and false positive rate, where a prediction is considered to be false positive if it is more than 15% away from the RT-PCR measurement. rΛk s)2 , where the MAP estimations for r and s are used. This scoring criterion can be viewed as proportional to the sum of noise to signal ratios, as estimated using the two values given by the observation and the model’s best prediction of that observation. Since it is the relative amount of the isoforms that is of most interest, we need to use the inferred distribution of the isoform abundances to obtain an estimate for the relative levels of AS. It is not immediately clear how this should be done. We do, however, have RTPCR measurements for 180 AS events to guide us (see figure 6 for details). Using the top 50 ranked RT-PCR measurement, we fit three parameters, {a1 , a2 , a3 }, such that the s2 proportion of excluded isoform present, p, is given by p = a1 s1 +a2 s2 + a3 , where s1 is the MAP estimation of the abundance of the inclusion isoform, s2 is the MAP estimation of the abundance of the exclusion isoform, and the RT-PCR measurement are used for target p. The parameters are fitted using gradient descent on a least squared error (LSE) evaluation criterion. We used two criteria to evaluate the quality of the AS model predictions. Pearson’s correlation coefficient (PCC) is used to evaluate the overall ability of the model to correctly estimate trends in the data. PCC is invariant to affine transformation and so is independent of the transformation parameters a1 and a3 discussed above, while the parameter a2 was found to effect PCC very little. The PCC stays above 0.75 for the top two thirds ranked predictions. The second evaluation criterion used is the false positive rate, where a prediction is considered to be false positive if it is more than 15% away from the RT-PCR measurement. This allows us to say, for example, that if a prediction is within the top 10000, we are 75% confident that it is within 15% of the actual levels of AS. 4 Summary We designed a novel AS model for the inference of the relative abundance of two alternatively spliced isoforms from six measurements. Unsupervised learning in the model is performed using a structured variational EM algorithm, which correctly captures the underlying structure of the data, as suggested by its biological nature. The AS model, though presented here for a cassette exon AS events, can be used to learn any type of AS, and with a simple adjustment, multiple types. The predictions obtained from the AS model are currently being used to verify various claims about the role of AS in evolution and functional genomics, and to help identify sequences that affect the regulation of AS. % Exclusion isoform RT−PCR measurement Vs. AS model predictions RT−PCR measurements: 90 80 AS model prediction Int es Te tine sti Kid s n Sa ey liva Br ry ain Sp le Liv en er Mu sc Lu le ng 100 70 60 50 40 30 14 22 27 32 47 46 66 78 63 20 AS model prediction: 10 27 24 26 26 51 75 60 85 100 (a) 0 0 20 40 60 80 RT−PCR measurement 100 (b) Figure 6: (a) Sample RT-PCR. RNA extracted from the cell is reverse-transcribed to DNA, amplified and labelled with radioactive or fluorescent molecules. The sample is pulled through a viscous gel in an electric field (DNA, being an acid, is positively charged). Shorter strands travel further through the gel than longer ones, resulting in two distinct bands, corresponding to the two isoforms, when exposed to a photosensitive or x-ray film. (b) A scatter plot showing the RT-PCR measurements as compared to the AS model predictions. The plot shows all available RT-PCR measurements with a rank of 8000 or better. The AS model presented assumes a single weight matrix for all data cases. This is an oversimplified view of the data, and current work is being carried out in identifying probe specific expression profiles. However, due to the low dimensionality of the problem (10 tissues, six probes per event), care must be taken to avoid overfitting and to ensure meaningful interpretations. Acknowledgments We would like to thank Wen Zhang, Naveed Mohammad, and Timothy Hughes for their contributions in generating the data set. This work was funded in part by an operating and infrastructure grants from the CIHR and CFI, and a operating grants from NSERC and a Premier’s Research Excellence Award. References [1] J. M. Johnson et al. Genome-wide survey of human alternative pre-mrna splicing with exon junction microarrays. Science, 302:2141–44, 2003. [2] L. Cartegni et al. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Gen. Rev., 3:285–98, 2002. [3] Q. Pan et al. Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Molecular Cell, 16(6):929–41, 2004. [4] Q. Pan et al. Alternative splicing of conserved exons is frequently species specific in human and mouse. Trends Gen., In Press, 2005. [5] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183– 233, 1999. [6] R. M. Neal and G. E. Hinton. A view of the em algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models. Cambridge, MIT Press, 1998. [7] D. M. Rocke and B. Durbin. A model for measurement error for gene expression arrays. Journal of Computational Biology, 8(6):557–69, 2001.
4 0.57172745 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
5 0.564623 165 nips-2004-Semi-supervised Learning on Directed Graphs
Author: Dengyong Zhou, Thomas Hofmann, Bernhard Schölkopf
Abstract: Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach. 1
6 0.50826049 128 nips-2004-Neural Network Computation by In Vitro Transcriptional Circuits
7 0.50379431 141 nips-2004-Optimal sub-graphical models
8 0.49386925 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
9 0.48996264 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
10 0.48619029 168 nips-2004-Semigroup Kernels on Finite Sets
11 0.46272916 30 nips-2004-Binet-Cauchy Kernels
12 0.45864111 57 nips-2004-Economic Properties of Social Networks
13 0.42875403 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
14 0.41145492 94 nips-2004-Kernels for Multi--task Learning
15 0.39085889 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
16 0.37402937 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
17 0.36803436 42 nips-2004-Computing regularization paths for learning multiple kernels
18 0.36430281 19 nips-2004-An Application of Boosting to Graph Classification
19 0.34700245 96 nips-2004-Learning, Regularization and Ill-Posed Inverse Problems
20 0.33815125 52 nips-2004-Discrete profile alignment via constrained information bottleneck
topicId topicWeight
[(3, 0.251), (13, 0.084), (15, 0.137), (25, 0.013), (26, 0.051), (31, 0.039), (33, 0.191), (35, 0.017), (39, 0.019), (50, 0.046), (52, 0.019), (76, 0.024)]
simIndex simValue paperId paperTitle
1 0.87202156 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
Author: Felix A. Wichmann, Arnulf B. Graf, Heinrich H. Bülthoff, Eero P. Simoncelli, Bernhard Schölkopf
Abstract: We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. We reduce the dimensionality of a set of face images using principal component analysis, and then train a set of linear classifiers on this reduced representation (linear support vector machines (SVMs), relevance vector machines (RVMs), Fisher linear discriminant (FLD), and prototype (prot) classifiers) using human classification data. Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. We predict that the female-tomaleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM [1]) should be faster than the transition along any other direction. A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction. 1
same-paper 2 0.84045559 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
3 0.83605754 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self– occlusions created by the imaging process lead to complex interpendencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
4 0.71201444 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
5 0.71036679 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis
Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1
6 0.71013153 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.70946765 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
8 0.7070961 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
9 0.70658314 69 nips-2004-Fast Rates to Bayes for Kernel Machines
10 0.70613945 131 nips-2004-Non-Local Manifold Tangent Learning
11 0.70589828 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
12 0.70559746 178 nips-2004-Support Vector Classification with Input Data Uncertainty
13 0.70438218 10 nips-2004-A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
14 0.70385927 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
15 0.70364076 168 nips-2004-Semigroup Kernels on Finite Sets
16 0.70342326 77 nips-2004-Hierarchical Clustering of a Mixture Model
17 0.70326418 127 nips-2004-Neighbourhood Components Analysis
18 0.70283729 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
19 0.70262152 102 nips-2004-Learning first-order Markov models for control
20 0.70235115 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications