nips nips2002 nips2002-32 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chen Yanover, Yair Weiss
Abstract: Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF). In cases where exact inference was possible, max-product BP always found the global minimum of the energy (except in a few cases where it failed to converge), while other approximation algorithms of similar complexity did not. In the full protein data set, max-product BP always found a lower energy configuration than the other algorithms, including widely used protein-folding software (SCWRL). 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Side-chain prediction is an important subtask in the protein-folding problem. [sent-4, score-0.073]
2 We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. [sent-5, score-0.697]
3 The graphical model is relatively sparse yet has many cycles. [sent-6, score-0.103]
4 We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. [sent-7, score-0.2]
5 Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF). [sent-8, score-0.098]
6 In cases where exact inference was possible, max-product BP always found the global minimum of the energy (except in a few cases where it failed to converge), while other approximation algorithms of similar complexity did not. [sent-9, score-0.514]
7 In the full protein data set, max-product BP always found a lower energy configuration than the other algorithms, including widely used protein-folding software (SCWRL). [sent-10, score-0.637]
8 1 Introduction Inference in graphical models scales exponentially with the number of variables. [sent-11, score-0.103]
9 Since many real-world applications involve hundreds of variables, it has been impossible to utilize the powerful mechanism of probabilistic inference in these applications. [sent-12, score-0.154]
10 Despite the significant progress achieved in approximate inference, some practical questions still remain open - it is not yet known which algorithm to use for a given problem, nor is it understood what the advantages and disadvantages of each technique are. [sent-13, score-0.075]
11 We address these questions in the context of a real-world protein-folding application - the side-chain prediction problem. [sent-14, score-0.073]
12 Predicting side-chain conformation given the backbone structure is a central problem in protein-folding and molecular design. [sent-15, score-0.451]
13 Figure 1: Cow actin binding protein (PDB code 1pne, top) and a closer view of its 6 carboxyl-terminal residues (bottom-left). [sent-17, score-0.542]
14 Given the protein backbone (black) and amino acid sequence, the native side-chain conformation (gray) is searched for. [sent-18, score-0.93]
15 The problem representation as a graphical model for those carboxyl-terminal residues is shown in the bottom-right figure (nodes located at Cα atom positions, edges drawn in black). [sent-19, score-0.58]
16 In this paper, we show the equivalence between side-chain prediction and inference in an undirected graphical model. [sent-20, score-0.369]
17 We compare the performance of BP, generalized BP and naive mean field on this problem, and also compare to a widely used protein-folding program called SCWRL. [sent-21, score-0.132]
18 2 The side-chain prediction problem Proteins are chains of simpler molecules called amino acids. [sent-22, score-0.245]
19 All amino acids have a common structure - a central carbon atom (Cα) to which a hydrogen atom, an amino group (NH2) and a carboxyl group (COOH) are bonded. [sent-23, score-0.633]
20 In addition, each amino acid has a chemical group called the side-chain, bound to Cα. [sent-24, score-0.32]
21 This group distinguishes one amino acid from another and gives it its distinctive properties. [sent-25, score-0.286]
22 Amino acids are joined end to end during protein synthesis by the formation of peptide bonds. [sent-26, score-0.395]
23 An amino acid unit in a protein is called a residue. [sent-27, score-0.513]
24 The formation of a succession of peptide bonds generates the backbone (consisting of Cα and its adjacent atoms, N and CO, of each residue), upon which the side-chains are hung (Figure 1). [sent-28, score-0.486]
25 We seek to predict the configuration of all the side-chains relative to the backbone. [sent-29, score-0.173]
26 The standard approach to this problem is to define an energy function and use the configuration that achieves the global minimum of the energy as the prediction. [sent-30, score-0.759]
27 2.1 The energy function We adopted the van der Waals energy function, used by SCWRL [3], which approximates the repulsive portion of the Lennard-Jones 12-6 potential. [sent-32, score-0.464]
28 For a pair of atoms, a and b, the energy of interaction is given by: E(a, b) = 0 for d > R0; E(a, b) = -k2 (d / R0) + k2 for R0 ≥ d ≥ k1 R0; E(a, b) = Emax for k1 R0 > d (1), where Emax = 10, k1 = 0. [sent-33, score-0.286]
29 Constant radii were used for protein's atoms (Carbon - 1. [sent-35, score-0.189]
30 For two sets of atoms, the interaction energy is a sum of the pairwise atom interactions. [sent-39, score-0.468]
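To make equation (1) and the set-to-set summation concrete, here is a minimal Python sketch. The constants k1 and k2 and the atomic radii are left as parameters because their exact values are truncated in the extracted text; all names are illustrative, not from the paper.

```python
import numpy as np

EMAX = 10.0  # from equation (1); k1, k2 and the radii are assumed inputs

def pair_energy(d, R0, k1, k2):
    """Repulsive van der Waals energy of eq. (1) for two atoms at distance d,
    where R0 is taken to be the sum of their radii (an assumption of this sketch)."""
    if d > R0:
        return 0.0                    # no interaction beyond R0
    if d >= k1 * R0:
        return -k2 * d / R0 + k2      # linear ramp between R0 and k1 * R0
    return EMAX                       # hard clash below k1 * R0

def set_energy(coords_a, radii_a, coords_b, radii_b, k1, k2):
    """Interaction energy of two atom sets: the sum of all pairwise atom energies."""
    total = 0.0
    for xa, ra in zip(coords_a, radii_a):
        for xb, rb in zip(coords_b, radii_b):
            d = float(np.linalg.norm(np.asarray(xa) - np.asarray(xb)))
            total += pair_energy(d, ra + rb, k1, k2)
    return total
```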
31 The energy surface of a typical protein in the data set has dozens to thousands of local minima. [sent-40, score-0.55]
32 2.2 Rotamers The configuration of a single side-chain is represented by at most 4 dihedral angles (denoted χ1, χ2, χ3 and χ4). Any assignment of χ angles for all the residues defines a protein configuration. [sent-42, score-0.917]
33 Thus the energy minimization problem is a highly nonlinear continuous optimization problem. [sent-43, score-0.232]
34 It turns out, however, that side-chains have a small repertoire of energetically preferred conformations, called rotamers. [sent-44, score-0.034]
35 Statistical analysis of those conformations in well-determined protein structures produces a rotamer library. [sent-45, score-0.58]
36 We used a backbone-dependent rotamer library (by Dunbrack and Karplus, July 2001 version). [sent-46, score-0.631]
37 Given the coordinates of the backbone atoms, its dihedral angles φ (defined, for the ith residue, by Ci-1 - Ni - Cαi - Ci) and ψ (defined by Ni - Cαi - Ci - Ni+1) were calculated. [sent-47, score-0.406]
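A standard geometric routine for computing such a backbone dihedral angle from four atom coordinates is sketched below; it is a generic formula, not code from the paper.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle (degrees) defined by four consecutive atom positions,
    e.g. Ci-1, Ni, CAi, Ci for phi, or Ni, CAi, Ci, Ni+1 for psi."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1 = np.cross(b1, b2)                      # normal of the first plane
    n2 = np.cross(b2, b3)                      # normal of the second plane
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    return np.degrees(np.arctan2(np.dot(m1, n2), np.dot(n1, n2)))
```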
38 The library then gives the typical rotamers for each side-chain and their prior probabilities. [sent-48, score-0.304]
39 By using the library we convert the continuous optimization problem into a discrete one. [sent-49, score-0.118]
40 The number of discrete variables is equal to the number of residues, and the number of possible values each variable can take lies between 2 and 81. [sent-50, score-0.31]
41 2.3 Graphical model Since we have a discrete optimization problem and the energy function is a sum of pairwise interactions, we can transform the problem into a graphical model with pairwise potentials. [sent-52, score-0.451]
42 Each node corresponds to a residue, and the state of each node represents the configuration of the side chain of that residue. [sent-53, score-0.233]
43 Denoting by {ri} an assignment of rotamers for all the residues, then: P({ri}) = (1/Z) ∏i Ψi(ri) ∏⟨i,j⟩ Ψij(ri, rj) (2) [sent-54, score-0.54]
44 Thus we only need to calculate the pairwise interactions for nearby residues. [sent-58, score-0.114]
45 To define the topology of the undirected graph, we examine all pairs of residues i, j and check whether there exists an assignment ri, rj for which the energy is nonzero. [sent-59, score-0.781]
46 If it exists, we connect nodes i and j in the graph and set the potential to be: Ψij(ri, rj) = exp(-E(ri, rj)/T) (4) Figure 1 shows a subgraph of the undirected graph. [sent-60, score-0.221]
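The construction just described (a node per residue, an edge wherever some rotamer pair has nonzero interaction energy, and the pairwise potential of equation (4)) might be sketched as follows. The pair_energy callable, the n_rotamers attribute and the temperature parameter T are assumptions made for illustration.

```python
import itertools
import numpy as np

def build_model(residues, pair_energy, T=1.0):
    """residues[i].n_rotamers gives the number of rotamer states of residue i;
    pair_energy(i, ri, j, rj) returns E for that rotamer pair (0 if the residues
    are too far apart to interact)."""
    edges, potentials = [], {}
    for i, j in itertools.combinations(range(len(residues)), 2):
        E = np.array([[pair_energy(i, ri, j, rj)
                       for rj in range(residues[j].n_rotamers)]
                      for ri in range(residues[i].n_rotamers)])
        if np.any(E != 0.0):                     # residues close enough to interact
            edges.append((i, j))
            potentials[(i, j)] = np.exp(-E / T)  # psi_ij(ri, rj), eq. (4)
    return edges, potentials
```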
47 The graph is relatively sparse (each node is connected to nodes that are close in 3D space) but contains many small loops. [sent-61, score-0.213]
48 A typical protein in the data set gives rise to a model with hundreds of loops of size 3. [sent-62, score-0.383]
49 3 Experiments When the protein was small enough we used the max-junction tree algorithm [1] to find the most likely configuration of the variables (and hence the global minimum of the energy function). [sent-63, score-0.731]
50 The approximate inference algorithms we tested were loopy belief propagation (BP), generalized BP (GBP) and naive mean field (MF). [sent-65, score-0.329]
51 BP is an exact and efficient local message passing algorithm for inference in singly connected graphs [15]. [sent-66, score-0.242]
52 Its essential idea is replacing the exponential enumeration (either summation or maximization) over the unobserved nodes with a series of local enumerations (a process called "elimination" or "peeling"). [sent-67, score-0.15]
53 Loopy BP, that is, applying BP to multiply connected graphical models, may not converge due to the circulation of messages through the loops [12]. [sent-68, score-0.318]
54 However, many groups have recently reported excellent results using loopy BP as an approximate inference algorithm [4, 11, 5]. [sent-69, score-0.231]
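For concreteness, a minimal sketch of loopy max-product BP on a pairwise model of the form of equation (2) is given below. The flooding schedule, the optional damping and the message normalization are implementation choices of this sketch, not details taken from the paper.

```python
import numpy as np

def max_product_bp(psi_i, psi_ij, edges, n_iters=100, damping=0.0):
    """Loopy max-product BP on a pairwise model.
    psi_i:  list of local potential vectors, one per node
    psi_ij: dict mapping an edge (i, j) with i < j to a matrix indexed [state_i, state_j]
    Returns a configuration read off from the max-marginal beliefs."""
    neighbors = {k: [] for k in range(len(psi_i))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)] is the message from node i to node j, a vector over states of j
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = np.ones(len(psi_i[j]))
        msgs[(j, i)] = np.ones(len(psi_i[i]))

    for _ in range(n_iters):
        for i, j in list(msgs):
            prod = psi_i[i].copy()                    # local potential times incoming messages
            for k in neighbors[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            pot = psi_ij[(i, j)] if (i, j) in psi_ij else psi_ij[(j, i)].T
            new = np.max(pot * prod[:, None], axis=0) # maximize over states of i
            new = new / new.sum()                     # normalize for numerical stability
            msgs[(i, j)] = (1 - damping) * new + damping * msgs[(i, j)]

    beliefs = []
    for i in range(len(psi_i)):
        b = psi_i[i].copy()
        for k in neighbors[i]:
            b = b * msgs[(k, i)]
        beliefs.append(b)
    return [int(np.argmax(b)) for b in beliefs]
```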
55 GBP is a class of approximate inference algorithms that trade complexity for accuracy [15]. [sent-71, score-0.169]
56 A subset of GBP algorithms is equivalent to forming a graph from clusters of nodes and edges in the original graph and then running ordinary BP on the cluster graph. [sent-72, score-0.387]
57 We used two clusters; both contained all the nodes in the graph, but each contained only a subset of the edges. [sent-74, score-0.294]
58 The first cluster contained all edges between residues whose indices differ by less than a constant k (typically, 6). [sent-75, score-0.127]
59 All other edges were included in the second cluster. [sent-76, score-0.043]
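The edge split that defines the two clusters can be written directly, as in the sketch below; running BP between the two resulting clusters (with messages computed via the junction-tree algorithm, as noted next) is not shown.

```python
def split_edges(edges, k=6):
    """Partition edges into the two GBP clusters: cluster 1 takes edges between
    residues whose indices differ by less than k, cluster 2 takes the rest."""
    cluster1 = [(i, j) for i, j in edges if abs(i - j) < k]
    cluster2 = [(i, j) for i, j in edges if abs(i - j) >= k]
    return cluster1, cluster2
```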
60 It can be shown that the cluster graph BP messages can be computed efficiently using the JT algorithm. [sent-77, score-0.161]
61 Thus this approximation tries to capture dependencies between a large number of nodes in the original graph while maintaining computational feasibility. [sent-78, score-0.201]
62 The naive MF approximation tries to approximate the joint distribution in equation 2 as a product of independent marginals qi(ri). [sent-79, score-0.183]
63 The marginals qi(ri) can be found by iterating: qi(ri) ← α Ψi(ri) exp(Σj∈Ni Σrj qj(rj) log Ψij(ri, rj)) (5) where α denotes a normalization constant and Ni means all nodes neighboring i. [sent-80, score-0.317]
64 We initialized qi(ri) to Ψi(ri) and chose a random update ordering for the nodes. [sent-81, score-0.033]
65 For each protein we repeated this minimization 10 times (each time with a different update order) and chose the local minimum that gave the lowest energy. [sent-82, score-0.38]
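A compact sketch of this naive mean field procedure, iterating equation (5) in a random node order with several restarts, is given below; the data layout and the score used to rank restarts (negative log of the unnormalized probability) are assumptions of the sketch.

```python
import numpy as np

def naive_mean_field(psi_i, psi_ij, neighbors, n_sweeps=50, n_restarts=10, seed=0):
    """Naive mean field for a pairwise model: update eq. (5) node by node in a
    random order, restart several times, keep the lowest-energy assignment.
    psi_i[i]: local potential vector; psi_ij[(i, j)]: matrix indexed [state_i, state_j]."""
    rng = np.random.default_rng(seed)
    n = len(psi_i)

    def pot(i, j):
        return psi_ij[(i, j)] if (i, j) in psi_ij else psi_ij[(j, i)].T

    def energy(x):
        # negative log of the unnormalized probability of assignment x
        e = -sum(np.log(psi_i[i][x[i]]) for i in range(n))
        for (i, j), m in psi_ij.items():
            e -= np.log(m[x[i], x[j]])
        return e

    best_x, best_e = None, np.inf
    for _ in range(n_restarts):
        q = [psi.astype(float) / psi.sum() for psi in psi_i]   # initialize q_i to psi_i
        for _ in range(n_sweeps):
            for i in map(int, rng.permutation(n)):
                log_q = np.log(psi_i[i]).astype(float)
                for j in neighbors[i]:
                    log_q += np.log(pot(i, j)) @ q[j]          # sum_{rj} q_j(rj) log psi_ij
                log_q -= log_q.max()                           # numerical stability
                q[i] = np.exp(log_q)
                q[i] /= q[i].sum()                             # alpha: normalization
        x = [int(np.argmax(qi)) for qi in q]
        e = energy(x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e
```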
66 In addition to the approximate inference algorithms described above, we also compared the results to two approaches in use in side-chain prediction: the SCWRL and DEE algorithms. [sent-83, score-0.169]
67 The Side-Chain placement With a Rotamer Library (SCWRL) algorithm is considered one of the leading algorithms for predicting side-chain conformations [3]. [sent-84, score-0.119]
68 It uses the energy function described above (equation 1) and a heuristic search strategy to find a minimal energy conformation in a discrete conformational space (defined using a rotamer library). [sent-85, score-0.881]
69 Dead end elimination (DEE) is a search algorithm that tries to reduce the search space until it becomes suitable for an exhaustive search. [sent-86, score-0.212]
70 It is based on a simple condition that identifies rotamers that cannot be members of the global minimum energy conformation [2]. [sent-87, score-0.694]
71 If enough rotamers can be eliminated, the global minimum energy conformation can be found by an exhaustive search of the remaining rotamers. [sent-88, score-0.747]
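A single-pass sketch of the simple elimination condition is shown below; in practice the test is applied repeatedly, restricting the minima and maxima to rotamers that have not yet been eliminated, and stronger conditions exist. The data layout is an assumption.

```python
import numpy as np

def dee_pass(E_self, E_pair):
    """One pass of simple dead-end elimination: rotamer r of residue i is removed
    if some alternative t satisfies
        E_i(r) + sum_j min_s E_ij(r, s)  >  E_i(t) + sum_j max_s E_ij(t, s),
    i.e. r cannot be part of the global minimum energy conformation.
    E_self[i]: vector of rotamer self energies; E_pair[(i, j)]: matrix [r_i, r_j]."""
    n = len(E_self)

    def pair(i, j):
        return E_pair[(i, j)] if (i, j) in E_pair else E_pair[(j, i)].T

    alive = []
    for i in range(n):
        partners = [j for j in range(n) if j != i
                    and ((i, j) in E_pair or (j, i) in E_pair)]
        lo = E_self[i] + sum(pair(i, j).min(axis=1) for j in partners)  # best case per rotamer
        hi = E_self[i] + sum(pair(i, j).max(axis=1) for j in partners)  # worst case per rotamer
        alive.append(np.array([not np.any(hi < lo[r]) for r in range(len(E_self[i]))]))
    return alive
```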
72 The various inference algorithms were tested on a set of 325 X-ray crystal structures with resolution better than or equal to 2 Å, an R factor below 20%, and length up to 300 residues. [sent-89, score-0.186]
73 One representative structure was selected from each cluster of homologous structures (50% homology or more). [sent-90, score-0.165]
74 Protein structures were acquired from Protein Data Bank site (http://www. [sent-91, score-0.038]
75 Many proteins contain Cysteine residues which tend to form strong disulfide bonds with each other. [sent-94, score-0.498]
76 The common practice (e.g. in SCWRL) is to first search for possible disulfide bonds and, if they exist, to freeze these residues in that configuration. [sent-97, score-0.537]
77 We repeated our experiments with and without freezing the Cysteine residues. [sent-99, score-0.06]
78 Side-chain to backbone interaction seems to be much more severe than side-chain to side-chain interaction - the backbone is more rigid than the side-chains and its structure is assumed to be known. [sent-100, score-0.7]
79 Therefore, the parameter R was introduced into the pairwise potential equation, as follows: Ψij(ri, rj) = (exp(-E(ri, rj)/T))^R (6) Using R > 1 assigns an increased weight for side-chain to backbone interactions over side-chain to side-chain interactions. [sent-101, score-0.436]
80 We repeated our experiments both with R = 1 and R > 1. [sent-102, score-0.033]
81 It is worth mentioning that SCWRL implicitly adopts a weighting assumption that assigns an increased weight to side-chain to backbone interactions. [sent-103, score-0.322]
82 4 Results In our first set of experiments we wanted to compare approximate inference to exact inference. [sent-104, score-0.201]
83 In order to make exact inference possible we restricted the possible rotamers of each residue. [sent-105, score-0.339]
84 Out of the 81 possible states we chose a subset whose local probability accounted for 90% of the total local probability. [sent-106, score-0.124]
85 We constrained the size of the subset to be at least 2. [sent-107, score-0.027]
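One way such a subset could be selected is sketched below, assuming a vector of prior rotamer probabilities for the residue; the routine is illustrative, not taken from the paper.

```python
import numpy as np

def restrict_rotamers(prior, coverage=0.9, min_states=2):
    """Keep the most probable rotamers whose cumulative local probability reaches
    the requested coverage, but never fewer than min_states of them."""
    prior = np.asarray(prior, dtype=float)
    order = np.argsort(prior)[::-1]                      # most probable first
    cum = np.cumsum(prior[order]) / prior.sum()
    n_keep = max(min_states, int(np.searchsorted(cum, coverage)) + 1)
    return np.sort(order[:n_keep])
```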
86 The resulting graphical model retains only a small fraction of the loops occurring in the full graphical model (about 7% of the loops of size 3). [sent-108, score-0.442]
87 However, it still contains many small loops, and in particular, dozens of loops of size 3. [sent-109, score-0.172]
88 On these graphs we found that ordinary max-product BP always found the global minimum of the energy function (except in a few cases where it failed to converge). [sent-110, score-0.402]
wordName wordTfidf (topN-words)
[('residues', 0.31), ('backbone', 0.296), ('bp', 0.274), ('protein', 0.232), ('energy', 0.232), ('ri', 0.223), ('rotamer', 0.217), ('rotamers', 0.186), ('scwrl', 0.186), ('configuration', 0.173), ('amino', 0.165), ('atoms', 0.162), ('conformation', 0.155), ('atom', 0.124), ('gbp', 0.124), ('ro', 0.123), ('inference', 0.121), ('library', 0.118), ('loops', 0.118), ('graphical', 0.103), ('rj', 0.099), ('bonds', 0.093), ('col', 0.093), ('conformations', 0.093), ('nodes', 0.084), ('acid', 0.082), ('ci', 0.074), ('mf', 0.069), ('graph', 0.069), ('undirected', 0.068), ('ni', 0.065), ('carbon', 0.062), ('cysteine', 0.062), ('dee', 0.062), ('dihedral', 0.062), ('disulfide', 0.062), ('emax', 0.062), ('peptide', 0.062), ('loopy', 0.062), ('qi', 0.061), ('pairwise', 0.058), ('interactions', 0.056), ('interaction', 0.054), ('cluster', 0.054), ('dozens', 0.054), ('residue', 0.054), ('naive', 0.052), ('minimum', 0.05), ('approximate', 0.048), ('angles', 0.048), ('tries', 0.048), ('homology', 0.046), ('field', 0.046), ('prediction', 0.046), ('search', 0.045), ('global', 0.044), ('assignment', 0.044), ('edges', 0.043), ('jt', 0.041), ('ordinary', 0.041), ('acids', 0.039), ('elimination', 0.039), ('group', 0.039), ('ii', 0.039), ('structures', 0.038), ('messages', 0.038), ('defined', 0.036), ('marginals', 0.035), ('formation', 0.035), ('exhaustive', 0.035), ('failed', 0.035), ('called', 0.034), ('proteins', 0.033), ('hundreds', 0.033), ('chose', 0.033), ('repeated', 0.033), ('exact', 0.032), ('local', 0.032), ('equivalence', 0.031), ('connected', 0.03), ('node', 0.03), ('contained', 0.03), ('converge', 0.029), ('define', 0.028), ('questions', 0.027), ('conserved', 0.027), ('crystal', 0.027), ('eo', 0.027), ('freeze', 0.027), ('freezing', 0.027), ('homologous', 0.027), ('homologs', 0.027), ('identifies', 0.027), ('joined', 0.027), ('radii', 0.027), ('singly', 0.027), ('subtask', 0.027), ('subset', 0.027), ('predicting', 0.026), ('assigns', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 32 nips-2002-Approximate Inference and Protein-Folding
Author: Chen Yanover, Yair Weiss
Abstract: Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF). In cases where exact inference was possible, max-product BP always found the global minimum of the energy (except in few cases where it failed to converge), while other approximation algorithms of similar complexity did not. In the full protein data set, maxproduct BP always found a lower energy configuration than the other algorithms, including a widely used protein-folding software (SCWRL). 1
2 0.16486575 94 nips-2002-Fractional Belief Propagation
Author: Wim Wiegerinck, Tom Heskes
Abstract: We consider loopy belief propagation for approximate inference in probabilistic graphical models. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate free energies, which includes the Bethe free energy and the naive mean-field free as special cases. Using the linear response correction of the clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.
3 0.14743328 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
Author: Gianluca Pollastri, Pierre Baldi, Alessandro Vullo, Paolo Frasconi
Abstract: We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarsegrained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance. 1 Introduction: Protein Topology Prediction Predicting the 3D structure of protein chains from the linear sequence of amino acids is a fundamental open problem in computational molecular biology [1]. Any approach to the problem must deal with the basic fact that protein structures are translation and rotation invariant. To address this invariance, we have proposed a machine learning approach to protein structure prediction [4] based on the prediction of topological representations of proteins, in the form of contact or distance maps. The contact or distance map is a 2D representation of neighborhood relationships consisting of an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 ˚), or a matrix of pairwise Euclidean distances. Fine-grained maps A are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere Oi B Hi F Hi Ii Figure 1: Bayesian network for bidirectional IOHMMs consisting of input units, output units, and both forward and backward Markov chains of hidden states. [14] using distance geometry and stochastic optimization techniques. Thus the main focus here is on the more difficult task of contact map prediction. Various algorithms for the prediction of contact maps have been developed, in particular using feedforward neural networks [6]. The best contact map predictor in the literature and at the last CASP prediction experiment reports an average precision [True Positives/(True Positives + False Positives)] of 21% for distant contacts, i.e. with a linear distance of 8 amino acid or more [6] for fine-grained amino acid maps. While this result is encouraging and well above chance level by a factor greater than 6, it is still far from providing sufficient accuracy for reliable 3D structure prediction. A key issue in this area is the amount of noise that can be tolerated in a contact map prediction without compromising the 3D-reconstruction step. While systematic tests in this area have not yet been published, preliminary results appear to indicate that recovery of as little as half of the distant contacts may suffice for proper reconstruction, at least for proteins up to 150 amino acid long (Rita Casadio and Piero Fariselli, private communication and oral presentation during CASP4 [10]). 
It is important to realize that the input to a fine-grained contact map predictor need not be confined to the sequence of amino acids only, but may also include evolutionary information in the form of profiles derived by multiple alignment of homologue proteins, or structural feature information, such as secondary structure (alpha helices, beta strands, and coils), or solvent accessibility (surface/buried), derived by specialized predictors [12, 13]. In our approach, we use different GIOHMM and GRNN strategies to predict both structural features and contact maps. 2 GIOHMM Architectures Loosely speaking, GIOHMMs are Bayesian networks with input, hidden, and output units that can be used to process complex data structures such as sequences, images, trees, chemical compounds and so forth, built on work in, for instance, [5, 3, 7, 2, 11]. In general, the connectivity of the graphs associated with the hidden units matches the structure of the data being processed. Often multiple copies of the same hidden graph, but with different edge orientations, are used in the hidden layers to allow direct propagation of information in all relevant directions. Output Plane NE NW 4 Hidden Planes SW SE Input Plane Figure 2: 2D GIOHMM Bayesian network for processing two-dimensional objects such as contact maps, with nodes regularly arranged in one input plane, one output plane, and four hidden planes. In each hidden plane, nodes are arranged on a square lattice, and all edges are oriented towards the corresponding cardinal corner. Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane. To illustrate the general idea, a first example of GIOHMM is provided by the bidirectional IOHMMs (Figure 1) introduced in [2] to process sequences and predict protein structural features, such as secondary structure. Unlike standard HMMs or IOHMMS used, for instance in speech recognition, this architecture is based on two hidden markov chains running in opposite directions to leverage the fact that biological sequences are spatial objects rather than temporal sequences. Bidirectional IOHMMs have been used to derive a suite of structural feature predictors [12, 13, 4] available through http://promoter.ics.uci.edu/BRNN-PRED/. These predictors have accuracy rates in the 75-80% range on a per amino acid basis. 2.1 Direct Prediction of Topology To predict contact maps, we use a 2D generalization of the previous 1D Bayesian network. The basic version of this architecture (Figures 2) contains 6 layers of units: input, output, and four hidden layers, one for each cardinal corner. Within each column indexed by i and j, connections run from the input to the four hidden units, and from the four hidden units to the output unit. In addition, the hidden units in each hidden layer are arranged on a square or triangular lattice, with all the edges oriented towards the corresponding cardinal corner. Thus the parameters of this two-dimensional GIOHMMs, in the square lattice case, are the conditional probability distributions: NE NW SW SE P (Oi |Ii,j , Hi,j , Hi,j , Hi,j , Hi,j, ) NE NE NE P (Hi,j |Ii,j , Hi−1,j , Hi,j−1 ) N NW NW P (Hi,jW |Ii,j , Hi+1,j , Hi,j−1 ) SW SW SW P (Hi,j |Ii,j , Hi+1,j , Hi,j+1 ) SE SE SE P (Hi,j |Ii,j , Hi−1,j , Hi,j+1 ) (1) In a contact map prediction at the amino acid level, for instance, the (i, j) output represents the probability of whether amino acids i and j are in contact or not. 
This prediction depends directly on the (i, j) input and the four-hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. In the simulations reported below, we use a more elaborated input consisting of a 20 × 20 probability matrix over amino acid pairs derived from a multiple alignment of the given protein sequence and its homologues, as well as the structural features of the corresponding amino acids, including their secondary structure classification and their relative exposure to the solvent, derived from our corresponding predictors. It should be clear how GIOHMM ideas can be generalized to other data structures and problems in many ways. In the case of 3D data, for instance, a standard GIOHMM would have an input cube, an output cube, and up to 8 cubes of hidden units, one for each corner with connections inside each hidden cube oriented towards the corresponding corner. In the case of data with an underlying tree structure, the hidden layers would correspond to copies of the same tree with different orientations and so forth. Thus a fundamental advantage of GIOHMMs is that they can process a wide range of data structures of variable sizes and dimensions. 2.2 Indirect Prediction of Topology Although GIOHMMs allow flexible integration of contextual information over ranges that often exceed what can be achieved, for instance, with fixed-input neural networks, the models described above still suffer from the fact that the connections remain local and therefore long-ranged propagation of information during learning remains difficult. Introduction of large numbers of long-ranged connections is computationally intractable but in principle not necessary since the number of contacts in proteins is known to grow linearly with the length of the protein, and hence connectivity is inherently sparse. The difficulty of course is that the location of the long-ranged contacts is not known. To address this problem, we have developed also a complementary GIOHMM approach described in Figure 3 where a candidate graph structure is proposed in the hidden layers of the GIOHMM, with the two different orientations naturally associated with a protein sequence. Thus the hidden graphs change with each protein. In principle the output ought to be a single unit (Figure 3b) which directly computes a global score for the candidate structure presented in the hidden layer. In order to cope with long-ranged dependencies, however, it is preferable to compute a set of local scores (Figure 3c), one for each vertex, and combine the local scores into a global score by averaging. More specifically, consider a true topology represented by the undirected contact graph G∗ = (V, E ∗ ), and a candidate undirected prediction graph G = (V, E). A global measure of how well E approximates E ∗ is provided by the informationretrieval F1 score defined by the normalized edge-overlap F1 = 2|E ∩ E ∗ |/(|E| + |E ∗ |) = 2P R/(P + R), where P = |E ∩ E ∗ |/|E| is the precision (or specificity) and R = |E ∩ E ∗ |/|E ∗ | is the recall (or sensitivity) measure. Obviously, 0 ≤ F1 ≤ 1 and F1 = 1 if and only if E = E ∗ . The scoring function F1 has the property of being monotone in the sense that if |E| = |E | then F1 (E) < F1 (E ) if and only if |E ∩ E ∗ | < |E ∩ E ∗ |. Furthermore, if E = E ∪ {e} where e is an edge in E ∗ but not in E, then F1 (E ) > F1 (E). Monotonicity is important to guide the search in the space of possible topologies. 
It is easy to check that a simple search algorithm based on F1 takes on the order of O(|V |3 ) steps to find E ∗ , basically by trying all possible edges one after the other. The problem then is to learn F1 , or rather a good approximation to F1 . To approximate F1 , we first consider a similar local measure Fv by considering the O I(v) I(v) F B H (v) H (v) (a) I(v) F B H (v) H (v) (b) O(v) (c) Figure 3: Indirect prediction of contact maps. (a) target contact graph to be predicted. (b) GIOHMM with two hidden layers: the two hidden layers correspond to two copies of the same candidate graph oriented in opposite directions from one end of the protein to the other end. The single output O is the global score of how well the candidate graph approximates the true contact map. (c) Similar to (b) but with a local score O(v) at each vertex. The local scores can be averaged to produce a global score. In (b) and (c) I(v) represents the input for vertex v, and H F (v) and H B (v) are the corresponding hidden variables. ∗ ∗ set Ev of edges adjacent to vertex v and Fv = 2|Ev ∩ Ev |/(|Ev | + |Ev |) with the ¯ global average F = v Fv /|V |. If n and n∗ are the average degrees of G and G∗ , it can be shown that: F1 = 1 |V | v 2|Ev ∩ E ∗ | n + n∗ and 1 ¯ F = |V | v 2|Ev ∩ E ∗ | n + v + n∗ + ∗ v (2) where n + v (resp. n∗ + ∗ ) is the degree of v in G (resp. in G∗ ). In particular, if G v ¯ ¯ and G∗ are regular graphs, then F1 (E) = F (E) so that F is a good approximation to F1 . In the contact map regime where the number of contacts grows linearly with the length of the sequence, we should have in general |E| ≈ |E ∗ | ≈ (1 + α)|V | so that each node on average has n = n∗ = 2(1 + α) edges. The value of α depends of course on the neighborhood cutoff. As in reinforcement learning, to learn the scoring function one is faced with the problem of generating good training sets in a high dimensional space, where the states are the topologies (graphs), and the policies are algorithms for adding a single edge to a given graph. In the simulations we adopt several different strategies including static and dynamic generation. Within dynamic generation we use three exploration strategies: random exploration (successor graph chosen at random), pure exploitation (successor graph maximizes the current scoring function), and semi-uniform exploitation to find a balance between exploration and exploitation [with probability (resp. 1 − ) we choose random exploration (resp. pure exploitation)]. 3 GRNN Architectures Inference and learning in the protein GIOHMMs we have described is computationally intensive due to the large number of undirected loops they contain. This problem can be addressed using a neural network reparameterization assuming that: (a) all the nodes in the graphs are associated with a deterministic vector (note that in the case of the output nodes this vector can represent a probability distribution so that the overall model remains probabilistic); (b) each vector is a deterministic function of its parents; (c) each function is parameterized using a neural network (or some other class of approximators); and (d) weight-sharing or stationarity is used between similar neural networks in the model. 
For example, in the 2D GIOHMM contact map predictor, we can use a total of 5 neural networks to recursively compute the four hidden states and the output in each column in the form: NW NE SW SE Oij = NO (Iij , Hi,j , Hi,j , Hi,j , Hi,j ) NE NE NE Hi,j = NN E (Ii,j , Hi−1,j , Hi,j−1 ) N NW NW Hi,jW = NN W (Ii,j , Hi+1,j , Hi,j−1 ) SW SW SW Hi,j = NSW (Ii,j , Hi+1,j , Hi,j+1 ) SE SE SE Hi,j = NSE (Ii,j , Hi−1,j , Hi,j+1 ) (3) N In the NE plane, for instance, the boundary conditions are set to Hij E = 0 for i = 0 N or j = 0. The activity vector associated with the hidden unit Hij E depends on the NE NE local input Iij , and the activity vectors of the units Hi−1,j and Hi,j−1 . Activity in NE plane can be propagated row by row, West to East, and from the first row to the last (from South to North), or column by column South to North, and from the first column to the last. These GRNN architectures can be trained by gradient descent by unfolding the structures in space, leveraging the acyclic nature of the underlying GIOHMMs. 4 Data Many data sets are available or can be constructed for training and testing purposes, as described in the references. The data sets used in the present simulations are extracted from the publicly available Protein Data Bank (PDB) and then redundancy reduced, or from the non-homologous subset of PDB Select (ftp://ftp.emblheidelberg.de/pub/databases/). In addition, we typically exclude structures with poor resolution (less than 2.5-3 ˚), sequences containing less than 30 amino acids, A and structures containing multiple sequences or sequences with chain breaks. For coarse contact maps, we use the DSSP program [9] (CMBI version) to assign secondary structures and we remove also sequences for which DSSP crashes. The results we report for fine-grained contact maps are derived using 424 proteins with lengths in the 30-200 range for training and an additional non-homologous set of 48 proteins in the same length range for testing. For the coarse contact map, we use a set of 587 proteins of length less than 300. Because the average length of a secondary structure element is slightly above 7, the size of a coarse map is roughly 2% the size of the corresponding amino acid map. 5 Simulation Results and Conclusions We have trained several 2D GIOHMM/GRNN models on the direct prediction of fine-grained contact maps. Training of a single model typically takes on the order of a week on a fast workstation. A sample of validation results is reported in Table 1 for four different distance cutoffs. Overall percentages of correctly predicted contacts Table 1: Direct prediction of amino acid contact maps. Column 1: four distance cutoffs. Column 2, 3, and 4: overall percentages of amino acids correctly classified as contacts, non-contacts, and in total. Column 5: Precision percentage for distant contacts (|i − j| ≥ 8) with a threshold of 0.5. Single model results except for last line corresponding to an ensemble of 5 models. Cutoff 6˚ A 8˚ A 10 ˚ A 12 ˚ A 12 ˚ A Contact .714 .638 .512 .433 .445 Non-Contact .998 .998 .993 .987 .990 Total .985 .970 .931 .878 .883 Precision (P) .594 .670 .557 .549 .717 and non-contacts at all linear distances, as well as precision results for distant contacts (|i − j| ≥ 8) are reported for a single GIOHMM/GRNN model. The model has k = 14 hidden units in the hidden and output layers of the four hidden networks, as well as in the hidden layer of the output network. 
In the last row, we also report as an example the results obtained at 12˚ by an ensemble of 5 networks A with k = 11, 12, 13, 14 and 15. Note that precision for distant contacts exceeds all previously reported results and is well above 50%. For the prediction of coarse-grained contact maps, we use the indirect GIOHMM/GRNN strategy and compare different exploration/exploitation strategies: random exploration, pure exploitation, and their convex combination (semiuniform exploitation). In the semi-uniform case we set the probability of random uniform exploration to = 0.4. In addition, we also try a fourth hybrid strategy in which the search proceeds greedily (i.e. the best successor is chosen at each step, as in pure exploitation), but the network is trained by randomly sub-sampling the successors of the current state. Eight numerical features encode the input label of each node: one-hot encoding of secondary structure classes; normalized linear distances from the N to C terminus; average, maximum and minimum hydrophobic character of the segment (based on the Kyte-Doolittle scale with a moving window of length 7). A sample of results obtained with 5-fold cross-validation is shown in Table 2. Hidden state vectors have dimension k = 5 with no hidden layers. For each strategy we measure performances by means of several indices: micro and macroaveraged precision (mP , M P ), recall (mR, M R) and F1 measure (mF1 , M F1 ). Micro-averages are derived based on each pair of secondary structure elements in each protein, whereas macro-averages are obtained on a per-protein basis, by first computing precision and recall for each protein, and then averaging over the set of all proteins. In addition, we also measure the micro and macro averages for specificity in the sense of percentage of correct prediction for non-contacts (mP (nc), M P (nc)). Note the tradeoffs between precision and recall across the training methods, the hybrid method achieving the best F 1 results. Table 2: Indirect prediction of coarse contact maps with dynamic sampling. Strategy Random exploration Semi-uniform Pure exploitation Hybrid mP .715 .454 .431 .417 mP (nc) .769 .787 .806 .834 mR .418 .631 .726 .790 mF1 .518 .526 .539 .546 MP .767 .507 .481 .474 M P (nc) .709 .767 .793 .821 MR .469 .702 .787 .843 M F1 .574 .588 .596 .607 We have presented two approaches, based on a very general IOHMM/RNN framework, that achieve state-of-the-art performance in the prediction of proteins contact maps at fine and coarse-grained levels of resolution. In principle both methods can be applied to both resolution levels, although the indirect prediction is computationally too demanding for fine-grained prediction of large proteins. Several extensions are currently under development, including the integration of these methods into complete 3D structure predictors. While these systems require long training periods, once trained they can rapidly sift through large proteomic data sets. Acknowledgments The work of PB and GP is supported by a Laurel Wilkening Faculty Innovation award and awards from NIH, BREP, Sun Microsystems, and the California Institute for Telecommunications and Information Technology. The work of PF and AV is partially supported by a MURST grant. References [1] D. Baker and A. Sali. Protein structure prediction and structural genomics. Science, 294:93–96, 2001. [2] P. Baldi and S. Brunak and P. Frasconi and G. Soda and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction. 
Bioinformatics, 15(11):937–946, 1999. [3] P. Baldi and Y. Chauvin. Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8(7):1541–1565, 1996. [4] P. Baldi and G. Pollastri. Machine learning structural and functional proteomics. IEEE Intelligent Systems. Special Issue on Intelligent Systems in Biology, 17(2), 2002. [5] Y. Bengio and P. Frasconi. Input-output HMM’s for sequence processing. IEEE Trans. on Neural Networks, 7:1231–1249, 1996. [6] P. Fariselli, O. Olmea, A. Valencia, and R. Casadio. Prediction of contact maps with neural networks and correlated mutations. Protein Engineering, 14:835–843, 2001. [7] P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Trans. on Neural Networks, 9:768–786, 1998. [8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models Machine Learning, 29:245–273, 1997. [9] W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577–2637, 1983. [10] A. M. Lesk, L. Lo Conte, and T. J. P. Hubbard. Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, 45, S5:98–118, 2001. [11] G. Pollastri and P. Baldi. Predition of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Proceedings of 2002 ISMB (Intelligent Systems for Molecular Biology) Conference. Bioinformatics, 18, S1:62–70, 2002. [12] G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47:228–235, 2002. [13] G. Pollastri, P. Baldi, P. Fariselli, and R. Casadio. Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47:142–153, 2002. [14] M. Vendruscolo, E. Kussell, and E. Domany. Recovery of protein structure from contact maps. Folding and Design, 2:295–306, 1997.
4 0.11502013 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
Author: Tom Heskes
Abstract: We extend recent work on the connection between loopy belief propagation and the Bethe free energy. Constrained minimization of the Bethe free energy can be turned into an unconstrained saddle-point problem. Both converging double-loop algorithms and standard loopy belief propagation can be interpreted as attempts to solve this saddle-point problem. Stability analysis then leads us to conclude that stable fixed points of loopy belief propagation must be (local) minima of the Bethe free energy. Perhaps surprisingly, the converse need not be the case: minima can be unstable fixed points. We illustrate this with an example and discuss implications. 1
5 0.11469825 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
Author: Eleazar Eskin, Jason Weston, William S. Noble, Christina S. Leslie
Abstract: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of -length subsequences, counted with up to mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings. ¡ ¢
6 0.10289155 181 nips-2002-Self Supervised Boosting
7 0.08847703 98 nips-2002-Going Metric: Denoising Pairwise Data
8 0.083507553 54 nips-2002-Combining Dimensions and Features in Similarity-Based Representations
9 0.073402271 194 nips-2002-The Decision List Machine
10 0.065532699 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
11 0.059160255 152 nips-2002-Nash Propagation for Loopy Graphical Games
12 0.059030659 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
13 0.057362642 124 nips-2002-Learning Graphical Models with Mercer Kernels
14 0.056617524 105 nips-2002-How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick
15 0.05384453 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
16 0.050509918 4 nips-2002-A Differential Semantics for Jointree Algorithms
17 0.049881045 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
18 0.049356114 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
19 0.046430334 99 nips-2002-Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA
20 0.045405082 174 nips-2002-Regularized Greedy Importance Sampling
topicId topicWeight
[(0, -0.135), (1, -0.042), (2, -0.051), (3, 0.045), (4, -0.058), (5, 0.104), (6, -0.057), (7, 0.122), (8, -0.159), (9, -0.054), (10, -0.028), (11, 0.13), (12, -0.092), (13, 0.1), (14, -0.068), (15, -0.08), (16, 0.012), (17, 0.147), (18, -0.019), (19, -0.005), (20, 0.003), (21, -0.04), (22, 0.015), (23, -0.11), (24, -0.057), (25, -0.015), (26, 0.248), (27, 0.279), (28, 0.007), (29, -0.045), (30, 0.01), (31, 0.0), (32, -0.053), (33, -0.019), (34, -0.145), (35, 0.173), (36, 0.074), (37, -0.053), (38, 0.013), (39, 0.04), (40, 0.016), (41, 0.118), (42, -0.013), (43, 0.03), (44, 0.016), (45, -0.109), (46, 0.097), (47, -0.146), (48, 0.084), (49, -0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.96816874 32 nips-2002-Approximate Inference and Protein-Folding
Author: Chen Yanover, Yair Weiss
2 0.61878997 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
Author: Gianluca Pollastri, Pierre Baldi, Alessandro Vullo, Paolo Frasconi
Abstract: We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarsegrained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance. 1 Introduction: Protein Topology Prediction Predicting the 3D structure of protein chains from the linear sequence of amino acids is a fundamental open problem in computational molecular biology [1]. Any approach to the problem must deal with the basic fact that protein structures are translation and rotation invariant. To address this invariance, we have proposed a machine learning approach to protein structure prediction [4] based on the prediction of topological representations of proteins, in the form of contact or distance maps. The contact or distance map is a 2D representation of neighborhood relationships consisting of an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 ˚), or a matrix of pairwise Euclidean distances. Fine-grained maps A are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere Oi B Hi F Hi Ii Figure 1: Bayesian network for bidirectional IOHMMs consisting of input units, output units, and both forward and backward Markov chains of hidden states. [14] using distance geometry and stochastic optimization techniques. Thus the main focus here is on the more difficult task of contact map prediction. Various algorithms for the prediction of contact maps have been developed, in particular using feedforward neural networks [6]. The best contact map predictor in the literature and at the last CASP prediction experiment reports an average precision [True Positives/(True Positives + False Positives)] of 21% for distant contacts, i.e. with a linear distance of 8 amino acid or more [6] for fine-grained amino acid maps. While this result is encouraging and well above chance level by a factor greater than 6, it is still far from providing sufficient accuracy for reliable 3D structure prediction. A key issue in this area is the amount of noise that can be tolerated in a contact map prediction without compromising the 3D-reconstruction step. While systematic tests in this area have not yet been published, preliminary results appear to indicate that recovery of as little as half of the distant contacts may suffice for proper reconstruction, at least for proteins up to 150 amino acid long (Rita Casadio and Piero Fariselli, private communication and oral presentation during CASP4 [10]). 
It is important to realize that the input to a fine-grained contact map predictor need not be confined to the sequence of amino acids only, but may also include evolutionary information in the form of profiles derived by multiple alignment of homologue proteins, or structural feature information, such as secondary structure (alpha helices, beta strands, and coils), or solvent accessibility (surface/buried), derived by specialized predictors [12, 13]. In our approach, we use different GIOHMM and GRNN strategies to predict both structural features and contact maps. 2 GIOHMM Architectures Loosely speaking, GIOHMMs are Bayesian networks with input, hidden, and output units that can be used to process complex data structures such as sequences, images, trees, chemical compounds and so forth, built on work in, for instance, [5, 3, 7, 2, 11]. In general, the connectivity of the graphs associated with the hidden units matches the structure of the data being processed. Often multiple copies of the same hidden graph, but with different edge orientations, are used in the hidden layers to allow direct propagation of information in all relevant directions. Output Plane NE NW 4 Hidden Planes SW SE Input Plane Figure 2: 2D GIOHMM Bayesian network for processing two-dimensional objects such as contact maps, with nodes regularly arranged in one input plane, one output plane, and four hidden planes. In each hidden plane, nodes are arranged on a square lattice, and all edges are oriented towards the corresponding cardinal corner. Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane. To illustrate the general idea, a first example of GIOHMM is provided by the bidirectional IOHMMs (Figure 1) introduced in [2] to process sequences and predict protein structural features, such as secondary structure. Unlike standard HMMs or IOHMMS used, for instance in speech recognition, this architecture is based on two hidden markov chains running in opposite directions to leverage the fact that biological sequences are spatial objects rather than temporal sequences. Bidirectional IOHMMs have been used to derive a suite of structural feature predictors [12, 13, 4] available through http://promoter.ics.uci.edu/BRNN-PRED/. These predictors have accuracy rates in the 75-80% range on a per amino acid basis. 2.1 Direct Prediction of Topology To predict contact maps, we use a 2D generalization of the previous 1D Bayesian network. The basic version of this architecture (Figures 2) contains 6 layers of units: input, output, and four hidden layers, one for each cardinal corner. Within each column indexed by i and j, connections run from the input to the four hidden units, and from the four hidden units to the output unit. In addition, the hidden units in each hidden layer are arranged on a square or triangular lattice, with all the edges oriented towards the corresponding cardinal corner. Thus the parameters of this two-dimensional GIOHMMs, in the square lattice case, are the conditional probability distributions: NE NW SW SE P (Oi |Ii,j , Hi,j , Hi,j , Hi,j , Hi,j, ) NE NE NE P (Hi,j |Ii,j , Hi−1,j , Hi,j−1 ) N NW NW P (Hi,jW |Ii,j , Hi+1,j , Hi,j−1 ) SW SW SW P (Hi,j |Ii,j , Hi+1,j , Hi,j+1 ) SE SE SE P (Hi,j |Ii,j , Hi−1,j , Hi,j+1 ) (1) In a contact map prediction at the amino acid level, for instance, the (i, j) output represents the probability of whether amino acids i and j are in contact or not. 
This prediction depends directly on the (i, j) input and the four-hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. In the simulations reported below, we use a more elaborated input consisting of a 20 × 20 probability matrix over amino acid pairs derived from a multiple alignment of the given protein sequence and its homologues, as well as the structural features of the corresponding amino acids, including their secondary structure classification and their relative exposure to the solvent, derived from our corresponding predictors. It should be clear how GIOHMM ideas can be generalized to other data structures and problems in many ways. In the case of 3D data, for instance, a standard GIOHMM would have an input cube, an output cube, and up to 8 cubes of hidden units, one for each corner with connections inside each hidden cube oriented towards the corresponding corner. In the case of data with an underlying tree structure, the hidden layers would correspond to copies of the same tree with different orientations and so forth. Thus a fundamental advantage of GIOHMMs is that they can process a wide range of data structures of variable sizes and dimensions. 2.2 Indirect Prediction of Topology Although GIOHMMs allow flexible integration of contextual information over ranges that often exceed what can be achieved, for instance, with fixed-input neural networks, the models described above still suffer from the fact that the connections remain local and therefore long-ranged propagation of information during learning remains difficult. Introduction of large numbers of long-ranged connections is computationally intractable but in principle not necessary since the number of contacts in proteins is known to grow linearly with the length of the protein, and hence connectivity is inherently sparse. The difficulty of course is that the location of the long-ranged contacts is not known. To address this problem, we have developed also a complementary GIOHMM approach described in Figure 3 where a candidate graph structure is proposed in the hidden layers of the GIOHMM, with the two different orientations naturally associated with a protein sequence. Thus the hidden graphs change with each protein. In principle the output ought to be a single unit (Figure 3b) which directly computes a global score for the candidate structure presented in the hidden layer. In order to cope with long-ranged dependencies, however, it is preferable to compute a set of local scores (Figure 3c), one for each vertex, and combine the local scores into a global score by averaging. More specifically, consider a true topology represented by the undirected contact graph G∗ = (V, E ∗ ), and a candidate undirected prediction graph G = (V, E). A global measure of how well E approximates E ∗ is provided by the informationretrieval F1 score defined by the normalized edge-overlap F1 = 2|E ∩ E ∗ |/(|E| + |E ∗ |) = 2P R/(P + R), where P = |E ∩ E ∗ |/|E| is the precision (or specificity) and R = |E ∩ E ∗ |/|E ∗ | is the recall (or sensitivity) measure. Obviously, 0 ≤ F1 ≤ 1 and F1 = 1 if and only if E = E ∗ . The scoring function F1 has the property of being monotone in the sense that if |E| = |E | then F1 (E) < F1 (E ) if and only if |E ∩ E ∗ | < |E ∩ E ∗ |. Furthermore, if E = E ∪ {e} where e is an edge in E ∗ but not in E, then F1 (E ) > F1 (E). Monotonicity is important to guide the search in the space of possible topologies. 
It is easy to check that a simple search algorithm based on F1 takes on the order of O(|V |3 ) steps to find E ∗ , basically by trying all possible edges one after the other. The problem then is to learn F1 , or rather a good approximation to F1 . To approximate F1 , we first consider a similar local measure Fv by considering the O I(v) I(v) F B H (v) H (v) (a) I(v) F B H (v) H (v) (b) O(v) (c) Figure 3: Indirect prediction of contact maps. (a) target contact graph to be predicted. (b) GIOHMM with two hidden layers: the two hidden layers correspond to two copies of the same candidate graph oriented in opposite directions from one end of the protein to the other end. The single output O is the global score of how well the candidate graph approximates the true contact map. (c) Similar to (b) but with a local score O(v) at each vertex. The local scores can be averaged to produce a global score. In (b) and (c) I(v) represents the input for vertex v, and H F (v) and H B (v) are the corresponding hidden variables. ∗ ∗ set Ev of edges adjacent to vertex v and Fv = 2|Ev ∩ Ev |/(|Ev | + |Ev |) with the ¯ global average F = v Fv /|V |. If n and n∗ are the average degrees of G and G∗ , it can be shown that: F1 = 1 |V | v 2|Ev ∩ E ∗ | n + n∗ and 1 ¯ F = |V | v 2|Ev ∩ E ∗ | n + v + n∗ + ∗ v (2) where n + v (resp. n∗ + ∗ ) is the degree of v in G (resp. in G∗ ). In particular, if G v ¯ ¯ and G∗ are regular graphs, then F1 (E) = F (E) so that F is a good approximation to F1 . In the contact map regime where the number of contacts grows linearly with the length of the sequence, we should have in general |E| ≈ |E ∗ | ≈ (1 + α)|V | so that each node on average has n = n∗ = 2(1 + α) edges. The value of α depends of course on the neighborhood cutoff. As in reinforcement learning, to learn the scoring function one is faced with the problem of generating good training sets in a high dimensional space, where the states are the topologies (graphs), and the policies are algorithms for adding a single edge to a given graph. In the simulations we adopt several different strategies including static and dynamic generation. Within dynamic generation we use three exploration strategies: random exploration (successor graph chosen at random), pure exploitation (successor graph maximizes the current scoring function), and semi-uniform exploitation to find a balance between exploration and exploitation [with probability (resp. 1 − ) we choose random exploration (resp. pure exploitation)]. 3 GRNN Architectures Inference and learning in the protein GIOHMMs we have described is computationally intensive due to the large number of undirected loops they contain. This problem can be addressed using a neural network reparameterization assuming that: (a) all the nodes in the graphs are associated with a deterministic vector (note that in the case of the output nodes this vector can represent a probability distribution so that the overall model remains probabilistic); (b) each vector is a deterministic function of its parents; (c) each function is parameterized using a neural network (or some other class of approximators); and (d) weight-sharing or stationarity is used between similar neural networks in the model. 
3 GRNN Architectures

Inference and learning in the protein GIOHMMs we have described are computationally intensive due to the large number of undirected loops they contain. This problem can be addressed using a neural network reparameterization, assuming that: (a) all the nodes in the graphs are associated with a deterministic vector (note that in the case of the output nodes this vector can represent a probability distribution, so that the overall model remains probabilistic); (b) each vector is a deterministic function of its parents; (c) each function is parameterized using a neural network (or some other class of approximators); and (d) weight-sharing or stationarity is used between similar neural networks in the model.

For example, in the 2D GIOHMM contact map predictor, we can use a total of 5 neural networks to recursively compute the four hidden states and the output in each column in the form:

O_ij = N_O(I_ij, H^NW_ij, H^NE_ij, H^SW_ij, H^SE_ij)
H^NE_ij = N_NE(I_ij, H^NE_{i−1,j}, H^NE_{i,j−1})
H^NW_ij = N_NW(I_ij, H^NW_{i+1,j}, H^NW_{i,j−1})
H^SW_ij = N_SW(I_ij, H^SW_{i+1,j}, H^SW_{i,j+1})
H^SE_ij = N_SE(I_ij, H^SE_{i−1,j}, H^SE_{i,j+1})        (3)

In the NE plane, for instance, the boundary conditions are set to H^NE_ij = 0 for i = 0 or j = 0. The activity vector associated with the hidden unit H^NE_ij depends on the local input I_ij and on the activity vectors of the units H^NE_{i−1,j} and H^NE_{i,j−1}. Activity in the NE plane can be propagated row by row, West to East, and from the first row to the last (from South to North), or column by column, South to North, and from the first column to the last. These GRNN architectures can be trained by gradient descent by unfolding the structures in space, leveraging the acyclic nature of the underlying GIOHMMs.
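To make the recursion in equation (3) concrete, here is a minimal numerical sketch of the NE plane only, assuming a single shared tanh layer in place of the network N_NE. The function name propagate_ne_plane, the weight matrices W_in, W_row, W_col and the bias b are our own illustrative choices, not the authors' implementation.

import numpy as np

def propagate_ne_plane(I, W_in, W_row, W_col, b):
    """I: (L, L, d_in) grid of inputs; returns the (L, L, k) hidden activity of the NE plane."""
    L, k = I.shape[0], b.shape[0]
    H = np.zeros((L + 1, L + 1, k))            # extra row/column implements the zero boundary
    for i in range(1, L + 1):                  # rows scanned South to North
        for j in range(1, L + 1):              # columns scanned West to East
            H[i, j] = np.tanh(I[i - 1, j - 1] @ W_in    # local input I_ij
                              + H[i - 1, j] @ W_row     # neighbour H^NE_{i-1,j}
                              + H[i, j - 1] @ W_col     # neighbour H^NE_{i,j-1}
                              + b)
    return H[1:, 1:]

# The NW, SW and SE planes follow by reversing the scan direction along one or both axes,
# and the output network N_O combines the four hidden vectors at each position (i, j).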
4 Data

Many data sets are available or can be constructed for training and testing purposes, as described in the references. The data sets used in the present simulations are extracted from the publicly available Protein Data Bank (PDB) and then redundancy-reduced, or from the non-homologous subset of PDB Select (ftp://ftp.emblheidelberg.de/pub/databases/). In addition, we typically exclude structures with poor resolution (less than 2.5–3 Å), sequences containing fewer than 30 amino acids, and structures containing multiple sequences or sequences with chain breaks. For coarse contact maps, we use the DSSP program [9] (CMBI version) to assign secondary structures, and we also remove sequences for which DSSP crashes. The results we report for fine-grained contact maps are derived using 424 proteins with lengths in the 30–200 range for training and an additional non-homologous set of 48 proteins in the same length range for testing. For the coarse contact map, we use a set of 587 proteins of length less than 300. Because the average length of a secondary structure element is slightly above 7, the size of a coarse map is roughly 2% of the size of the corresponding amino acid map.

5 Simulation Results and Conclusions

We have trained several 2D GIOHMM/GRNN models on the direct prediction of fine-grained contact maps. Training of a single model typically takes on the order of a week on a fast workstation. A sample of validation results is reported in Table 1 for four different distance cutoffs. Overall percentages of correctly predicted contacts and non-contacts at all linear distances, as well as precision results for distant contacts (|i − j| ≥ 8), are reported for a single GIOHMM/GRNN model. The model has k = 14 hidden units in the hidden and output layers of the four hidden networks, as well as in the hidden layer of the output network.

Table 1: Direct prediction of amino acid contact maps. Column 1: four distance cutoffs. Columns 2, 3, and 4: overall percentages of amino acids correctly classified as contacts, non-contacts, and in total. Column 5: precision percentage for distant contacts (|i − j| ≥ 8) with a threshold of 0.5. Single-model results, except for the last line, which corresponds to an ensemble of 5 models.

Cutoff            Contact   Non-Contact   Total   Precision (P)
6 Å               .714      .998          .985    .594
8 Å               .638      .998          .970    .670
10 Å              .512      .993          .931    .557
12 Å              .433      .987          .878    .549
12 Å (ensemble)   .445      .990          .883    .717

In the last row, we also report as an example the results obtained at 12 Å by an ensemble of 5 networks with k = 11, 12, 13, 14, and 15. Note that precision for distant contacts exceeds all previously reported results and is well above 50%. For the prediction of coarse-grained contact maps, we use the indirect GIOHMM/GRNN strategy and compare different exploration/exploitation strategies: random exploration, pure exploitation, and their convex combination (semi-uniform exploitation). In the semi-uniform case we set the probability of random uniform exploration to ε = 0.4. In addition, we also try a fourth hybrid strategy in which the search proceeds greedily (i.e. the best successor is chosen at each step, as in pure exploitation), but the network is trained by randomly sub-sampling the successors of the current state. Eight numerical features encode the input label of each node: one-hot encoding of secondary structure classes; normalized linear distances from the N to C terminus; average, maximum and minimum hydrophobic character of the segment (based on the Kyte-Doolittle scale with a moving window of length 7).

A sample of results obtained with 5-fold cross-validation is shown in Table 2. Hidden state vectors have dimension k = 5 with no hidden layers. For each strategy we measure performance by means of several indices: micro- and macro-averaged precision (mP, MP), recall (mR, MR) and F1 measure (mF1, MF1). Micro-averages are derived based on each pair of secondary structure elements in each protein, whereas macro-averages are obtained on a per-protein basis, by first computing precision and recall for each protein and then averaging over the set of all proteins. In addition, we also measure the micro and macro averages of specificity, in the sense of the percentage of correct predictions for non-contacts (mP(nc), MP(nc)). Note the tradeoffs between precision and recall across the training methods, with the hybrid method achieving the best F1 results.

Table 2: Indirect prediction of coarse contact maps with dynamic sampling.

Strategy             mP     mP(nc)   mR     mF1    MP     MP(nc)   MR     MF1
Random exploration   .715   .769     .418   .518   .767   .709     .469   .574
Semi-uniform         .454   .787     .631   .526   .507   .767     .702   .588
Pure exploitation    .431   .806     .726   .539   .481   .793     .787   .596
Hybrid               .417   .834     .790   .546   .474   .821     .843   .607

We have presented two approaches, based on a very general IOHMM/RNN framework, that achieve state-of-the-art performance in the prediction of protein contact maps at fine- and coarse-grained levels of resolution. In principle both methods can be applied at both resolution levels, although the indirect prediction is computationally too demanding for fine-grained prediction of large proteins. Several extensions are currently under development, including the integration of these methods into complete 3D structure predictors. While these systems require long training periods, once trained they can rapidly sift through large proteomic data sets.

Acknowledgments

The work of PB and GP is supported by a Laurel Wilkening Faculty Innovation award and awards from NIH, BREP, Sun Microsystems, and the California Institute for Telecommunications and Information Technology. The work of PF and AV is partially supported by a MURST grant.

References

[1] D. Baker and A. Sali. Protein structure prediction and structural genomics. Science, 294:93–96, 2001.
[2] P. Baldi, S. Brunak, P. Frasconi, G. Soda, and G. Pollastri. Exploiting the past and the future in protein secondary structure prediction.
Bioinformatics, 15(11):937–946, 1999.
[3] P. Baldi and Y. Chauvin. Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8(7):1541–1565, 1996.
[4] P. Baldi and G. Pollastri. Machine learning structural and functional proteomics. IEEE Intelligent Systems, Special Issue on Intelligent Systems in Biology, 17(2), 2002.
[5] Y. Bengio and P. Frasconi. Input-output HMMs for sequence processing. IEEE Trans. on Neural Networks, 7:1231–1249, 1996.
[6] P. Fariselli, O. Olmea, A. Valencia, and R. Casadio. Prediction of contact maps with neural networks and correlated mutations. Protein Engineering, 14:835–843, 2001.
[7] P. Frasconi, M. Gori, and A. Sperduti. A general framework for adaptive processing of data structures. IEEE Trans. on Neural Networks, 9:768–786, 1998.
[8] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–273, 1997.
[9] W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577–2637, 1983.
[10] A. M. Lesk, L. Lo Conte, and T. J. P. Hubbard. Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, 45, S5:98–118, 2001.
[11] G. Pollastri and P. Baldi. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Proceedings of the 2002 ISMB (Intelligent Systems for Molecular Biology) Conference. Bioinformatics, 18, S1:62–70, 2002.
[12] G. Pollastri, D. Przybylski, B. Rost, and P. Baldi. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, 47:228–235, 2002.
[13] G. Pollastri, P. Baldi, P. Fariselli, and R. Casadio. Prediction of coordination number and relative solvent accessibility in proteins. Proteins, 47:142–153, 2002.
[14] M. Vendruscolo, E. Kussell, and E. Domany. Recovery of protein structure from contact maps. Folding and Design, 2:295–306, 1997.
3 0.44774243 94 nips-2002-Fractional Belief Propagation
Author: Wim Wiegerinck, Tom Heskes
Abstract: We consider loopy belief propagation for approximate inference in probabilistic graphical models. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate free energies, which includes the Bethe free energy and the naive mean-field free energy as special cases. Using the linear response correction of the clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.
4 0.41020712 54 nips-2002-Combining Dimensions and Features in Similarity-Based Representations
Author: Daniel J. Navarro, Michael D. Lee
Abstract: unkown-abstract
5 0.39429083 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
Author: Tom Heskes
Abstract: We extend recent work on the connection between loopy belief propagation and the Bethe free energy. Constrained minimization of the Bethe free energy can be turned into an unconstrained saddle-point problem. Both converging double-loop algorithms and standard loopy belief propagation can be interpreted as attempts to solve this saddle-point problem. Stability analysis then leads us to conclude that stable fixed points of loopy belief propagation must be (local) minima of the Bethe free energy. Perhaps surprisingly, the converse need not be the case: minima can be unstable fixed points. We illustrate this with an example and discuss implications. 1
6 0.39156032 194 nips-2002-The Decision List Machine
7 0.32282495 145 nips-2002-Mismatch String Kernels for SVM Protein Classification
8 0.31309116 98 nips-2002-Going Metric: Denoising Pairwise Data
9 0.29976878 4 nips-2002-A Differential Semantics for Jointree Algorithms
10 0.27902889 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
11 0.26735294 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
12 0.25661555 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
13 0.25307667 152 nips-2002-Nash Propagation for Loopy Graphical Games
14 0.24918605 181 nips-2002-Self Supervised Boosting
15 0.24854499 17 nips-2002-A Statistical Mechanics Approach to Approximate Analytical Bootstrap Averages
16 0.24187127 99 nips-2002-Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA
17 0.22912955 179 nips-2002-Scaling of Probability-Based Optimization Algorithms
18 0.22702989 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
19 0.20239882 174 nips-2002-Regularized Greedy Importance Sampling
20 0.20227882 101 nips-2002-Handling Missing Data with Variational Bayesian Learning of ICA
topicId topicWeight
[(11, 0.015), (42, 0.067), (54, 0.072), (55, 0.031), (57, 0.036), (67, 0.01), (68, 0.045), (74, 0.092), (75, 0.379), (79, 0.047), (87, 0.015), (92, 0.03), (98, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.78568721 32 nips-2002-Approximate Inference and Protein-Folding
Author: Chen Yanover, Yair Weiss
Abstract: Side-chain prediction is an important subtask in the protein-folding problem. We show that finding a minimal energy side-chain configuration is equivalent to performing inference in an undirected graphical model. The graphical model is relatively sparse yet has many cycles. We used this equivalence to assess the performance of approximate inference algorithms in a real-world setting. Specifically we compared belief propagation (BP), generalized BP (GBP) and naive mean field (MF). In cases where exact inference was possible, max-product BP always found the global minimum of the energy (except in few cases where it failed to converge), while other approximation algorithms of similar complexity did not. In the full protein data set, maxproduct BP always found a lower energy configuration than the other algorithms, including a widely used protein-folding software (SCWRL). 1
2 0.73676133 15 nips-2002-A Probabilistic Model for Learning Concatenative Morphology
Author: Matthew G. Snover, Michael R. Brent
Abstract: This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model and hill-climbing and directed search algorithms. By extracting and examining morphologically rich subsets of an input lexicon, the directed search identifies highly productive paradigms. The hill-climbing algorithm then further maximizes the probability of the hypothesis. Quantitative results are shown by measuring the accuracy of the morphological relations identified. Experiments in English and Polish, as well as comparisons with another recent unsupervised morphology learning algorithm demonstrate the effectiveness of this technique.
3 0.53002787 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals
Author: Tatyana Sharpee, Nicole C. Rust, William Bialek
Abstract: unkown-abstract
4 0.45627373 190 nips-2002-Stochastic Neighbor Embedding
Author: Geoffrey E. Hinton, Sam T. Roweis
Abstract: We describe a probabilistic approach to the task of placing objects, described by high-dimensional vectors or by pairwise dissimilarities, in a low-dimensional space in a way that preserves neighbor identities. A Gaussian is centered on each object in the high-dimensional space and the densities under this Gaussian (or the given dissimilarities) are used to define a probability distribution over all the potential neighbors of the object. The aim of the embedding is to approximate this distribution as well as possible when the same operation is performed on the low-dimensional “images” of the objects. A natural cost function is a sum of Kullback-Leibler divergences, one per object, which leads to a simple gradient for adjusting the positions of the low-dimensional images. Unlike other dimensionality reduction methods, this probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images. This allows ambiguous objects, like the document count vector for the word “bank”, to have versions close to the images of both “river” and “finance” without forcing the images of outdoor concepts to be located close to those of corporate concepts.
5 0.38729641 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
Author: Yasemin Altun, Thomas Hofmann, Mark Johnson
Abstract: This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function. The proposed method combines many of the advantages of boosting schemes with the efficiency of dynamic programming methods and is attractive both, conceptually and computationally. In addition, we also discuss alternative approaches based on the Hamming loss for label sequences. The sequence boosting algorithm offers an interesting alternative to methods based on HMMs and the more recently proposed Conditional Random Fields. Applications areas for the presented technique range from natural language processing and information extraction to computational biology. We include experiments on named entity recognition and part-of-speech tagging which demonstrate the validity and competitiveness of our approach. 1
6 0.3763555 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
7 0.37537387 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
8 0.37182769 135 nips-2002-Learning with Multiple Labels
9 0.36717185 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
10 0.36620802 152 nips-2002-Nash Propagation for Loopy Graphical Games
11 0.3652156 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
12 0.36364341 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
13 0.36347544 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
14 0.3632195 163 nips-2002-Prediction and Semantic Association
15 0.3629266 53 nips-2002-Clustering with the Fisher Score
16 0.36212197 124 nips-2002-Learning Graphical Models with Mercer Kernels
17 0.36211991 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
19 0.36102116 74 nips-2002-Dynamic Structure Super-Resolution
20 0.36084622 173 nips-2002-Recovering Intrinsic Images from a Single Image