nips nips2004 nips2004-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract X-ray crystallography is currently the most common way protein structures are elucidated. [sent-5, score-0.266]
2 One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. [sent-6, score-0.308]
3 This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. [sent-7, score-0.589]
4 Matching this pictorial structure into the density map is a way of automating density-map interpretation. [sent-8, score-0.517]
5 Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. [sent-9, score-0.507]
6 Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. [sent-14, score-0.214]
7 The most common such method is x-ray crystallography, a rather tedious process in which x-rays are shot through a crystal of purified protein, producing a pattern of spots (or reflections) which is processed, yielding an electron density map. [sent-16, score-0.25]
8 The final step of x-ray crystallography – referred to as interpreting the map – involves fitting a complete molecular model (that is, the position of each atom) of the protein into the map. [sent-18, score-0.394]
9 When interpreting a density map, the amino-acid sequence of the protein is known in advance, giving the complete topology of the protein. [sent-22, score-0.353]
10 However, the intractably large conformational space of a protein – with hundreds of amino acids and thousands of atoms – makes automated map interpretation challenging. [sent-23, score-0.632]
11 This blurring is quantified as the resolution of the density map and is illustrated in Figure 1. [sent-29, score-0.248]
12 3Å or less – automated density map interpretation is essentially solved [1]. [sent-32, score-0.246]
13 The remainder of the paper describes DEFT (DEFormable Template), our computational framework for building a flexible three-dimensional model of a molecule, which is then used to locate patterns in the electron density map. [sent-34, score-0.371]
14 The template represents the object class as a collection of parts linked in a graph structure. [sent-36, score-0.214]
15 For example, a pictorial structure for a face may include the parts "left eye" and "right eye. [sent-38, score-0.399]
16 A dynamic programming (DP) matching algorithm of Felzenszwalb and Huttenlocher (hereafter referred to as the F-H matching algorithm) [5] allows pictorial structures to be quickly matched into a twodimensional image. [sent-40, score-0.585]
17 The matching algorithm finds the globally optimal position and orientation of each part in the pictorial structure, assuming conditional independence on the position of each part given its neighbors. [sent-41, score-0.624]
18 Formally, we represent the pictorial structure as a graph G = (V,E), V = {v1,v2,…,vn} the set of parts, and edge eij ∈ E connecting neighboring parts vi and vj if an explicit dependency exists between the configurations of the corresponding parts. [sent-42, score-0.57]
19 Each part vi is assigned a configuration li describing the part's position and orientation in the image. [sent-43, score-0.455]
20 We assume Markov independence: the probability distribution over a part's configurations is conditionally independent of every other part's configuration, given the configuration of all the part's neighbors in the graph. [sent-44, score-0.271]
21 We assign each edge a deformation cost dij(li,lj), and each part a "mismatch" cost mi(li,I). [sent-45, score-0.224]
22 These functions are the negative log likelihoods of a part (or pair of parts) taking a specified configuration, given the pictorial structure model. [sent-46, score-0.425]
23 That is, it finds the configuration L of parts in model Θ in image I maximizing P ( L I , Θ ) ∝ P ( I L, Θ) P ( L Θ ) = 1⎛ ⎛ ⎞⎞ ⎜ exp⎛ ∑ v ∈V m i (li , I )⎞ ⋅ exp⎜ ∑( v , v )∈E m i (li , I )⎟ ⎟ ⎜ ⎟ i i j ⎝ ⎠ Z⎝ ⎝ ⎠⎠ (1) O C N N Cα Cα C Cβ O Cβ Figure 2. [sent-48, score-0.337]
24 An example of the The right figure shows the arrangement of construction of a pictorial structure atoms that generated the observed density. [sent-51, score-0.632]
25 The F-H matching algorithm places several additional limitations on the pictorial structure. [sent-54, score-0.411]
26 , for proteins, the amino-acid sequence) of that molecule, our task is to determine the Cartesian coordinates in the 3D density map of each atom in the molecule. [sent-60, score-0.428]
27 DEFT finds the coordinates of all atoms simultaneously by first building a pictorial structure corresponding to the protein, then using F-H matching to optimally place the model into the density map. [sent-62, score-0.947]
28 This section describes DEFT's deformation cost function and matching cost function. [sent-63, score-0.261]
29 DEFT's deformation cost is related to the probability of observing a particular configuration of a molecule. [sent-64, score-0.326]
30 However, this potential is quite complicated and cannot be accurately approximated in a tree-structured pictorial structure graph. [sent-66, score-0.354]
31 DEFT constructs a pictorial structure graph where vertices correspond to nonhydrogen atoms, and edges correspond to the covalent bonds joining atoms. [sent-68, score-0.382]
32 The cost function each edge defines maintain invariants – interatomic distance and bond angles – while allowing free rotation around the bond. [sent-69, score-0.23]
33 Each part's configuration is defined by six parameters: three translational, three rotational (Euler angles α, β, and γ ). [sent-71, score-0.259]
34 For the cost function, we define a new connection type in the pictorial structure framework, the screw-joint, shown in Figure 4. [sent-72, score-0.471]
35 The screw-joint's cost function is mathematically specified in terms of a directed version of the pictorial structure's undirected graph. [sent-73, score-0.385]
36 We now define the screw joint in terms of a parent and a child. [sent-75, score-0.192]
37 We rotate the child such that its z axis is coincident with the vector from child to parent, and allow each part in the model (that is, each atom) to freely rotate about its local z axis. [sent-76, score-0.554]
38 The ideal geometry between child and parent is then described by three parameters stored at each edge, xij = (xij, yij, zij). [sent-77, score-0.308]
39 These three parameters define the optimal translation between parent and child, in the coordinate system of the parent (which in turn is defined such that its z-axis corresponds to the axis connecting it to its parent). [sent-78, score-0.322]
40 In using these to construct the cost function dij, we define the function Tij, which maps a parent vi's configuration li into the configuration lj of that parent's ideal child vj. [sent-79, score-0.925]
41 The expressions for βj and γj define the optimal orientation of each child: +z coincident with the axis that connects child and parent. [sent-82, score-0.344]
42 The screw-joint model sets the deformation cost between parent vi and child vj to the distance between child configuration lj and Tij(li), the ideal child configuration given parent configuration li (Tji in equation (2) is simply the identity function). [sent-84, score-1.494]
43 We use the 1-norm weighted in each dimension, d ij (li , l j ) = Tij (li ) − l j rotate (α i − α j ) = wij 2 2 orient ⎛ ⎞ + wij ⎜ ( β i − β j ) + atan( x′ + y ′ ,− z ′) + (γ j − γ i ) − π 2 + atan( y′, x′) ⎟ ⎠ ⎝ translate ( xi − x j ) − x′ + ( yi − y j ) − y ′ + ( zi − z j ) − z ′ . [sent-85, score-0.306]
44 + wij ( (3) ) rotate orient is the cost of rotating about a bond, wij is the cost In the above equation, wij translate is the cost of translating in x, y or z. [sent-86, score-0.554]
45 of rotating around any other axis, and wij orient translate rotate to 0, and wij and wij to +100. [sent-87, score-0.404]
46 DEFT's screw-joint model sets wij DEFT's match-cost function implementation is based upon Cowtan's fffear algorithm [4]. [sent-88, score-0.196]
47 This algorithm quickly and efficiently calculates the mean squared distance between a weighted 3D template of density and a region in a density map. [sent-89, score-0.391]
48 Given a learned template and a corresponding weight function, fffear uses a Fourier convolution to determine the maximum likelihood that the weighted template generated a region of density in the density map. [sent-90, score-0.687]
49 For each non-hydrogen atom in the protein, we create a target template corresponding to a neighborhood around that particular atom, using a training set of crystallographer-solved structures. [sent-91, score-0.407]
50 We build a separate template for each atom type – e. [sent-92, score-0.407]
51 , the β-carbon (2nd sidechain carbon) of leucine and the backbone oxygen of serine – producing 171 different templates in total. [sent-94, score-0.286]
52 A part's m function is the fffearcomputed mismatch score of that part's template over all positions and orientations. [sent-95, score-0.203]
53 By definition, vj is oriented such that its local z-axis is coincident with it's ideal bond orientation v xij = (xij ,vyij , zij )T in vi. [sent-98, score-0.418]
54 Learning the orientation parameters is fairly simple: for each atom we define canonic coordinates (where +z corresponds to the axis of rotation). [sent-101, score-0.495]
55 We average over all atoms of a given type in our training set – e. [sent-103, score-0.278]
56 A similarly-defined canonic coordinate frame is employed when learning the model templates; in this case, DEFT's learning algorithm computes target and weight templates based on the average and inverse variance over the training set, respectively. [sent-107, score-0.254]
57 Both enhancements can be applied to the general pictorial structure algorithm, and are not specific to DEFT. [sent-115, score-0.395]
58 Since DEFT only models covalent bonds, the matching algorithm sometimes returns a structure with non-bonded atoms impossibly close together. [sent-118, score-0.497]
59 Figure 6 shows such a collision (later corrected by the algorithm). [sent-120, score-0.176]
60 Given a candidate solution, it is straightforward to test for spatial collisions: we simply test if any two atoms in the structure are impossibly (physically) close together. [sent-121, score-0.361]
61 For each atom of a given type – here alanine Cα – we rotate the atom into a canonic orientation. [sent-130, score-0.667]
62 We then average over every atom of that type to get a template and average bond geometry. [sent-131, score-0.501]
63 On the left is a collision (the predicted molecule is in the darker color). [sent-134, score-0.239]
64 The amino acid's sidechain is placed coincident with the backbone. [sent-135, score-0.181]
65 On the right, collision avoidance finds the right structure. [sent-136, score-0.283]
66 If two atoms are both aligned to the same space in the most probable conformation, it seems quite likely that one of the atoms belongs there. [sent-138, score-0.556]
67 When a collision occurs, DEFT finds the closest branch point above the colliding nodes – that is, the root y of the minimum subtree containing all colliding nodes. [sent-140, score-0.499]
68 DEFT considers each child xi of this root, matching the subtree rooted at xi, keeping the remainder of the tree fixed. [sent-141, score-0.357]
69 In the case that the colliding node is due to a chain wrapping around on itself (and not two branches running into one another), the root y is defined as the colliding node nearest to the top of the tree. [sent-144, score-0.227]
70 2 Improved template matching In our original implementation, DEFT learned a template by averaging over each of the 171 atom types. [sent-147, score-0.684]
71 For example, for each of the 12 (non-hydrogen) atoms in the amino-acid tyrosine we build a single template – producing 12 tyrosine templates in total. [sent-148, score-0.701]
72 DEFT improves the template-matching algorithm by modeling the templates using a mixture of Gaussians, a generative model where each template is modeled using a mixture of basis templates. [sent-150, score-0.359]
73 For this paper, our experiments are conducted under the assumption that the mainchain atoms of the protein were known to within some error factor. [sent-166, score-0.492]
74 This assumption is fair; approaches exist for mainchain tracing in density maps [7]. [sent-167, score-0.211]
75 DEFT simply walks along the mainchain, placing atoms one residue at a time (considering each residue independently). [sent-168, score-0.308]
76 Using the training set we built a set of templates for matching using fffear. [sent-170, score-0.298]
77 The templates extended to a 6Å radius around each atom at 0. [sent-171, score-0.428]
78 Two sets of templates were built and subsequently matched: a large set of 171 produced by averaging all training set templates for each atom type, and a smaller set of 24 learned through by the EM algorithm. [sent-173, score-0.618]
79 We ran DEFT's pictorial structure matching algorithm using both sets of templates, with and without the collision detection code. [sent-174, score-0.672]
80 Although placing individual atoms into the sidechain is fairly quick, taking less than six hours for a 200-residue protein, computing fffear match scores is very CPUdemanding. [sent-175, score-0.558]
81 For each of our 171 templates, fffear takes 3-5 CPU-hours to compute the match score at each location in the image, for a total of one CPU-month to match templates into each protein! [sent-176, score-0.413]
82 Using individual-atom templates and the collision detection code, the all-atom RMS deviation varied from 1. [sent-179, score-0.4]
83 Using the EM-based clusters as templates produced slight or no improvement. [sent-182, score-0.19]
84 However, much less work is required; only 24 templates need to be matched to the image instead of 171 individual-atom templates. [sent-183, score-0.22]
85 Finally, it was promising that collision detection leads to significant error reduction. [sent-184, score-0.21]
86 0 Test Protein RMS Deviation It is interesting to note that individually using the improved templates and using the collision avoidance both improved the search results; however, using both together was a bit worse than with collision detection alone. [sent-186, score-0.668]
87 Further investigation is also needed balancing between the number and templates and template size. [sent-188, score-0.359]
88 0 collision detection + improved templates collision detection only 2. [sent-192, score-0.637]
89 6 Conclusions and future wo rk DEFT has applied the F-H pictorial structure matching algorithm to the task of interpreting electron density maps. [sent-201, score-0.76]
90 In order to model atoms rotating in 3D, we designed another joint type, the screw joint. [sent-203, score-0.339]
91 We also developed extensions to deal with spatial collisions of parts in the model, and implemented a slightly-improved template construction routine. [sent-204, score-0.283]
92 DEFT attempts to bridge the gap between two types of model-fitting approaches for interpreting electron density maps. [sent-206, score-0.298]
93 On the other hand, fffear [4] has had success finding rigid elements in very poor resolution maps, but is unable to locate highly flexible “loops”. [sent-209, score-0.297]
94 Our work extends the resolution threshold at which individual atoms can be identified in electron density maps. [sent-210, score-0.581]
95 DEFT's flexible model combines weakly-matching image templates to locate individual atoms from maps where individual atoms have been blurred away. [sent-211, score-0.971]
96 Rather than model the configuration of each individual atom, instead treat each amino acid as a single part in the flexible template, only modeling rotations along the backbone. [sent-214, score-0.459]
97 A different optimization algorithm that handles cycles in the pictorial structure graph would better handle collisions (allowing edges between non-bonded atoms). [sent-216, score-0.423]
98 The flexible molecular template we have described has the potential to produce an atomic model in a map where individual atoms may not be visible, through the power of combining weakly matching image templates. [sent-220, score-0.755]
99 A new software routine that automates the fitting of protein X-ray crystallographic electron density maps. [sent-233, score-0.471]
100 TEXTAL: a pattern recognition system for interpreting electron density maps. [sent-241, score-0.298]
wordName wordTfidf (topN-words)
[('deft', 0.524), ('pictorial', 0.303), ('atoms', 0.278), ('atom', 0.238), ('configuration', 0.223), ('templates', 0.19), ('collision', 0.176), ('template', 0.169), ('protein', 0.166), ('fffear', 0.127), ('child', 0.114), ('density', 0.111), ('electron', 0.111), ('matching', 0.108), ('rotate', 0.095), ('bond', 0.094), ('parent', 0.093), ('flexible', 0.084), ('colliding', 0.079), ('interpreting', 0.076), ('li', 0.074), ('xij', 0.072), ('tij', 0.071), ('finds', 0.069), ('collisions', 0.069), ('wij', 0.069), ('define', 0.067), ('orientation', 0.066), ('canonic', 0.064), ('coincident', 0.064), ('crystallography', 0.064), ('sidechain', 0.064), ('subtree', 0.063), ('molecule', 0.063), ('zij', 0.055), ('resolution', 0.053), ('amino', 0.053), ('deformation', 0.053), ('vi', 0.053), ('map', 0.052), ('maps', 0.052), ('structure', 0.051), ('cost', 0.05), ('configurations', 0.048), ('conformation', 0.048), ('cowtan', 0.048), ('crystallographic', 0.048), ('mainchain', 0.048), ('reflections', 0.048), ('automated', 0.045), ('parts', 0.045), ('enhancements', 0.041), ('tji', 0.041), ('rooted', 0.04), ('part', 0.039), ('proteins', 0.039), ('vj', 0.038), ('interpretation', 0.038), ('yij', 0.038), ('avoidance', 0.038), ('phillips', 0.038), ('orient', 0.038), ('defined', 0.036), ('molecular', 0.036), ('structures', 0.036), ('routine', 0.035), ('acta', 0.035), ('translate', 0.035), ('score', 0.034), ('detection', 0.034), ('root', 0.033), ('locate', 0.033), ('axis', 0.033), ('edge', 0.032), ('remainder', 0.032), ('alanine', 0.032), ('atan', 0.032), ('blurring', 0.032), ('euler', 0.032), ('impossibly', 0.032), ('leucine', 0.032), ('scorei', 0.032), ('screw', 0.032), ('specified', 0.032), ('tyrosine', 0.032), ('acid', 0.032), ('match', 0.031), ('placing', 0.03), ('matched', 0.03), ('ideal', 0.029), ('rotating', 0.029), ('refinement', 0.028), ('defines', 0.028), ('nlm', 0.028), ('covalent', 0.028), ('crystal', 0.028), ('individual', 0.028), ('improved', 0.027), ('coordinates', 0.027), ('rotation', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
2 0.089349262 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
3 0.064923033 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
4 0.062909998 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
5 0.051204111 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
Author: Aharon Bar-hillel, Adam Spiro, Eran Stark
Abstract: Spike sorting involves clustering spike trains recorded by a microelectrode according to the source neuron. It is a complicated problem, which requires a lot of human labor, partly due to the non-stationary nature of the data. We propose an automated technique for the clustering of non-stationary Gaussian sources in a Bayesian framework. At a first search stage, data is divided into short time frames and candidate descriptions of the data as a mixture of Gaussians are computed for each frame. At a second stage transition probabilities between candidate mixtures are computed, and a globally optimal clustering is found as the MAP solution of the resulting probabilistic model. Transition probabilities are computed using local stationarity assumptions and are based on a Gaussian version of the Jensen-Shannon divergence. The method was applied to several recordings. The performance appeared almost indistinguishable from humans in a wide range of scenarios, including movement, merges, and splits of clusters. 1
6 0.048700228 161 nips-2004-Self-Tuning Spectral Clustering
7 0.045241617 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
8 0.044436052 44 nips-2004-Conditional Random Fields for Object Recognition
9 0.044317849 177 nips-2004-Supervised Graph Inference
10 0.04415682 152 nips-2004-Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
11 0.043913893 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
12 0.040021699 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
13 0.038318355 82 nips-2004-Incremental Algorithms for Hierarchical Classification
14 0.036784526 40 nips-2004-Common-Frame Model for Object Recognition
15 0.036662564 43 nips-2004-Conditional Models of Identity Uncertainty with Application to Noun Coreference
16 0.036577266 99 nips-2004-Learning Hyper-Features for Visual Identification
17 0.036141068 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
18 0.035446968 169 nips-2004-Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes
19 0.035375763 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's
20 0.034964047 52 nips-2004-Discrete profile alignment via constrained information bottleneck
topicId topicWeight
[(0, -0.117), (1, 0.01), (2, -0.028), (3, -0.065), (4, 0.023), (5, 0.024), (6, 0.003), (7, 0.036), (8, -0.065), (9, -0.07), (10, 0.014), (11, -0.008), (12, -0.02), (13, 0.007), (14, 0.009), (15, -0.001), (16, 0.008), (17, -0.041), (18, 0.014), (19, -0.001), (20, -0.03), (21, 0.076), (22, -0.09), (23, -0.011), (24, 0.083), (25, -0.056), (26, -0.081), (27, -0.089), (28, 0.018), (29, -0.005), (30, -0.246), (31, 0.019), (32, 0.037), (33, -0.062), (34, 0.077), (35, 0.054), (36, 0.095), (37, 0.101), (38, -0.021), (39, 0.13), (40, -0.036), (41, 0.091), (42, -0.106), (43, -0.073), (44, -0.045), (45, 0.131), (46, -0.167), (47, 0.091), (48, 0.146), (49, -0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.94678909 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
2 0.75116289 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
3 0.71594518 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
4 0.64649856 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
5 0.39751902 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
6 0.34630787 177 nips-2004-Supervised Graph Inference
7 0.33890995 57 nips-2004-Economic Properties of Social Networks
8 0.33130261 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
9 0.30052587 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
10 0.29708731 141 nips-2004-Optimal sub-graphical models
11 0.28977528 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
12 0.28926402 109 nips-2004-Mass Meta-analysis in Talairach Space
13 0.28777242 122 nips-2004-Modelling Uncertainty in the Game of Go
14 0.28330824 21 nips-2004-An Information Maximization Model of Eye Movements
15 0.27685317 130 nips-2004-Newscast EM
16 0.27118257 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
17 0.27025506 29 nips-2004-Beat Tracking the Graphical Model Way
18 0.25744304 74 nips-2004-Harmonising Chorales by Probabilistic Inference
19 0.2558876 165 nips-2004-Semi-supervised Learning on Directed Graphs
20 0.25083074 51 nips-2004-Detecting Significant Multidimensional Spatial Clusters
topicId topicWeight
[(13, 0.057), (15, 0.091), (17, 0.01), (26, 0.029), (31, 0.018), (33, 0.163), (35, 0.025), (39, 0.012), (41, 0.407), (50, 0.038), (51, 0.017), (87, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.78090972 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
2 0.66221082 51 nips-2004-Detecting Significant Multidimensional Spatial Clusters
Author: Daniel B. Neill, Andrew W. Moore, Francisco Pereira, Tom M. Mitchell
Abstract: Assume a uniform, multidimensional grid of bivariate data, where each cell of the grid has a count ci and a baseline bi . Our goal is to find spatial regions (d-dimensional rectangles) where the ci are significantly higher than expected given bi . We focus on two applications: detection of clusters of disease cases from epidemiological data (emergency department visits, over-the-counter drug sales), and discovery of regions of increased brain activity corresponding to given cognitive tasks (from fMRI data). Each of these problems can be solved using a spatial scan statistic (Kulldorff, 1997), where we compute the maximum of a likelihood ratio statistic over all spatial regions, and find the significance of this region by randomization. However, computing the scan statistic for all spatial regions is generally computationally infeasible, so we introduce a novel fast spatial scan algorithm, generalizing the 2D scan algorithm of (Neill and Moore, 2004) to arbitrary dimensions. Our new multidimensional multiresolution algorithm allows us to find spatial clusters up to 1400x faster than the naive spatial scan, without any loss of accuracy. 1
3 0.43844509 77 nips-2004-Hierarchical Clustering of a Mixture Model
Author: Jacob Goldberger, Sam T. Roweis
Abstract: In this paper we propose an efficient algorithm for reducing a large mixture of Gaussians into a smaller mixture while still preserving the component structure of the original model; this is achieved by clustering (grouping) the components. The method minimizes a new, easily computed distance measure between two Gaussian mixtures that can be motivated from a suitable stochastic model and the iterations of the algorithm use only the model parameters, avoiding the need for explicit resampling of datapoints. We demonstrate the method by performing hierarchical clustering of scenery images and handwritten digits. 1
4 0.43839535 54 nips-2004-Distributed Information Regularization on Graphs
Author: Adrian Corduneanu, Tommi S. Jaakkola
Abstract: We provide a principle for semi-supervised learning based on optimizing the rate of communicating labels for unlabeled points with side information. The side information is expressed in terms of identities of sets of points or regions with the purpose of biasing the labels in each region to be the same. The resulting regularization objective is convex, has a unique solution, and the solution can be found with a pair of local propagation operations on graphs induced by the regions. We analyze the properties of the algorithm and demonstrate its performance on document classification tasks. 1
5 0.43832916 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate the problem of speech separation as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately-recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [1]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures. 1
6 0.43775177 44 nips-2004-Conditional Random Fields for Object Recognition
7 0.43759221 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
8 0.43751013 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
9 0.43721953 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
10 0.43660206 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
11 0.4363454 179 nips-2004-Surface Reconstruction using Learned Shape Models
12 0.43597522 127 nips-2004-Neighbourhood Components Analysis
13 0.43591756 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
14 0.43590868 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
15 0.43586633 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
16 0.43567988 161 nips-2004-Self-Tuning Spectral Clustering
17 0.43566811 99 nips-2004-Learning Hyper-Features for Visual Identification
18 0.43483928 102 nips-2004-Learning first-order Markov models for control
19 0.43480852 166 nips-2004-Semi-supervised Learning via Gaussian Processes
20 0.43444782 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss