nips nips2004 nips2004-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
Reference: text
sentIndex sentText sentNum sentScore
1 ie Abstract The formation of disulphide bridges among cysteines is an important feature of protein structures. [sent-5, score-1.329]
2 Here we develop new methods for the prediction of disulphide bond connectivity. [sent-6, score-0.782]
3 We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. [sent-7, score-1.184]
4 In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. [sent-10, score-0.41]
5 Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. [sent-11, score-0.207]
6 The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. [sent-12, score-1.014]
7 The method also yields an estimate for the total number of disulphide bridges in each chain. [sent-13, score-0.796]
8 1 Introduction The formation of covalent links among cysteine (Cys) residues with disulphide bridges is an important and unique feature of protein folding and structure. [sent-14, score-1.094]
9 Simulations [1], experiments in protein engineering [15, 8, 14], theoretical studies [7, 18], and even evolutionary models [9] stress the importance of disulphide bonds in stabilizing the native state of proteins. [sent-15, score-1.019]
10 Disulphide bridges may link distant portions of a protein sequence, providing strong structural constraints in the form of long-range interactions. [sent-16, score-0.336]
11 Thus prediction/knowledge of the disulphide connectivity of a protein is important and provides essential insights into its structure and possibly also into its function and evolution. [sent-17, score-0.936]
12 Only recently has the problem of predicting disulphide bridges received increased attention. [sent-18, score-0.801]
13 of the actual pairings between bonded cysteines (see Fig. [sent-21, score-0.75]
14 In this paper, we address the problem of intra-chain connectivity prediction, and AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCLSK Figure 1: Structure (top) and connectivity pattern (bottom) of intestinal toxin 1, PDB code 1IMT. [sent-23, score-0.434]
15 Disulphide bonds in the structure are shown as thick lines. [sent-24, score-0.226]
16 Existing approaches to connectivity prediction use stochastic global optimization [10], combinatorial optimization [13] and machine learning techniques [11, 17]. [sent-26, score-0.324]
17 The method in [10] represents the set of potential disulphide bridges in a sequence as a complete weighted undirected graph. [sent-27, score-0.805]
18 Vertices are oxidized cysteines and edges are labeled by the strength of interaction (contact potential) in the associated pair of cysteines. [sent-28, score-0.55]
19 After a complete labeled graph is obtained, candidate bridges are then located by finding the maximum weight perfect matching1 . [sent-30, score-0.297]
20 Unfortunately, for computational reasons, both this method and the previous one can only deal with sequences containing a limited number of bonds (K ≤ 5). [sent-37, score-0.251]
21 A different approach to predicting disulphide bridges is reported in [13], where finding disulphide bridges is part of a more general protocol aimed at predicting the topology of β-sheets in proteins. [sent-38, score-1.629]
22 Residue-to-residue contacts (including Cys-Cys bridges) are predicted by solving a series of integer linear programming problems in which customized hydrophobic contact energies must be maximized. [sent-39, score-0.099]
23 This method cannot be compared with the other approaches because the authors report validation results only for two relatively short polypeptides with few bonds (2 and 3). [sent-40, score-0.263]
24 In this paper we use 2-Dimensional Recursive Neural Network (2D-RNN, [4]) to predict disulphide connectivity in proteins starting from their primary sequence and its homologues. [sent-41, score-0.863]
25 The output of 2D-RNN are the pairwise probabilities of the existence of a bridge between any pair of cysteines. [sent-42, score-0.119]
26 Candidate disulphide connectivities are predicted by finding the maximum weight perfect matching. [sent-43, score-0.671]
27 The proposed framework represents a significant improvement in disulphide connectivity prediction for several reasons. [sent-44, score-0.884]
28 Second, our architecture can easily cope with chains with arbitrary number 1 A perfect matching of a graph (V, E) is a subset E ⊆ E such that each vertex v ∈ V is met by only one edge in E . [sent-46, score-0.178]
29 The hidden planes contain directed edges associated with the square lattices. [sent-49, score-0.176]
30 All the edges of the square lattice in each hidden plane are oriented in the direction of one of the four possible cardinal corners: NE, NW, SW, SE. [sent-50, score-0.237]
31 Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane. [sent-51, score-0.393]
32 Therefore, it overcomes the limitation of previous approaches which restrict predictions to chains with no more than 10 oxidized cysteines. [sent-55, score-0.189]
33 Third, our methods can be applied both to situations where the bonded state of each cysteine is known or unknown. [sent-56, score-0.524]
34 Here the underlying directed graph for disulphide connectivity has six 2D-layers: input, output, and four hidden layers (Figure 2(a)). [sent-60, score-0.945]
35 Vertical connections, within an (i, j) column, run from input to hidden and output layers, and from hidden layers to output (Figure 2(b)). [sent-61, score-0.217]
36 In each one of the four hidden planes, square lattice connections are oriented towards one of the four cardinal corners. [sent-62, score-0.154]
37 In a disulphide contact map prediction, the (i, j) output represents the probability of whether the i-th and j-th cysteines in the sequence are linked by a disulphide bridge or not. [sent-66, score-1.705]
38 This prediction depends directly on the (i, j) input and the four-hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. [sent-67, score-0.193]
39 For a sequence of length N and containing M cysteines, the output layer contains M × M units. [sent-71, score-0.107]
40 The input and hidden layer can scale like N × N if the full sequence is used, or like M × M if only fixed-size windows around each cysteine are used, as in the experiments reported here. [sent-72, score-0.279]
41 The input of each position within a window is the normalized frequency of all 20 amino acids at that position in the multiple alignment generated by aligning the sequence with the sequences in the NR database using the PSI-BLAST program as described, for instance, in [16]. [sent-74, score-0.141]
42 In the first mode, we can assume that the bonded state of the individual cysteines is known, for instance through the use of a specialized predictor for bonded/non-bonded states. [sent-78, score-0.806]
43 Then if the sequence contains M cysteines, 2K (2K ≤ M ) of which are intra-chain disulphide bonded, the prediction of the connectivity can focus on the 2K bonded cysteines exclusively and ignore the remaining M − 2K cysteines that are not bonded. [sent-79, score-2.079]
44 In the second mode, we can try to solve both prediction problems–bond state and connectivity–at the same time by focusing on all cysteines in a given sequence. [sent-80, score-0.576]
45 In both cases, the output is an array of pairwise probabilities from which the connectivity pattern graph must be inferred. [sent-81, score-0.332]
46 In the first case, the total number of bonds or edges in the connectivity graph is known (K). [sent-82, score-0.517]
47 In section 3, we show that sum of all probabilities across the output array can be used to estimate the number of disulphide contacts. [sent-84, score-0.642]
48 Data Preparation In order to assess our method, two data sets of known disulphide connectivities were compiled from the Swiss-Prot archive [2]. [sent-85, score-0.664]
49 We filtered out proteins with disulphide bonds assigned tentatively or disulphide bonds inferred by similarity. [sent-93, score-1.626]
50 We finally ended up with 966 chains, each with a number of disulphide bonds in the range of 1 to 24. [sent-94, score-0.786]
51 As previously pointed out, our methodology is not limited by the number of disulphide bonds, hence we were able to retain and test the algorithm on the whole filtered set of non-trivial chains. [sent-95, score-0.578]
52 This set consists of 712 sequences, each containing at least two bridges (K ≥ 2)–the case K = 1 being trivial when the bonded state is known. [sent-96, score-0.606]
53 By comparison, SP39 includes 446 chains with no more than 5 bridges; SP41 additionally includes 266 sequences and 112 of these have more than 10 oxidized cysteines. [sent-97, score-0.159]
54 Graph Matching to Derive Connectivity from Output Probabilities In the case where the bonded state of the cysteines is known, one has a graph with 2K nodes, one for each bonded cysteine. [sent-100, score-1.176]
55 The problem is then to find a connectivity pattern with K edges and maximum weight, where each cysteine is paired uniquely with another cysteine. [sent-102, score-0.418]
56 The maximum weight matching algorithm of Gabow [12] is used to chosen paired cysteines (edges), whose time complexity is cubic O(V 3 ) = O(K 3 ), where V is the number of vertices and linear O(V ) = O(K) space complexity beyond the storage of the graph. [sent-103, score-0.474]
57 Note that because the number of bonded cysteines in general is not very large, it is also possible in many cases to use an exhaustive search of all possible combinations. [sent-104, score-0.75]
58 (2K − 1) which yields 945 connectivity patterns in the case of 10 bonded cysteines. [sent-108, score-0.536]
59 The case where the bonded state of the cysteines is not known is slightly more involved and the Gabow algorithm cannot be applied directly since the graph has M nodes but, if some of the cysteines are not bonded, only a subset of 2K < M nodes participate in the final maximum weighted matching. [sent-109, score-1.296]
60 Alternatively, we use a greedy algorithm to derive the connectivity pattern using the estimate of the total number of bonds. [sent-110, score-0.285]
61 Then we pick the next edge with highest probability that is not incident to the first edge and so forth, until K edges have been selected. [sent-113, score-0.106]
62 For L reasonably large, the optimal connectivity pattern can usually be found. [sent-121, score-0.233]
63 We have compared this method with Gabow’s algorithm in the case where the bonding state is known and observed that when L = 6, this greedy heuristic yields results that are as good as those obtained with Gabow’s algorithm which, in this case, is guaranteed to find a global optimum. [sent-122, score-0.265]
64 It is important to note that this method ends up by producing a prediction of both the connectivity pattern and of the bonding state of each cysteine. [sent-125, score-0.572]
65 3 Results Disulphide Connectivity Prediction for Bonded Cysteines Here we assume that the bonding state is known. [sent-126, score-0.234]
66 44) Table 1: Disulphide connectivity prediction with 2D-RNN assuming the bonding state is known. [sent-152, score-0.54]
67 * denote levels of precision that exceeds previously reported best results in the literature [17] (in parentheses). [sent-154, score-0.12]
68 Figure 3: Correlation between number of bonded cysteines (2K) and i=j Oi,j log M . [sent-155, score-0.75]
69 For instance, in the case of 3 bonded cysteines, the precision reaches 0. [sent-158, score-0.409]
70 51 at the pair and pattern levels, whereas the best similar results reported in the literature are 0. [sent-160, score-0.102]
71 Estimation of the Number K of Bonds Analysis of the prediction results shows that there is a relationship between the sum of all the probabilities, i=j Oi,j , in the graph (or the output layer of the 2D-RNN) and the total number of bonded cysteines (2K). [sent-163, score-0.97]
72 As shown in Figure 3, there is a reasonably linear relationship between the total number 2K of bonded cysteines and the product i=j Oi,j log M , where M is the total number of cysteines in the sequence being considered. [sent-168, score-1.237]
73 Using this, we estimate the total number of bonded cysteines using linear regression and rounding off, making sure that the total number is even and does not exceed the total number of cysteines in the sequence. [sent-172, score-1.228]
74 03 Table 2: Prediction of disulphide connectivity pattern with 2D-RNN on all the cysteines, without assuming knowledge of the bonding state. [sent-186, score-0.989]
75 Disulphide Connectivity Prediction from Scratch In the last set of experiments, we do not assume any knowledge of the bonding state and apply the 2D-RNN approach to all the cysteines (both bonded and not bonded) in each sequence. [sent-187, score-0.984]
76 We predict the number of bonds, the bonding state, and connectivity pattern using one predictor. [sent-188, score-0.411]
77 Table 3 shows the kind of results that are obtained when the method is applied to sequences with more than K = 5 bonds in SP41. [sent-191, score-0.233]
78 Finally, the precision of bonded state prediction is 0. [sent-193, score-0.57]
79 85, and the recall of bonded state prediction is 0. [sent-194, score-0.521]
80 The precision and recall of bond number prediction is 0. [sent-196, score-0.303]
81 The average absolute difference between true bond and predicted bond number is 0. [sent-198, score-0.223]
82 The average absolute difference between true bond number and wrongly predicted bond number is 1. [sent-200, score-0.223]
83 24 Table 3: Prediction of disulphide connectivity pattern with 2D-RNN on all the cysteines, without assuming knowledge of the bonding state and when the number of bridges K exceeds 5. [sent-215, score-1.242]
84 4 Conclusion We have presented a complete system for disulphide connectivity prediction in cysteinerich proteins. [sent-216, score-0.884]
85 Assuming knowledge of cysteine bonding state, the method outperforms existing approaches on the same validation data. [sent-217, score-0.366]
86 The results also show that the 2D-RNN method achieves good recall and accuracy on the prediction of connectivity pattern even when the bonding state of individual cysteines is not known. [sent-218, score-1.012]
87 Differently from previous approaches, our method can be applied to chains with K > 5 bonds and yields good, cooperative, predictions of the total number of bonds, as well as of the bonding states and bond locations. [sent-219, score-0.6]
88 Training can take days but once trained predictions can be carried on a proteomic or protein engineering scale. [sent-220, score-0.158]
89 The current version of our disulphide prediction server DIpro (which includes step (a)) is available through: http://www. [sent-222, score-0.683]
90 What can disulfide bonds tell us about protein energetics, function and folding: simulations and bioinformatics analysis. [sent-233, score-0.378]
91 The SWISS-PROT protein sequence database and its supplement TrEMBL. [sent-241, score-0.169]
92 The principled design of large-scale recursive neural network architectures–dag-rnns and the protein structure prediction problem. [sent-253, score-0.303]
93 Engineered disulfide bonds as probes of the folding pathway of barnase - increasing stability of proteins against the rate of denaturation. [sent-288, score-0.337]
94 Thermodynamics and kinetics of protein folding: an evolutionary perpective. [sent-292, score-0.177]
95 A neural network-based method for predicting the disulfide connectivity in proteins. [sent-307, score-0.227]
96 Prediction of β-sheet topology and disulfide bridges in polypeptides. [sent-322, score-0.197]
97 Contribution of disulfide bonds to the conformational stability and catalytic activity of ribonuclease A. [sent-336, score-0.236]
98 Substantial increase of protein stability by multiple disulfide bonds. [sent-343, score-0.167]
99 Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. [sent-350, score-0.312]
100 Disulfide connectivity prediction using recursive neural networks and evolutionary information. [sent-355, score-0.405]
wordName wordTfidf (topN-words)
[('disulphide', 0.578), ('cysteines', 0.415), ('bonded', 0.335), ('bonds', 0.208), ('connectivity', 0.201), ('bridges', 0.197), ('bonding', 0.178), ('disul', 0.148), ('protein', 0.139), ('cysteine', 0.133), ('prediction', 0.105), ('nw', 0.103), ('bond', 0.099), ('sw', 0.089), ('chains', 0.075), ('precision', 0.074), ('se', 0.07), ('plane', 0.064), ('gabow', 0.059), ('oxidized', 0.059), ('state', 0.056), ('proteins', 0.054), ('edges', 0.052), ('baldi', 0.052), ('hidden', 0.049), ('folding', 0.047), ('directed', 0.042), ('recursive', 0.041), ('architectures', 0.039), ('tp', 0.039), ('pdb', 0.039), ('contacts', 0.039), ('output', 0.038), ('evolutionary', 0.038), ('validation', 0.037), ('contact', 0.035), ('graph', 0.035), ('planes', 0.033), ('hi', 0.033), ('pattern', 0.032), ('greedy', 0.031), ('bridge', 0.031), ('bioinformatics', 0.031), ('sequence', 0.03), ('archive', 0.03), ('cardinal', 0.03), ('connectivities', 0.03), ('dublin', 0.03), ('fariselli', 0.03), ('iij', 0.03), ('oij', 0.03), ('solvent', 0.03), ('vullo', 0.03), ('secondary', 0.03), ('nn', 0.03), ('stability', 0.028), ('acids', 0.028), ('candidate', 0.027), ('reported', 0.027), ('edge', 0.027), ('predicting', 0.026), ('probabilities', 0.026), ('lattice', 0.026), ('compiled', 0.026), ('scratch', 0.026), ('accessibility', 0.026), ('recall', 0.025), ('predicted', 0.025), ('sequences', 0.025), ('layers', 0.024), ('pair', 0.024), ('ltered', 0.024), ('fp', 0.024), ('biochemistry', 0.022), ('matching', 0.022), ('layer', 0.021), ('total', 0.021), ('pro', 0.02), ('networks', 0.02), ('nodes', 0.02), ('amino', 0.02), ('contextual', 0.02), ('perfect', 0.019), ('input', 0.019), ('irvine', 0.019), ('weight', 0.019), ('alignment', 0.019), ('literature', 0.019), ('predictions', 0.019), ('structure', 0.018), ('intelligent', 0.018), ('overcomes', 0.018), ('correlation', 0.018), ('vertices', 0.018), ('containing', 0.018), ('approaches', 0.018), ('sharing', 0.017), ('connections', 0.017), ('four', 0.016), ('column', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
2 0.074912496 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
3 0.062909998 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
4 0.054062758 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
5 0.049932204 52 nips-2004-Discrete profile alignment via constrained information bottleneck
Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin
Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1
6 0.044609878 165 nips-2004-Semi-supervised Learning on Directed Graphs
7 0.037220195 122 nips-2004-Modelling Uncertainty in the Game of Go
8 0.035315346 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern
9 0.034593057 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
10 0.03446117 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
11 0.032548103 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
12 0.032453783 183 nips-2004-Temporal-Difference Networks
13 0.032212876 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
14 0.030083807 44 nips-2004-Conditional Random Fields for Object Recognition
15 0.029858176 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
16 0.028161123 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
17 0.027516035 124 nips-2004-Multiple Alignment of Continuous Time Series
18 0.026884703 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
19 0.026683833 28 nips-2004-Bayesian inference in spiking neurons
20 0.026092838 131 nips-2004-Non-Local Manifold Tangent Learning
topicId topicWeight
[(0, -0.094), (1, -0.01), (2, 0.005), (3, -0.024), (4, 0.006), (5, 0.029), (6, 0.003), (7, 0.037), (8, -0.007), (9, -0.083), (10, -0.018), (11, -0.052), (12, -0.023), (13, 0.032), (14, 0.031), (15, -0.004), (16, -0.03), (17, -0.054), (18, 0.045), (19, -0.002), (20, 0.047), (21, 0.069), (22, -0.111), (23, 0.037), (24, 0.051), (25, 0.027), (26, -0.121), (27, -0.074), (28, 0.075), (29, -0.022), (30, -0.137), (31, 0.094), (32, 0.12), (33, -0.038), (34, 0.015), (35, -0.033), (36, 0.105), (37, 0.103), (38, -0.003), (39, 0.067), (40, -0.045), (41, -0.002), (42, -0.069), (43, -0.017), (44, -0.046), (45, 0.116), (46, -0.139), (47, 0.05), (48, 0.118), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.93376994 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
2 0.75651193 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
Author: Frank Dimaio, George Phillips, Jude W. Shavlik
Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing rootmean-squared errors ranging from 1.38 to 1.84Å. 1 In trod u ction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use
3 0.7460283 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1
4 0.56223989 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing
Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann
Abstract: De novo Sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the highthroughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack in precision to detect mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
5 0.4749369 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
6 0.4422296 52 nips-2004-Discrete profile alignment via constrained information bottleneck
7 0.42822716 57 nips-2004-Economic Properties of Social Networks
8 0.35844636 141 nips-2004-Optimal sub-graphical models
9 0.35195863 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
10 0.35183302 165 nips-2004-Semi-supervised Learning on Directed Graphs
11 0.31932628 122 nips-2004-Modelling Uncertainty in the Game of Go
12 0.31384459 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
13 0.31377739 183 nips-2004-Temporal-Difference Networks
14 0.30492157 130 nips-2004-Newscast EM
15 0.27487659 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models
16 0.26208249 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
17 0.24851005 180 nips-2004-Synchronization of neural networks by mutual learning and its application to cryptography
18 0.23859142 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
19 0.23516254 124 nips-2004-Multiple Alignment of Continuous Time Series
20 0.23476316 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
topicId topicWeight
[(9, 0.393), (13, 0.078), (15, 0.118), (17, 0.013), (25, 0.013), (26, 0.037), (31, 0.025), (33, 0.14), (35, 0.019), (39, 0.021), (50, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.74293864 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi
Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1
2 0.74108529 87 nips-2004-Integrating Topics and Syntax
Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum
Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively. 1
3 0.63449275 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers
Author: Ligen Wang, Balázs Kégl
Abstract: In this paper we propose to combine two powerful ideas, boosting and manifold learning. On the one hand, we improve A DA B OOST by incorporating knowledge on the structure of the data into base classifier design and selection. On the other hand, we use A DA B OOST’s efficient learning mechanism to significantly improve supervised and semi-supervised algorithms proposed in the context of manifold learning. Beside the specific manifold-based penalization, the resulting algorithm also accommodates the boosting of a large family of regularized learning algorithms. 1
4 0.60169494 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
Author: Odelia Schwartz, Terrence J. Sejnowski, Peter Dayan
Abstract: In the analysis of natural images, Gaussian scale mixtures (GSM) have been used to account for the statistics of filter responses, and to inspire hierarchical cortical representational learning schemes. GSMs pose a critical assignment problem, working out which filter responses were generated by a common multiplicative factor. We present a new approach to solving this assignment problem through a probabilistic extension to the basic GSM, and show how to perform inference in the model using Gibbs sampling. We demonstrate the efficacy of the approach on both synthetic and image data. Understanding the statistical structure of natural images is an important goal for visual neuroscience. Neural representations in early cortical areas decompose images (and likely other sensory inputs) in a way that is sensitive to sophisticated aspects of their probabilistic structure. This structure also plays a key role in methods for image processing and coding. A striking aspect of natural images that has reflections in both top-down and bottom-up modeling is coordination across nearby locations, scales, and orientations. From a topdown perspective, this structure has been modeled using what is known as a Gaussian Scale Mixture model (GSM).1–3 GSMs involve a multi-dimensional Gaussian (each dimension of which captures local structure as in a linear filter), multiplied by a spatialized collection of common hidden scale variables or mixer variables∗ (which capture the coordination). GSMs have wide implications in theories of cortical receptive field development, eg the comprehensive bubbles framework of Hyv¨ rinen.4 The mixer variables provide the a top-down account of two bottom-up characteristics of natural image statistics, namely the ‘bowtie’ statistical dependency,5, 6 and the fact that the marginal distributions of receptive field-like filters have high kurtosis.7, 8 In hindsight, these ideas also bear a close relationship with Ruderman and Bialek’s multiplicative bottom-up image analysis framework 9 and statistical models for divisive gain control.6 Coordinated structure has also been addressed in other image work,10–14 and in other domains such as speech15 and finance.16 Many approaches to the unsupervised specification of representations in early cortical areas rely on the coordinated structure.17–21 The idea is to learn linear filters (eg modeling simple cells as in22, 23 ), and then, based on the coordination, to find combinations of these (perhaps non-linearly transformed) as a way of finding higher order filters (eg complex cells). One critical facet whose specification from data is not obvious is the neighborhood arrangement, ie which linear filters share which mixer variables. ∗ Mixer variables are also called mutlipliers, but are unrelated to the scales of a wavelet. Here, we suggest a method for finding the neighborhood based on Bayesian inference of the GSM random variables. In section 1, we consider estimating these components based on information from different-sized neighborhoods and show the modes of failure when inference is too local or too global. Based on these observations, in section 2 we propose an extension to the GSM generative model, in which the mixer variables can overlap probabilistically. We solve the neighborhood assignment problem using Gibbs sampling, and demonstrate the technique on synthetic data. In section 3, we apply the technique to image data. 1 GSM inference of Gaussian and mixer variables In a simple, n-dimensional, version of a GSM, filter responses l are synthesized † by multiplying an n-dimensional Gaussian with values g = {g1 . . . gn }, by a common mixer variable v. l = vg (1) We assume g are uncorrelated (σ 2 along diagonal of the covariance matrix). For the analytical calculations, we assume that v has a Rayleigh distribution: where 0 < a ≤ 1 parameterizes the strength of the prior p[v] ∝ [v exp −v 2 /2]a (2) For ease, we develop the theory for a = 1. As is well known,2 and repeated in figure 1(B), the marginal distribution of the resulting GSM is sparse and highly kurtotic. The joint conditional distribution of two elements l1 and l2 , follows a bowtie shape, with the width of the distribution of one dimension increasing for larger values (both positive and negative) of the other dimension. The inverse problem is to estimate the n+1 variables g1 . . . gn , v from the n filter responses l1 . . . ln . It is formally ill-posed, though regularized through the prior distributions. Four posterior distributions are particularly relevant, and can be derived analytically from the model: rv distribution posterior mean ” “ √ σ |l1 | 2 2 l1 |l1 | B“ 1, σ ” |l1 | ” exp − v − “ p[v|l1 ] 2 2v 2 σ 2 σ 1 |l1 | 1 |l1 | B p[v|l] p[|g1 ||l1 ] p[|g1 ||l] √ B 2, σ 1 (n−2) 2 2 2 ( ) −(n−1) exp − v2 − 2vl2 σ2 l v B(1− n , σ ) 2 √ σ|l1 | g2 l2 “ ” 1 exp − 12 − 12 2σ 1 |l1 | g2 2g l σ B −2, σ|l1 | ”1 |l1 | 2 (2−n) l n l 2 −1, σ “ B( ) σ (n−3) g1 1 l σ σ 1 g2 2 1 exp − 2σ2 l2 − l 1 2 l1 2 2g1 σ |l1 | σ ( ( 2, σ ) ) l B 3−n,σ 2 2 l B 1− n , σ “ 2 ” |l1 | B 0, σ |l1 | “ ” σ B − 1 , |l1 | 2 σ n 1 l |l1 | B( 2 − 2 , σ ) n l B( −1, l ) 2 σ 2 where B(n, x) is the modified Bessel function of the second kind (see also24 ), l = i li and gi is forced to have the same sign as li , since the mixer variables are always positive. Note that p[v|l1 ] and p[g1 |l1 ] (rows 1,3) are local estimates, while p[v|l] and p[g|l] (rows 2,4) are estimates according to filter outputs {l1 . . . ln }. The posterior p[v|l] has also been estimated numerically in noise removal for other mixer priors, by Portilla et al 25 The full GSM specifies a hierarchy of mixer variables. Wainwright2 considered a prespecified tree-based hierarhical arrangement. In practice, for natural sensory data, given a heterogeneous collection of li , it is advantageous to learn the hierachical arrangement from examples. In an approach related to that of the GSM, Karklin and Lewicki19 suggested We describe the l as being filter responses even in the synthetic case, to facilitate comparison with images. † B A α 1 ... g v 20 1 ... β 0.1 l 0 -5 0 l 2 0 21 0 0 5 l 1 0 l 1 1 l ... l 21 40 20 Actual Distribution 0 D Gaussian 0 5 0 0 -5 0 0 5 0 5 -5 0 g 1 0 5 E(g 1 | l1) 1 .. 40 ) 0.06 -5 0 0 5 2 E(g |l 1 1 .. 20 ) 0 1 E(g | l ) -5 5 E(g | l 1 2 1 .. 20 5 α E(g |l 1 .. 20 ) E(g |l 0 E(v | l α 0.06 E(g | l2) 2 2 0 5 E(v | l 1 .. 20 ) E(g | l1) 1 1 g 0 1 0.06 0 0.06 E(vαl | ) g 40 filters, too global 0.06 0.06 0.06 Distribution 20 filters 1 filter, too local 0.06 vα E Gaussian joint conditional 40 l l C Mixer g ... 21 Multiply Multiply l g Distribution g v 1 .. 40 1 .. 40 ) ) E(g | l 1 1 .. 40 ) Figure 1: A Generative model: each filter response is generated by multiplying its Gaussian variable by either mixer variable vα , or mixer variable vβ . B Marginal and joint conditional statistics (bowties) of sample synthetic filter responses. For the joint conditional statistics, intensity is proportional to the bin counts, except that each column is independently re-scaled to fill the range of intensities. C-E Left: actual distributions of mixer and Gaussian variables; other columns: estimates based on different numbers of filter responses. C Distribution of estimate of the mixer variable vα . Note that mixer variable values are by definition positive. D Distribution of estimate of one of the Gaussian variables, g1 . E Joint conditional statistics of the estimates of Gaussian variables g1 and g2 . generating log mixer values for all the filters and learning the linear combinations of a smaller collection of underlying values. Here, we consider the problem in terms of multiple mixer variables, with the linear filters being clustered into groups that share a single mixer. This poses a critical assignment problem of working out which filter responses share which mixer variables. We first study this issue using synthetic data in which two groups of filter responses l1 . . . l20 and l21 . . . l40 are generated by two mixer variables vα and vβ (figure 1). We attempt to infer the components of the GSM model from the synthetic data. Figure 1C;D shows the empirical distributions of estimates of the conditional means of a mixer variable E(vα |{l}) and one of the Gaussian variables E(g1 |{l}) based on different assumed assignments. For estimation based on too few filter responses, the estimates do not well match the actual distributions. For example, for a local estimate based on a single filter response, the Gaussian estimate peaks away from zero. For assignments including more filter responses, the estimates become good. However, inference is also compromised if the estimates for vα are too global, including filter responses actually generated from vβ (C and D, last column). In (E), we consider the joint conditional statistics of two components, each 1 v v α vγ β g 1 ... v vα B Actual A Generative model 1 100 1 100 0 v 01 l1 ... l100 0 l 1 20 2 0 0 l 1 0 -4 100 Filter number vγ β 1 100 1 0 Filter number 100 1 Filter number 0 E(g 1 | l ) Gibbs fit assumed 0.15 E(g | l ) 0 2 0 1 Mixer Gibbs fit assumed 0.1 4 0 E(g 1 | l ) Distribution Distribution Distribution l 100 Filter number Gaussian 0.2 -20 1 1 0 Filter number Inferred v α Multiply 100 1 Filter number Pixel vγ 1 g 0 C β E(v | l ) β 0 0 0 15 E(v | l ) α 0 E(v | l ) α Figure 2: A Generative model in which each filter response is generated by multiplication of its Gaussian variable by a mixer variable. The mixer variable, v α , vβ , or vγ , is chosen probabilistically upon each filter response sample, from a Rayleigh distribution with a = .1. B Top: actual probability of filter associations with vα , vβ , and vγ ; Bottom: Gibbs estimates of probability of filter associations corresponding to vα , vβ , and vγ . C Statistics of generated filter responses, and of Gaussian and mixer estimates from Gibbs sampling. estimating their respective g1 and g2 . Again, as the number of filter responses increases, the estimates improve, provided that they are taken from the right group of filter responses with the same mixer variable. Specifically, the mean estimates of g1 and g2 become more independent (E, third column). Note that for estimations based on a single filter response, the joint conditional distribution of the Gaussian appears correlated rather than independent (E, second column); for estimation based on too many filter responses (40 in this example), the joint conditional distribution of the Gaussian estimates shows a dependent (rather than independent) bowtie shape (E, last column). Mixer variable joint statistics also deviate from the actual when the estimations are too local or global (not shown). We have observed qualitatively similar statistics for estimation based on coefficients in natural images. Neighborhood size has also been discussed in the context of the quality of noise removal, assuming a GSM model.26 2 Neighborhood inference: solving the assignment problem The plots in figure 1 suggest that it should be possible to infer the assignments, ie work out which filter responses share common mixers, by learning from the statistics of the resulting joint dependencies. Hard assignment problems (in which each filter response pays allegiance to just one mixer) are notoriously computationally brittle. Soft assignment problems (in which there is a probabilistic relationship between filter responses and mixers) are computationally better behaved. Further, real world stimuli are likely better captured by the possibility that filter responses are coordinated in somewhat different collections in different images. We consider a richer, mixture GSM as a generative model (Figure 2). To model the generation of filter responses li for a single image patch, we multiply each Gaussian variable gi by a single mixer variable from the set v1 . . . vm . We assume that gi has association probabil- ity pij (satisfying j pij = 1, ∀i) of being assigned to mixer variable vj . The assignments are assumed to be made independently for each patch. We use si ∈ {1, 2, . . . m} for the assignments: li = g i vs i (3) Inference and learning in this model proceeds in two stages, according to the expectation maximization algorithm. First, given a filter response li , we use Gibbs sampling for the E phase to find possible appropriate (posterior) assignments. Williams et al.27 suggested using Gibbs sampling to solve a similar assignment problem in the context of dynamic tree models. Second, for the M phase, given the collection of assignments across multiple filter responses, we update the association probabilities pij . Given sample mixer assignments, we can estimate the Gaussian and mixer components of the GSM using the table of section 1, but restricting the filter response samples just to those associated with each mixer variable. We tested the ability of this inference method to find the associations in the probabilistic mixer variable synthetic example shown in figure 2, (A,B). The true generative model specifies probabilistic overlap of 3 mixer variables. We generated 5000 samples for each filter according to the generative model. We ran the Gibbs sampling procedure, setting the number of possible neighborhoods to 5 (e.g., > 3); after 500 iterations the weights converged near to the proper probabilities. In (B, top), we plot the actual probability distributions for the filter associations with each of the mixer variables. In (B, bottom), we show the estimated associations: the three non-zero estimates closely match the actual distributions; the other two estimates are zero (not shown). The procedure consistently finds correct associations even in larger examples of data generated with up to 10 mixer variables. In (C) we show an example of the actual and estimated distributions of the mixer and Gaussian components of the GSM. Note that the joint conditional statistics of both mixer and Gaussian are independent, since the variables were generated as such in the synthetic example. The Gibbs procedure can be adjusted for data generated with different parameters a of equation 2, and for related mixers,2 allowing for a range of image coefficient behaviors. 3 Image data Having validated the inference model using synthetic data, we turned to natural images. We derived linear filters from a multi-scale oriented steerable pyramid,28 with 100 filters, at 2 preferred orientations, 25 non-overlapping spatial positions (with spatial subsampling of 8 pixels), and two phases (quadrature pairs), and a single spatial frequency peaked at 1/6 cycles/pixel. The image ensemble is 4 images from a standard image compression database (boats, goldhill, plant leaves, and mountain) and 4000 samples. We ran our method with the same parameters as for synthetic data, with 7 possible neighborhoods and Rayleigh parameter a = .1 (as in figure 2). Figure 3 depicts the association weights pij of the coefficients for each of the obtained mixer variables. In (A), we show a schematic (template) of the association representation that will follow in (B, C) for the actual data. Each mixer variable neighborhood is shown for coefficients of two phases and two orientations along a spatial grid (one grid for each phase). The neighborhood is illustrated via the probability of each coefficient to be generated from a given mixer variable. For the first two neighborhoods (B), we also show the image patches that yielded the maximum log likelihood of P (v|patch). The first neighborhood (in B) prefers vertical patterns across most of its “receptive field”, while the second has a more localized region of horizontal preference. This can also be seen by averaging the 200 image patches with the maximum log likelihood. Strikingly, all the mixer variables group together two phases of quadrature pair (B, C). Quadrature pairs have also been extracted from cortical data, and are the components of ideal complex cell models. Another tendency is to group Phase 2 Phase 1 19 Y position Y position A 0 -19 Phase 1 Phase 2 19 0 -19 -19 0 19 X position -19 0 19 X position B Neighborhood Example max patches Average Neighborhood Example max patches C Neighborhood Average Gaussian 0.25 l2 0 -50 0 l 1 50 0 l 1 Mixer Gibbs fit assumed Gibbs fit assumed Distribution Distribution Distribution D Coefficient 0.12 E(g | l ) 0 2 0 -5 0 E(g 1 | l ) 5 0 E(g 1 | l ) 0.15 ) E(v | l ) β 0 00 15 E(v | l ) α 0 E(v | l ) α Figure 3: A Schematic of the mixer variable neighborhood representation. The probability that each coefficient is associated with the mixer variable ranges from 0 (black) to 1 (white). Left: Vertical and horizontal filters, at two orientations, and two phases. Each phase is plotted separately, on a 38 by 38 pixel spatial grid. Right: summary of representation, with filter shapes replaced by oriented lines. Filters are approximately 6 pixels in diameter, with the spacing between filters 8 pixels. B First two image ensemble neighborhoods obtained from Gibbs sampling. Also shown, are four 38×38 pixel patches that had the maximum log likelihood of P (v|patch), and the average of the first 200 maximal patches. C Other image ensemble neighborhoods. D Statistics of representative coefficients of two spatially displaced vertical filters, and of inferred Gaussian and mixer variables. orientations across space. The phase and iso-orientation grouping bear some interesting similarity to other recent suggestions;17, 18 as do the maximal patches.19 Wavelet filters have the advantage that they can span a wider spatial extent than is possible with current ICA techniques, and the analysis of parameters such as phase grouping is more controlled. We are comparing the analysis with an ICA first-stage representation, which has other obvious advantages. We are also extending the analysis to correlated wavelet filters; 25 and to simulations with a larger number of neighborhoods. From the obtained associations, we estimated the mixer and Gaussian variables according to our model. In (D) we show representative statistics of the coefficients and of the inferred variables. The learned distributions of Gaussian and mixer variables are quite close to our assumptions. The Gaussian estimates exhibit joint conditional statistics that are roughly independent, and the mixer variables are weakly dependent. We have thus far demonstrated neighborhood inference for an image ensemble, but it is also interesting and perhaps more intuitive to consider inference for particular images or image classes. In figure 4 (A-B) we demonstrate example mixer variable neighborhoods derived from learning patches of a zebra image (Corel CD-ROM). As before, the neighborhoods are composed of quadrature pairs; however, the spatial configurations are richer and have A Neighborhood B Neighborhood Average Example max patches Top 25 max patches Average Example max patches Top 25 max patches Figure 4: Example of Gibbs on Zebra image. Image is 151×151 pixels, and each spatial neighborhood spans 38×38 pixels. A, B Example mixer variable neighborhoods. Left: example mixer variable neighborhood, and average of 200 patches that yielded the maximum likelihood of P (v|patch). Right: Image and marked on top of it example patches that yielded the maximum likelihood of P (v|patch). not been previously reported with unsupervised hierarchical methods: for example, in (A), the mixture neighborhood captures a horizontal-bottom/vertical-top spatial configuration. This appears particularly relevant in segmenting regions of the front zebra, as shown by marking in the image the patches i that yielded the maximum log likelihood of P (v|patch). In (B), the mixture neighborhood captures a horizontal configuration, more focused on the horizontal stripes of the front zebra. This example demonstrates the logic behind a probabilistic mixture: coefficients corresponding to the bottom horizontal stripes might be linked with top vertical stripes (A) or to more horizontal stripes (B). 4 Discussion Work on the study of natural image statistics has recently evolved from issues about scalespace hierarchies, wavelets, and their ready induction through unsupervised learning models (loosely based on cortical development) towards the coordinated statistical structure of the wavelet components. This includes bottom-up (eg bowties, hierarchical representations such as complex cells) and top-down (eg GSM) viewpoints. The resulting new insights inform a wealth of models and ideas and form the essential backdrop for the work in this paper. They also link to impressive engineering results in image coding and processing. A most critical aspect of an hierarchical representational model is the way that the structure of the hierarchy is induced. We addressed the hierarchy question using a novel extension to the GSM generative model in which mixer variables (at one level of the hierarchy) enjoy probabilistic assignments to filter responses (at a lower level). We showed how these assignments can be learned (using Gibbs sampling), and illustrated some of their attractive properties using both synthetic and a variety of image data. We grounded our method firmly in Bayesian inference of the posterior distributions over the two classes of random variables in a GSM (mixer and Gaussian), placing particular emphasis on the interplay between the generative model and the statistical properties of its components. An obvious question raised by our work is the neural correlate of the two different posterior variables. The Gaussian variable has characteristics resembling those of the output of divisively normalized simple cells;6 the mixer variable is more obviously related to the output of quadrature pair neurons (such as orientation energy or motion energy cells, which may also be divisively normalized). How these different information sources may subsequently be used is of great interest. Acknowledgements This work was funded by the HHMI (OS, TJS) and the Gatsby Charitable Foundation (PD). We are very grateful to Patrik Hoyer, Mike Lewicki, Zhaoping Li, Simon Osindero, Javier Portilla and Eero Simoncelli for discussion. References [1] D Andrews and C Mallows. Scale mixtures of normal distributions. J. Royal Stat. Soc., 36:99–102, 1974. [2] M J Wainwright and E P Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In S. A. Solla, T. K. Leen, and K.-R. M¨ ller, editors, Adv. Neural Information Processing Systems, volume 12, pages 855–861, Cambridge, MA, u May 2000. MIT Press. [3] M J Wainwright, E P Simoncelli, and A S Willsky. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Applied and Computational Harmonic Analysis, 11(1):89–123, July 2001. Special issue on wavelet applications. [4] A Hyv¨ rinen, J Hurri, and J Vayrynen. Bubbles: a unifying framework for low-level statistical properties of natural image a sequences. Journal of the Optical Society of America A, 20:1237–1252, May 2003. [5] R W Buccigrossi and E P Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans Image Proc, 8(12):1688–1701, December 1999. [6] O Schwartz and E P Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, August 2001. [7] D J Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379–2394, 1987. [8] H Attias and C E Schreiner. Temporal low-order statistics of natural sounds. In M Jordan, M Kearns, and S Solla, editors, Adv in Neural Info Processing Systems, volume 9, pages 27–33. MIT Press, 1997. [9] D L Ruderman and W Bialek. Statistics of natural images: Scaling in the woods. Phys. Rev. Letters, 73(6):814–817, 1994. [10] C Zetzsche, B Wegmann, and E Barth. Nonlinear aspects of primary vision: Entropy reduction beyond decorrelation. In Int’l Symposium, Society for Information Display, volume XXIV, pages 933–936, 1993. [11] J Huang and D Mumford. Statistics of natural images and models. In CVPR, page 547, 1999. [12] J. Romberg, H. Choi, and R. Baraniuk. Bayesian wavelet domain image modeling using hidden Markov trees. In Proc. IEEE Int’l Conf on Image Proc, Kobe, Japan, October 1999. [13] A Turiel, G Mato, N Parga, and J P Nadal. The self-similarity properties of natural images resemble those of turbulent flows. Phys. Rev. Lett., 80:1098–1101, 1998. [14] J Portilla and E P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. Int’l Journal of Computer Vision, 40(1):49–71, 2000. [15] Helmut Brehm and Walter Stammler. Description and generation of spherically invariant speech-model signals. Signal Processing, 12:119–141, 1987. [16] T Bollersley, K Engle, and D Nelson. ARCH models. In B Engle and D McFadden, editors, Handbook of Econometrics V. 1994. [17] A Hyv¨ rinen and P Hoyer. Emergence of topography and complex cell properties from natural images using extensions of a ¨ ICA. In S. A. Solla, T. K. Leen, and K.-R. Muller, editors, Adv. Neural Information Processing Systems, volume 12, pages 827–833, Cambridge, MA, May 2000. MIT Press. [18] P Hoyer and A Hyv¨ rinen. A multi-layer sparse coding network learns contour coding from natural images. Vision Research, a 42(12):1593–1605, 2002. [19] Y Karklin and M S Lewicki. Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14:483–499, 2003. [20] W Laurenz and T Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715– 770, 2002. [21] C Kayser, W Einh¨ user, O D¨ mmer, P K¨ nig, and K P K¨ rding. Extracting slow subspaces from natural videos leads to a u o o complex cells. In G Dorffner, H Bischof, and K Hornik, editors, Proc. Int’l Conf. on Artificial Neural Networks (ICANN-01), pages 1075–1080, Vienna, Aug 2001. Springer-Verlag, Heidelberg. [22] B A Olshausen and D J Field. Emergence of simple-cell receptive field properties by learning a sparse factorial code. Nature, 381:607–609, 1996. [23] A J Bell and T J Sejnowski. The ’independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327– 3338, 1997. [24] U Grenander and A Srivastava. Probabibility models for clutter in natural images. IEEE Trans. on Patt. Anal. and Mach. Intel., 23:423–429, 2002. [25] J Portilla, V Strela, M Wainwright, and E Simoncelli. Adaptive Wiener denoising using a Gaussian scale mixture model in the wavelet domain. In Proc 8th IEEE Int’l Conf on Image Proc, pages 37–40, Thessaloniki, Greece, Oct 7-10 2001. IEEE Computer Society. [26] J Portilla, V Strela, M Wainwright, and E P Simoncelli. Image denoising using a scale mixture of Gaussians in the wavelet domain. IEEE Trans Image Processing, 12(11):1338–1351, November 2003. [27] C K I Williams and N J Adams. Dynamic trees. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Adv. Neural Information Processing Systems, volume 11, pages 634–640, Cambridge, MA, 1999. MIT Press. [28] E P Simoncelli, W T Freeman, E H Adelson, and D J Heeger. Shiftable multi-scale transforms. IEEE Trans Information Theory, 38(2):587–607, March 1992. Special Issue on Wavelets.
5 0.55204719 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
Author: Balázs Kégl
Abstract: We have recently proposed an extension of A DA B OOST to regression that uses the median of the base regressors as the final regressor. In this paper we extend theoretical results obtained for A DA B OOST to median boosting and to its localized variant. First, we extend recent results on efficient margin maximizing to show that the algorithm can converge to the maximum achievable margin within a preset precision in a finite number of steps. Then we provide confidence-interval-type bounds on the generalization error. 1
6 0.49779111 200 nips-2004-Using Random Forests in the Structured Language Model
7 0.49674064 145 nips-2004-Parametric Embedding for Class Visualization
8 0.48626482 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling
9 0.46245307 178 nips-2004-Support Vector Classification with Input Data Uncertainty
10 0.46212402 131 nips-2004-Non-Local Manifold Tangent Learning
11 0.46119049 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
12 0.46116996 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
13 0.4569712 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
14 0.45687434 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
15 0.45666465 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
16 0.45655262 127 nips-2004-Neighbourhood Components Analysis
17 0.45645875 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
18 0.4564549 102 nips-2004-Learning first-order Markov models for control
19 0.45626825 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
20 0.45618364 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data