
95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity


Source: pdf

Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi

Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The formation of disulphide bridges among cysteines is an important feature of protein structures. [sent-5, score-1.329]

2 Here we develop new methods for the prediction of disulphide bond connectivity. [sent-6, score-0.782]

3 We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. [sent-7, score-1.184]

4 In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. [sent-10, score-0.41]

5 Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. [sent-11, score-0.207]

6 The method can be applied both to situations where the bonded state of each cysteine is known or unknown, in which case bonded state can be predicted with 85% precision and 90% recall. [sent-12, score-1.014]

7 The method also yields an estimate for the total number of disulphide bridges in each chain. [sent-13, score-0.796]

8 1 Introduction The formation of covalent links among cysteine (Cys) residues with disulphide bridges is an important and unique feature of protein folding and structure. [sent-14, score-1.094]

9 Simulations [1], experiments in protein engineering [15, 8, 14], theoretical studies [7, 18], and even evolutionary models [9] stress the importance of disulphide bonds in stabilizing the native state of proteins. [sent-15, score-1.019]

10 Disulphide bridges may link distant portions of a protein sequence, providing strong structural constraints in the form of long-range interactions. [sent-16, score-0.336]

11 Thus prediction/knowledge of the disulphide connectivity of a protein is important and provides essential insights into its structure and possibly also into its function and evolution. [sent-17, score-0.936]

12 Only recently has the problem of predicting disulphide bridges received increased attention. [sent-18, score-0.801]

13 of the actual pairings between bonded cysteines (see Fig. 1). [sent-21, score-0.750]

14 In this paper, we address the problem of intra-chain connectivity prediction. Figure 1: Structure (top) and connectivity pattern (bottom) of intestinal toxin 1, PDB code 1IMT (sequence: AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRKMHHTCPCAPNLACVQTSPKKFKCLSK). [sent-23, score-0.434]

15 Disulphide bonds in the structure are shown as thick lines. [sent-24, score-0.226]

16 Existing approaches to connectivity prediction use stochastic global optimization [10], combinatorial optimization [13] and machine learning techniques [11, 17]. [sent-26, score-0.324]

17 The method in [10] represents the set of potential disulphide bridges in a sequence as a complete weighted undirected graph. [sent-27, score-0.805]

18 Vertices are oxidized cysteines and edges are labeled by the strength of interaction (contact potential) in the associated pair of cysteines. [sent-28, score-0.55]

19 After a complete labeled graph is obtained, candidate bridges are then located by finding the maximum weight perfect matching. [sent-30, score-0.297]
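The matching step above reduces to maximum weight perfect matching on the cysteine graph. As a hypothetical illustration (not the authors' implementation, which uses Gabow's algorithm), the brute-force sketch below enumerates all perfect matchings and keeps the heaviest one; it is only practical for the small graphs that arise here.

```python
def perfect_matchings(vertices):
    """Recursively enumerate every perfect matching of an even-sized vertex list."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for match in perfect_matchings(remaining):
            yield [(first, partner)] + match

def max_weight_perfect_matching(weights):
    """weights maps frozenset({u, v}) -> weight (e.g. a contact potential
    or predicted bridge probability) over an even number of vertices."""
    vertices = sorted({v for edge in weights for v in edge})
    best, best_score = None, float("-inf")
    for match in perfect_matchings(vertices):
        score = sum(weights[frozenset(edge)] for edge in match)
        if score > best_score:
            best, best_score = match, score
    return best, best_score
```

For four bonded cysteines there are only three candidate patterns, so the search is immediate; the number of matchings grows with the number of bonded cysteines, which is why polynomial algorithms such as Gabow's matter for larger graphs.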

20 Unfortunately, for computational reasons, both this method and the previous one can only deal with sequences containing a limited number of bonds (K ≤ 5). [sent-37, score-0.251]

21 A different approach to predicting disulphide bridges is reported in [13], where finding disulphide bridges is part of a more general protocol aimed at predicting the topology of β-sheets in proteins. [sent-38, score-1.629]

22 Residue-to-residue contacts (including Cys-Cys bridges) are predicted by solving a series of integer linear programming problems in which customized hydrophobic contact energies must be maximized. [sent-39, score-0.099]

23 This method cannot be compared with the other approaches because the authors report validation results only for two relatively short polypeptides with few bonds (2 and 3). [sent-40, score-0.263]

24 In this paper we use 2-Dimensional Recursive Neural Networks (2D-RNN, [4]) to predict disulphide connectivity in proteins starting from their primary sequence and its homologues. [sent-41, score-0.863]

25 The outputs of the 2D-RNN are the pairwise probabilities of the existence of a bridge between any pair of cysteines. [sent-42, score-0.119]

26 Candidate disulphide connectivities are predicted by finding the maximum weight perfect matching. [sent-43, score-0.671]

27 The proposed framework represents a significant improvement in disulphide connectivity prediction for several reasons. [sent-44, score-0.884]

28 Second, our architecture can easily cope with chains with an arbitrary number of bonded cysteines. (A perfect matching of a graph (V, E) is a subset E′ ⊆ E such that each vertex v ∈ V is met by only one edge in E′.) [sent-46, score-0.178]

29 The hidden planes contain directed edges associated with the square lattices. [sent-49, score-0.176]

30 All the edges of the square lattice in each hidden plane are oriented in the direction of one of the four possible cardinal corners: NE, NW, SW, SE. [sent-50, score-0.237]

31 Additional directed edges run vertically in column from the input plane to each hidden plane, and from each hidden plane to the output plane. [sent-51, score-0.393]

32 Therefore, it overcomes the limitation of previous approaches which restrict predictions to chains with no more than 10 oxidized cysteines. [sent-55, score-0.189]

33 Third, our methods can be applied both to situations where the bonded state of each cysteine is known or unknown. [sent-56, score-0.524]

34 Here the underlying directed graph for disulphide connectivity has six 2D-layers: input, output, and four hidden layers (Figure 2(a)). [sent-60, score-0.945]

35 Vertical connections, within an (i, j) column, run from input to hidden and output layers, and from hidden layers to output (Figure 2(b)). [sent-61, score-0.217]

36 In each one of the four hidden planes, square lattice connections are oriented towards one of the four cardinal corners. [sent-62, score-0.154]

37 In a disulphide contact map prediction, the (i, j) output represents the probability of whether the i-th and j-th cysteines in the sequence are linked by a disulphide bridge or not. [sent-66, score-1.705]

38 This prediction depends directly on the (i, j) input and the four hidden units in the same column, associated with omni-directional contextual propagation in the hidden planes. [sent-67, score-0.193]
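The plane-by-plane propagation described above can be sketched with a toy helper. This is a simplified stand-in, not the trained architecture: the constant 0.5 stands in for learned transition weights, and each call computes one of the four hidden planes over an n × n input grid.

```python
def hidden_plane(x, di, dj):
    """Toy propagation for one 2D-RNN hidden plane over an n x n input grid x.
    The plane's value at (i, j) combines the input there with the plane's own
    values at (i - di, j) and (i, j - dj); the sign pair (di, dj) selects the
    cardinal corner (NE, NW, SW, SE) that context flows from.  The fixed 0.5
    stands in for learned transition weights."""
    n = len(x)
    h = [[0.0] * n for _ in range(n)]
    rows = range(n) if di > 0 else range(n - 1, -1, -1)
    cols = range(n) if dj > 0 else range(n - 1, -1, -1)
    for i in rows:
        for j in cols:
            prev_i = h[i - di][j] if 0 <= i - di < n else 0.0
            prev_j = h[i][j - dj] if 0 <= j - dj < n else 0.0
            h[i][j] = x[i][j] + 0.5 * (prev_i + prev_j)
    return h
```

Running four such planes with (di, dj) in {(1, 1), (1, -1), (-1, 1), (-1, -1)} gives each (i, j) output column the omni-directional context described above.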

39 For a sequence of length N and containing M cysteines, the output layer contains M × M units. [sent-71, score-0.107]

40 The input and hidden layer can scale like N × N if the full sequence is used, or like M × M if only fixed-size windows around each cysteine are used, as in the experiments reported here. [sent-72, score-0.279]

41 The input of each position within a window is the normalized frequency of all 20 amino acids at that position in the multiple alignment generated by aligning the sequence with the sequences in the NR database using the PSI-BLAST program as described, for instance, in [16]. [sent-74, score-0.141]
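The per-position encoding just described can be sketched as follows. This is a simplified, hypothetical stand-in for the PSI-BLAST profile computation (gaps written as '-', no sequence weighting or pseudocounts).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_profile(aligned_rows):
    """Normalized frequency of each of the 20 amino acids at every position of a
    fixed-size window, computed from rows of a multiple alignment."""
    width = len(aligned_rows[0])
    profile = []
    for j in range(width):
        column = [row[j] for row in aligned_rows if row[j] in AMINO_ACIDS]
        total = len(column) or 1  # avoid dividing by zero on all-gap columns
        profile.append([column.count(aa) / total for aa in AMINO_ACIDS])
    return profile
```

Each window position thus yields a 20-dimensional frequency vector, which is the form of input the network consumes.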

42 In the first mode, we can assume that the bonded state of the individual cysteines is known, for instance through the use of a specialized predictor for bonded/non-bonded states. [sent-78, score-0.806]

43 Then if the sequence contains M cysteines, 2K (2K ≤ M ) of which are intra-chain disulphide bonded, the prediction of the connectivity can focus on the 2K bonded cysteines exclusively and ignore the remaining M − 2K cysteines that are not bonded. [sent-79, score-2.079]

44 In the second mode, we can try to solve both prediction problems–bond state and connectivity–at the same time by focusing on all cysteines in a given sequence. [sent-80, score-0.576]

45 In both cases, the output is an array of pairwise probabilities from which the connectivity pattern graph must be inferred. [sent-81, score-0.332]

46 In the first case, the total number of bonds or edges in the connectivity graph is known (K). [sent-82, score-0.517]

47 In section 3, we show that the sum of all probabilities across the output array can be used to estimate the number of disulphide contacts. [sent-84, score-0.642]

48 Data Preparation In order to assess our method, two data sets of known disulphide connectivities were compiled from the Swiss-Prot archive [2]. [sent-85, score-0.664]

49 We filtered out proteins whose disulphide bonds were assigned tentatively or inferred by similarity. [sent-93, score-1.626]

50 We finally ended up with 966 chains, each with a number of disulphide bonds in the range of 1 to 24. [sent-94, score-0.786]

51 As previously pointed out, our methodology is not limited by the number of disulphide bonds, hence we were able to retain and test the algorithm on the whole filtered set of non-trivial chains. [sent-95, score-0.578]

52 This set consists of 712 sequences, each containing at least two bridges (K ≥ 2)–the case K = 1 being trivial when the bonded state is known. [sent-96, score-0.606]

53 By comparison, SP39 includes 446 chains with no more than 5 bridges; SP41 additionally includes 266 sequences and 112 of these have more than 10 oxidized cysteines. [sent-97, score-0.159]

54 Graph Matching to Derive Connectivity from Output Probabilities In the case where the bonded state of the cysteines is known, one has a graph with 2K nodes, one for each bonded cysteine. [sent-100, score-1.176]

55 The problem is then to find a connectivity pattern with K edges and maximum weight, where each cysteine is paired uniquely with another cysteine. [sent-102, score-0.418]

56 The maximum weight matching algorithm of Gabow [12] is used to choose the paired cysteines (edges); its time complexity is cubic, O(V^3) = O(K^3), where V is the number of vertices, and its space complexity beyond the storage of the graph is linear, O(V) = O(K). [sent-103, score-0.474]

57 Note that because the number of bonded cysteines in general is not very large, it is also possible in many cases to use an exhaustive search of all possible combinations. [sent-104, score-0.75]

58 The number of such combinations is 1 · 3 · 5 · · · (2K − 1), which yields 945 connectivity patterns in the case of 10 bonded cysteines. [sent-108, score-0.536]
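The count above follows a double-factorial recurrence: with 2K bonded cysteines, the first can pair with 2K − 1 partners, the next free one with 2K − 3, and so on. A short sketch:

```python
def num_connectivity_patterns(num_bonded):
    """Number of possible pairings (perfect matchings) of an even number of
    bonded cysteines: 1 * 3 * 5 * ... * (num_bonded - 1)."""
    assert num_bonded % 2 == 0, "bonded cysteines come in pairs"
    count = 1
    for odd in range(1, num_bonded, 2):
        count *= odd
    return count
```

`num_connectivity_patterns(10)` recovers the 945 patterns quoted above, which is why exhaustive search remains feasible for small K.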

59 The case where the bonded state of the cysteines is not known is slightly more involved and the Gabow algorithm cannot be applied directly since the graph has M nodes but, if some of the cysteines are not bonded, only a subset of 2K < M nodes participate in the final maximum weighted matching. [sent-109, score-1.296]

60 Alternatively, we use a greedy algorithm to derive the connectivity pattern using the estimate of the total number of bonds. [sent-110, score-0.285]

61 We first select the edge with the highest probability; then we pick the next edge with highest probability that is not incident to the first edge, and so forth, until K edges have been selected. [sent-113, score-0.106]

62 For L reasonably large, the optimal connectivity pattern can usually be found. [sent-121, score-0.233]
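The greedy selection just described can be sketched as below. This is a minimal version without the L-best refinement mentioned above; the pairwise probabilities and the bond count K are assumed given.

```python
def greedy_connectivity(pair_probs, k):
    """pair_probs maps (i, j) cysteine pairs (i < j) to predicted bridge
    probabilities.  Repeatedly take the highest-probability pair whose two
    endpoints are still unpaired, stopping after k bridges."""
    chosen, used = [], set()
    for (i, j), prob in sorted(pair_probs.items(), key=lambda kv: -kv[1]):
        if len(chosen) == k:
            break
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
    return chosen
```

Because each accepted edge removes two vertices, the loop always terminates after at most k selections; unlike exact matching, it can miss the global optimum when an early high-probability edge blocks a better overall pairing.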

63 We have compared this method with Gabow’s algorithm in the case where the bonding state is known and observed that when L = 6, this greedy heuristic yields results that are as good as those obtained with Gabow’s algorithm which, in this case, is guaranteed to find a global optimum. [sent-122, score-0.265]

64 It is important to note that this method ends up producing a prediction of both the connectivity pattern and the bonding state of each cysteine. [sent-125, score-0.572]

65 3 Results: Disulphide Connectivity Prediction for Bonded Cysteines. Here we assume that the bonding state is known. [sent-126, score-0.234]

66 Table 1: Disulphide connectivity prediction with 2D-RNN assuming the bonding state is known. [sent-152, score-0.540]

67 * denotes levels of precision that exceed the previously reported best results in the literature [17] (in parentheses). [sent-154, score-0.120]

68 Figure 3: Correlation between the number of bonded cysteines (2K) and Σ_{i≠j} O_{i,j} · log M. [sent-155, score-0.750]

69 For instance, in the case of 3 bonded cysteines, the precision reaches 0.51 at the pair and pattern levels, whereas the best similar results reported in the literature are 0. [sent-158; sent-160]
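The pair- and pattern-level scores used in these comparisons can be computed per chain as below. This is a sketch under the assumption that predicted and true bridges are given as sets of cysteine pairs.

```python
def connectivity_metrics(predicted, true):
    """Pair-level precision/recall and pattern-level correctness for one chain.
    predicted and true are sets of frozenset({i, j}) cysteine pairs."""
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    pattern_correct = predicted == true  # pattern level: the whole graph must match
    return precision, recall, pattern_correct
```

Pattern-level accuracy is the stricter criterion: a chain counts as correct only when every predicted bridge matches the true connectivity graph.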

71 Estimation of the Number K of Bonds. Analysis of the prediction results shows that there is a relationship between the sum of all the probabilities, Σ_{i≠j} O_{i,j}, in the graph (or the output layer of the 2D-RNN) and the total number of bonded cysteines (2K). [sent-163, score-0.970]

72 As shown in Figure 3, there is a reasonably linear relationship between the total number 2K of bonded cysteines and the product Σ_{i≠j} O_{i,j} · log M, where M is the total number of cysteines in the sequence being considered. [sent-168, score-1.237]

73 Using this, we estimate the total number of bonded cysteines using linear regression and rounding off, making sure that the total number is even and does not exceed the total number of cysteines in the sequence. [sent-172, score-1.228]
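The estimation step just described can be sketched as follows; `slope` and `intercept` are hypothetical regression coefficients standing in for values fit on training chains, and the feature is the probability sum scaled by log M, as in Figure 3.

```python
import math

def estimate_num_bonded(pair_probs, num_cysteines, slope, intercept):
    """Estimate the number of bonded cysteines 2K from the 2D-RNN output.
    pair_probs maps cysteine pairs to output probabilities; slope and
    intercept are hypothetical regression coefficients.  The estimate is
    rounded, forced to be even, and capped at the number of cysteines M."""
    feature = sum(pair_probs.values()) * math.log(num_cysteines)
    estimate = int(round(slope * feature + intercept))
    estimate -= estimate % 2  # a bonded count must be even
    return max(0, min(estimate, num_cysteines - num_cysteines % 2))
```

The even-rounding and cap implement the constraints stated above: bonded cysteines come in pairs, and their count cannot exceed the number of cysteines in the sequence.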

74 Table 2: Prediction of disulphide connectivity pattern with 2D-RNN on all the cysteines, without assuming knowledge of the bonding state. [sent-186, score-0.989]

75 Disulphide Connectivity Prediction from Scratch In the last set of experiments, we do not assume any knowledge of the bonding state and apply the 2D-RNN approach to all the cysteines (both bonded and not bonded) in each sequence. [sent-187, score-0.984]

76 We predict the number of bonds, the bonding state, and connectivity pattern using one predictor. [sent-188, score-0.411]

77 Table 3 shows the kind of results that are obtained when the method is applied to sequences with more than K = 5 bonds in SP41. [sent-191, score-0.233]

78 Finally, the precision of bonded state prediction is 0.85, and the recall of bonded state prediction is 0.90. [sent-193; sent-194]

80 The precision and recall of bond number prediction is 0. [sent-196, score-0.303]

81 The average absolute difference between true bond and predicted bond number is 0. [sent-198, score-0.223]

82 The average absolute difference between true bond number and wrongly predicted bond number is 1.24. [sent-200]

83 Table 3: Prediction of disulphide connectivity pattern with 2D-RNN on all the cysteines, without assuming knowledge of the bonding state and when the number of bridges K exceeds 5. [sent-215]

84 4 Conclusion. We have presented a complete system for disulphide connectivity prediction in cysteine-rich proteins. [sent-216, score-0.884]

85 Assuming knowledge of cysteine bonding state, the method outperforms existing approaches on the same validation data. [sent-217, score-0.366]

86 The results also show that the 2D-RNN method achieves good recall and accuracy on the prediction of connectivity pattern even when the bonding state of individual cysteines is not known. [sent-218, score-1.012]

87 Unlike previous approaches, our method can be applied to chains with K > 5 bonds and yields good, cooperative predictions of the total number of bonds, as well as of the bonding states and bond locations. [sent-219, score-0.6]

88 Training can take days, but once trained, predictions can be carried out on a proteomic or protein-engineering scale. [sent-220, score-0.158]

89 The current version of our disulphide prediction server DIpro (which includes step (a)) is available through: http://www. [sent-222, score-0.683]

90 What can disulfide bonds tell us about protein energetics, function and folding: simulations and bioinformatics analysis. [sent-233, score-0.378]

91 The SWISS-PROT protein sequence database and its supplement TrEMBL. [sent-241, score-0.169]

92 The principled design of large-scale recursive neural network architectures–dag-rnns and the protein structure prediction problem. [sent-253, score-0.303]

93 Engineered disulfide bonds as probes of the folding pathway of barnase - increasing stability of proteins against the rate of denaturation. [sent-288, score-0.337]

94 Thermodynamics and kinetics of protein folding: an evolutionary perpective. [sent-292, score-0.177]

95 A neural network-based method for predicting the disulfide connectivity in proteins. [sent-307, score-0.227]

96 Prediction of β-sheet topology and disulfide bridges in polypeptides. [sent-322, score-0.197]

97 Contribution of disulfide bonds to the conformational stability and catalytic activity of ribonuclease A. [sent-336, score-0.236]

98 Substantial increase of protein stability by multiple disulfide bonds. [sent-343, score-0.167]

99 Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. [sent-350, score-0.312]

100 Disulfide connectivity prediction using recursive neural networks and evolutionary information. [sent-355, score-0.405]


similar papers computed by tfidf model


similar papers list:

simIndex simValue paperId paperTitle


2 0.074912496 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale

Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag

Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites. 1

3 0.062909998 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps

Author: Frank Dimaio, George Phillips, Jude W. Shavlik

Abstract: X-ray crystallography is currently the most common way protein structures are elucidated. One of the most time-consuming steps in the crystallographic process is interpretation of the electron density map, a task that involves finding patterns in a three-dimensional picture of a protein. This paper describes DEFT (DEFormable Template), an algorithm using pictorial structures to build a flexible protein model from the protein's amino-acid sequence. Matching this pictorial structure into the density map is a way of automating density-map interpretation. Also described are several extensions to the pictorial structure matching algorithm necessary for this automated interpretation. DEFT is tested on a set of density maps ranging from 2 to 4Å resolution, producing root-mean-squared errors ranging from 1.38 to 1.84Å. 1 Introduction An important question in molecular biology is what is the structure of a particular protein? Knowledge of a protein’s unique conformation provides insight into the mechanisms by which a protein acts. However, no algorithm exists that accurately maps sequence to structure, and one is forced to use

4 0.054062758 177 nips-2004-Supervised Graph Inference

Author: Jean-philippe Vert, Yoshihiro Yamanishi

Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1

5 0.049932204 52 nips-2004-Discrete profile alignment via constrained information bottleneck

Author: Sean O'rourke, Gal Chechik, Robin Friedman, Eleazar Eskin

Abstract: Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied. 1

6 0.044609878 165 nips-2004-Semi-supervised Learning on Directed Graphs

7 0.037220195 122 nips-2004-Modelling Uncertainty in the Game of Go

8 0.035315346 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern

9 0.034593057 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing

10 0.03446117 197 nips-2004-Two-Dimensional Linear Discriminant Analysis

11 0.032548103 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks

12 0.032453783 183 nips-2004-Temporal-Difference Networks

13 0.032212876 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

14 0.030083807 44 nips-2004-Conditional Random Fields for Object Recognition

15 0.029858176 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data

16 0.028161123 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters

17 0.027516035 124 nips-2004-Multiple Alignment of Continuous Time Series

18 0.026884703 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

19 0.026683833 28 nips-2004-Bayesian inference in spiking neurons

20 0.026092838 131 nips-2004-Non-Local Manifold Tangent Learning


similar papers computed by lsi model


similar papers list:

simIndex simValue paperId paperTitle


2 0.75651193 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps


3 0.7460283 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale


4 0.56223989 6 nips-2004-A Hidden Markov Model for de Novo Peptide Sequencing

Author: Bernd Fischer, Volker Roth, Jonas Grossmann, Sacha Baginsky, Wilhelm Gruissem, Franz Roos, Peter Widmayer, Joachim M. Buhmann

Abstract: De novo sequencing of peptides is a challenging task in proteome research. While there exist reliable DNA-sequencing methods, the high-throughput de novo sequencing of proteins by mass spectrometry is still an open problem. Current approaches suffer from a lack of precision in detecting mass peaks in the spectrograms. In this paper we present a novel method for de novo peptide sequencing based on a hidden Markov model. Experiments effectively demonstrate that this new method significantly outperforms standard approaches in matching quality. 1
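The HMM machinery such a sequencing model builds on can be illustrated with a minimal Viterbi decoder. Everything below (the states, emissions, and probabilities) is invented for illustration and is not the paper's actual peptide-sequencing model:

```python
import math

# Toy Viterbi decoding over a two-state HMM. The "peak"/"noise" states and
# all probabilities here are made up; they only sketch the mechanics.
states = ["peak", "noise"]
start = {"peak": 0.5, "noise": 0.5}
trans = {"peak": {"peak": 0.6, "noise": 0.4},
         "noise": {"peak": 0.3, "noise": 0.7}}
emit = {"peak": {"high": 0.8, "low": 0.2},
        "noise": {"high": 0.1, "low": 0.9}}

def viterbi(obs):
    """Return the most probable state sequence for an observation list."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev, lp = max(((p, V[-1][p] + math.log(trans[p][s]))
                            for p in states), key=lambda t: t[1])
            row[s] = lp + math.log(emit[s][o])
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    # backtrack from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["high", "high", "low", "low"]))
# ['peak', 'peak', 'noise', 'noise']
```

The log-space recursion avoids underflow on longer observation sequences; the backpointers recover the decoded path.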

5 0.4749369 177 nips-2004-Supervised Graph Inference

Author: Jean-philippe Vert, Yoshihiro Yamanishi

Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1

6 0.4422296 52 nips-2004-Discrete profile alignment via constrained information bottleneck

7 0.42822716 57 nips-2004-Economic Properties of Social Networks

8 0.35844636 141 nips-2004-Optimal sub-graphical models

9 0.35195863 108 nips-2004-Markov Networks for Detecting Overlapping Elements in Sequence Data

10 0.35183302 165 nips-2004-Semi-supervised Learning on Directed Graphs

11 0.31932628 122 nips-2004-Modelling Uncertainty in the Game of Go

12 0.31384459 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification

13 0.31377739 183 nips-2004-Temporal-Difference Networks

14 0.30492157 130 nips-2004-Newscast EM

15 0.27487659 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models

16 0.26208249 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data

17 0.24851005 180 nips-2004-Synchronization of neural networks by mutual learning and its application to cryptography

18 0.23859142 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)

19 0.23516254 124 nips-2004-Multiple Alignment of Continuous Time Series

20 0.23476316 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(9, 0.393), (13, 0.078), (15, 0.118), (17, 0.013), (25, 0.013), (26, 0.037), (31, 0.025), (33, 0.14), (35, 0.019), (39, 0.021), (50, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74293864 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity

Author: Jianlin Cheng, Alessandro Vullo, Pierre F. Baldi

Abstract: The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show that the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. It therefore overcomes one of the major limitations of previous approaches, which restrict predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both when the bonded state of each cysteine is known and when it is unknown; in the latter case, the bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate of the total number of disulphide bridges in each chain. 1
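The matching step described in this abstract — turning pairwise bonding probabilities into a connectivity pattern — amounts to a maximum-weight perfect matching over the bonded cysteines. A minimal, hedged sketch (the probabilities below are invented, and brute force stands in for the paper's actual matching solver):

```python
from itertools import permutations

def best_matching(probs, n):
    """Brute-force maximum-weight perfect matching over n (even) items;
    fine for the small numbers of bonded cysteines in a single chain."""
    best, best_w = None, float("-inf")
    for perm in permutations(range(n)):
        # pair consecutive elements; sort pairs so (i, j) always has i < j
        pairs = frozenset(tuple(sorted(perm[k:k + 2])) for k in range(0, n, 2))
        w = sum(probs[p] for p in pairs)
        if w > best_w:
            best_w, best = w, pairs
    return set(best), best_w

# invented pairwise bonding probabilities for 4 cysteines (indices 0..3)
probs = {(0, 1): 0.15, (0, 2): 0.80, (0, 3): 0.10,
         (1, 2): 0.05, (1, 3): 0.85, (2, 3): 0.20}

bridges, weight = best_matching(probs, 4)
print(bridges)  # bridges == {(0, 2), (1, 3)}, total weight 1.65
```

For realistic chain sizes a polynomial-time matching algorithm (e.g. a blossom-based solver) would replace the brute-force enumeration.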

2 0.74108529 87 nips-2004-Integrating Topics and Syntax

Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum

Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on part-of-speech tagging with models that exclusively use short-range dependencies, and on document classification with models that exclusively use long-range dependencies. 1

3 0.63449275 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers

Author: Ligen Wang, Balázs Kégl

Abstract: In this paper we propose to combine two powerful ideas, boosting and manifold learning. On the one hand, we improve AdaBoost by incorporating knowledge of the structure of the data into base classifier design and selection. On the other hand, we use AdaBoost’s efficient learning mechanism to significantly improve supervised and semi-supervised algorithms proposed in the context of manifold learning. Besides the specific manifold-based penalization, the resulting algorithm also accommodates the boosting of a large family of regularized learning algorithms. 1

4 0.60169494 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

Author: Odelia Schwartz, Terrence J. Sejnowski, Peter Dayan

Abstract: In the analysis of natural images, Gaussian scale mixtures (GSM) have been used to account for the statistics of filter responses, and to inspire hierarchical cortical representational learning schemes. GSMs pose a critical assignment problem, working out which filter responses were generated by a common multiplicative factor. We present a new approach to solving this assignment problem through a probabilistic extension to the basic GSM, and show how to perform inference in the model using Gibbs sampling. We demonstrate the efficacy of the approach on both synthetic and image data. 1
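The Gaussian scale mixture described in this abstract multiplies a Gaussian variable by a shared positive mixer. A minimal synthetic sketch (with an assumed Rayleigh mixer, sigma = 1 — a common but not the only choice) shows the characteristic heavy tails relative to a plain Gaussian:

```python
import random
import statistics

# Synthetic GSM illustration: each response l = v * g, a Gaussian g
# multiplied by a positive mixer v. All parameters are invented; we only
# check the qualitative effect that the mixture is more heavy-tailed
# (higher kurtosis) than a Gaussian.
random.seed(0)

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth standardized moment minus 3."""
    m = statistics.fmean(xs)
    s2 = statistics.fmean((x - m) ** 2 for x in xs)
    m4 = statistics.fmean((x - m) ** 4 for x in xs)
    return m4 / (s2 ** 2) - 3.0

gauss = [random.gauss(0, 1) for _ in range(20000)]
# A Rayleigh(sigma=1) mixer is a Weibull with shape 2 and scale sqrt(2)
gsm = [random.weibullvariate(2 ** 0.5, 2) * random.gauss(0, 1)
       for _ in range(20000)]

print(round(excess_kurtosis(gauss), 2))  # near 0 for a Gaussian
print(excess_kurtosis(gsm) > 0.5)        # the GSM samples are heavy-tailed
```

For this mixer the population excess kurtosis of the mixture is 3 (since E[v⁴]/E[v²]² = 2), so the gap from the Gaussian is easy to see even with modest sample sizes.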

5 0.55204719 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting

Author: Balázs Kégl

Abstract: We have recently proposed an extension of AdaBoost to regression that uses the median of the base regressors as the final regressor. In this paper we extend theoretical results obtained for AdaBoost to median boosting and to its localized variant. First, we extend recent results on efficient margin maximization to show that the algorithm can converge to the maximum achievable margin within a preset precision in a finite number of steps. Then we provide confidence-interval-type bounds on the generalization error. 1
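The aggregation idea in this abstract — a (weighted) median of base regressors as the final predictor — can be sketched in a few lines. The base predictions and weights below are invented; this is not the paper's full boosting algorithm:

```python
# Weighted median of base regressor outputs: sort predictions and return
# the first value whose cumulative weight reaches half the total weight.
def weighted_median(values, weights):
    order = sorted(range(len(values)), key=lambda i: values[i])
    half = sum(weights) / 2.0
    acc = 0.0
    for i in order:
        acc += weights[i]
        if acc >= half:
            return values[i]

base_preds = [1.0, 3.0, 2.5, 10.0]  # invented outputs of four base regressors
alphas = [0.2, 0.5, 0.2, 0.1]       # invented boosting weights

print(weighted_median(base_preds, alphas))  # 3.0
```

Note the robustness this buys over a weighted mean: the outlying prediction 10.0 barely influences the result.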

6 0.49779111 200 nips-2004-Using Random Forests in the Structured Language Model

7 0.49674064 145 nips-2004-Parametric Embedding for Class Visualization

8 0.48626482 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling

9 0.46245307 178 nips-2004-Support Vector Classification with Input Data Uncertainty

10 0.46212402 131 nips-2004-Non-Local Manifold Tangent Learning

11 0.46119049 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

12 0.46116996 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

13 0.4569712 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

14 0.45687434 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

15 0.45666465 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

16 0.45655262 127 nips-2004-Neighbourhood Components Analysis

17 0.45645875 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

18 0.4564549 102 nips-2004-Learning first-order Markov models for control

19 0.45626825 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

20 0.45618364 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data