nips nips2012 nips2012-93 knowledge-graph by maker-knowledge-mining

93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction


Source: pdf

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract. Residue-residue contact prediction is a fundamental problem in protein structure prediction. [sent-3, score-0.967]

2 However, despite considerable research efforts, contact prediction methods are still largely unreliable. [sent-4, score-0.713]

3 Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. [sent-5, score-0.399]

4 For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. [sent-6, score-1.611]

5 The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. [sent-7, score-0.411]

6 Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. [sent-8, score-0.496]

7 Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. [sent-9, score-0.749]

8 The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. [sent-10, score-0.529]

9 To date, the most accurate and reliable computational methods for protein structure prediction are based on homology modeling [27]. [sent-14, score-0.376]

10 However, when good templates do not exist in protein structure repositories or when sequence similarity is very poor–which is often the case–homology modeling is no longer effective. [sent-16, score-0.281]

11 This is the realm of ab initio modeling methods, which attempt to recover three-dimensional protein models more or less from scratch. [sent-17, score-0.311]

12 One such representation is the contact map, essentially a sparse binary matrix representing which amino acids are in contact in the 3D structure. [sent-19, score-1.301]

13 While contact map prediction can be viewed as a sub-problem in protein structure prediction, it is well known that it is essentially equivalent to protein structure prediction, since 3D structures can be completely recovered from sufficiently large subsets of true contacts [20, 26, 23]. [sent-20, score-1.699]

14 Furthermore, even small sets of correctly predicted contacts can be useful for improving ab initio methods [25]. [sent-21, score-0.488]

15 In short, contact map prediction plays a fundamental role in protein structure prediction, and most of the state-of-the-art contact predictors use some form of machine learning. [sent-22, score-1.737]

16 There are two main issues arising in contact prediction that have not been addressed systematically: (1) Residue contacts are not randomly distributed in native protein structures, rather they are spatially correlated. [sent-26, score-1.37]

17 Current contact predictors generally do not take into account these correlations, not even at the local level, since the contact probability for a residue pair is typically learned/inferred independently of the contact probabilities in the neighborhood of the pair. [sent-27, score-2.141]

18 In contrast, current machine learning approaches attempt to learn contact map probabilities in a single step. [sent-29, score-0.671]

19 To address these issues, here we introduce a new machine-learning deep architecture, designed as a deep stack of neural networks, in such a way that each level in the stack receives as input and refines the predictions produced at the previous level. [sent-30, score-0.715]

20 Each level can be trained in a fully supervised fashion on the same set of target contacts/non-contacts, thus overcoming the gradient vanishing problem, typical of deep architectures. [sent-31, score-0.233]

21 The present work represents, to our knowledge, the first attempt to introduce spatial correlation in protein contact prediction. [sent-34, score-0.935]

22 1 Data preparation. Contact definition and evaluation criteria. We define two residues to be in contact if the Euclidean distance between their Cβ atoms (Cα for Glycines) is lower than 8 Å. [sent-36, score-0.709]
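As a concrete illustration of this definition, the short sketch below (assuming a hypothetical NumPy array `cb_coords` holding one Cβ coordinate per residue, with Cα substituted for glycines) computes the binary contact map with the 8 Å threshold.

```python
import numpy as np

def contact_map(cb_coords, threshold=8.0):
    """Binary contact map from an (L, 3) array of Cbeta coordinates
    (Calpha for glycines): entry (i, j) is 1 if the Euclidean distance
    between the two atoms is below `threshold` Angstroms."""
    diff = cb_coords[:, None, :] - cb_coords[None, :, :]   # pairwise displacement vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))                # pairwise Euclidean distances
    return (dist < threshold).astype(np.int8)
```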

23 This is the contact definition adopted for the contact prediction assessment in CASP experiments [15]. [sent-37, score-1.36]

24 The protein map of contacts (or contact map) provides a two-dimensional, translation- and rotation-invariant representation of the protein three-dimensional structure. [sent-38, score-1.722]

25 The information content of the contact map is not uniform within different regions of the map. [sent-39, score-0.649]

26 Contacts between residues separated by less than 6 residues are dense and can be easily predicted from the secondary structure. [sent-41, score-0.296]

27 Conversely, the sparse long-range contacts are the most informative and also the most difficult to predict. [sent-42, score-0.382]

28 Thus, as in the CASP experiments, we focus primarily on long-range contact prediction for performance assessment. [sent-43, score-0.713]

29 The contact prediction performance is evaluated using the standard accuracy measure [15]: Acc = TP/(TP+FP), where TP and FP are the true positive and false positive predicted contacts, respectively. [sent-44, score-0.809]

30 The most widely accepted measure of performance for contact prediction assessment is Acc for L/5 pairs and sequence separation ≥ 24 [15]. [sent-46, score-0.835]
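A minimal sketch of this evaluation criterion follows; the array names `pred` and `true_map` are assumptions, and the true map would come from a routine such as the `contact_map` sketch above.

```python
import numpy as np

def acc_top_l5(pred, true_map, min_sep=24):
    """Acc = TP / (TP + FP) computed on the L/5 top-scored residue pairs
    with sequence separation >= min_sep, where L is the protein length."""
    L = pred.shape[0]
    i, j = np.triu_indices(L, k=min_sep)       # all pairs with j - i >= min_sep
    order = np.argsort(pred[i, j])[::-1]       # rank pairs by predicted probability
    top = order[: max(1, L // 5)]              # keep the L/5 best-scored pairs
    tp = int(true_map[i[top], j[top]].sum())   # true positives among them
    return tp / float(len(top))                # denominator = TP + FP = number of predictions
```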

31 2 Training and test sets. In order to assess the performance of our method, a training and a test set of protein domains are derived from the ASTRAL database [6]. [sent-48, score-0.338]

32 73 the precompiled set of protein domains with less than 20% pairwise sequence identity. [sent-50, score-0.292]

33 This yields a training set of 2,191 structures (the list of protein domains can be found as supplementary material of [8]). [sent-54, score-0.348]

34 3 Feature and training example selection. In this work, we do not attempt to determine the best static input features for contact prediction. [sent-60, score-0.722]

35 Rather, we focus on a minimal set of features commonly used in machine learning-based contact prediction [11, 2, 21, 7, 24, 5]. [sent-61, score-0.743]

36 0.001 and up to ten iterations against the non-redundant protein sequence database (NR). [sent-64, score-0.274]

37 The secondary structure is predicted with SSPRO [18] and the solvent accessibility with ACCPRO [19]. [sent-65, score-0.256]

38 For a pair of residues, these features are included in the network input by using a 9-residue long sliding window centered at each residue in the pair. [sent-66, score-0.341]
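A sketch of this windowed input encoding is given below; the per-residue feature matrix `res_feat` and the zero padding at the chain ends are assumptions, not necessarily the authors' exact encoding.

```python
import numpy as np

def pair_window_features(res_feat, i, j, win=9):
    """Concatenate per-residue features over a `win`-long window centered
    at residue i and at residue j (zero-padded at the chain ends).

    res_feat: (L, F) array, e.g. profile + predicted SS + accessibility.
    """
    L, F = res_feat.shape
    half = win // 2
    padded = np.vstack([np.zeros((half, F)), res_feat, np.zeros((half, F))])
    win_i = padded[i : i + win].ravel()   # window centered at i (after padding offset)
    win_j = padded[j : j + win].ravel()   # window centered at j
    return np.concatenate([win_i, win_j])
```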

39 The uneven distribution of positive (residue pairs in contact) and negative (residue pairs not in contact) examples in native protein structures requires some rebalancing of the training data. [sent-68, score-0.398]

40 We do not include in our set of selected examples residue pairs with sequence separation less than 6. [sent-70, score-0.336]
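The example selection could be sketched as follows; the exact rebalancing ratio is not restated here, so the 1:1 subsampling of negatives below is an assumption.

```python
import numpy as np

def select_training_pairs(true_map, min_sep=6, neg_per_pos=1.0, rng=None):
    """Pick residue pairs for training: all contacts with sequence separation
    >= min_sep, plus a random subsample of non-contacts (ratio is an assumption)."""
    if rng is None:
        rng = np.random.default_rng(0)
    L = true_map.shape[0]
    i, j = np.triu_indices(L, k=min_sep)
    pos = np.flatnonzero(true_map[i, j] == 1)
    neg = np.flatnonzero(true_map[i, j] == 0)
    n_neg = min(len(neg), int(neg_per_pos * len(pos)))
    neg = rng.choice(neg, size=n_neg, replace=False)
    keep = np.concatenate([pos, neg])
    return i[keep], j[keep]
```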

41 All the neural networks in the stack have the same topology (same input, hidden, and output layer sizes) with a single hidden layer, and a single sigmoidal output unit estimating the probability of contact between i and j at the level k (Figure 1(a) and 1(b)). [sent-73, score-1.125]
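A bare-bones version of one network in the stack, written in plain NumPy as a sketch (the logistic hidden units and the small random initialization are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LevelNet:
    """One level NN^k_{ij}: a single hidden layer and a single sigmoidal
    output unit estimating the probability that residues i and j are in contact."""

    def __init__(self, n_in, n_hidden, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)      # hidden layer activations
        return sigmoid(h @ self.W2 + self.b2)   # contact probability in (0, 1)
```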

42 Each level k can be trained in a fully supervised fashion, using the same contact maps as targets. [sent-75, score-0.702]

43 In this way, each level of the deep architecture represents a distinct contact predictor. [sent-76, score-0.834]

44 The inputs into NN^k_{ij} can be separated into purely spatial inputs and temporal inputs (which are not purely temporal but also include a spatial component). [sent-77, score-0.516]

45 These purely spatial inputs include evolutionary profiles, predicted secondary structure, and solvent accessibility in a window around residue i and residue j. [sent-79, score-0.874]

46 These are the standard inputs used by most other predictors which attempt to predict contacts in one shot and are described in more detail in Section 2. [sent-80, score-0.466]

47 1 Temporal Features. The temporal inputs for NN^k_{ij} correspond to the outputs of the networks NN^{k-1}_{rs} at the previous level in the stack, where r and s range over a neighborhood of i and j. [sent-84, score-0.251]

48 The temporal features capture the idea that residue contacts are not randomly distributed in native protein structures, rather they are spatially correlated: a contacting residue pair is very likely to be in the proximity of a different pair of contacting residues. [sent-86, score-1.411]

49 Although the contact predictions at a given level of the stack are inaccurate, the contact probabilities included in the temporal feature vector can still provide some rough estimation of the contact distribution in a given neighborhood. [sent-88, score-2.232]
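In code, the temporal part of the input for a pair (i, j) could be gathered as below; the neighborhood radius and the zero padding outside the map are assumptions, and the all-zero vector at the first level follows the description of Figure 1.

```python
import numpy as np

def temporal_features(prev_pred_map, i, j, radius=2):
    """Contact probabilities from the previous level NN^{k-1}_{rs} over a
    square neighborhood of (i, j); all zeros at the first level of the stack."""
    side = 2 * radius + 1
    if prev_pred_map is None:                 # level k = 1: no previous predictions
        return np.zeros(side * side)
    L = prev_pred_map.shape[0]
    out = np.zeros((side, side))
    for a, r in enumerate(range(i - radius, i + radius + 1)):
        for b, s in enumerate(range(j - radius, j + radius + 1)):
            if 0 <= r < L and 0 <= s < L:     # zero-pad outside the contact map
                out[a, b] = prev_pred_map[r, s]
    return out.ravel()
```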

50 In particular, through the temporal inputs the architecture ought to be able to capture spatial correlations between contacts, at least over some range. [sent-90, score-0.321]

51 Figure 1: DST-NN architecture. (a) DST-NN architecture: spatial features and temporal features for the pair (i, j). (b) Temporal input features for NN^k_{ij}: at the first level the temporal features are all zeros; at subsequent levels they are taken from the contact map predicted with the networks of the previous level. [sent-99, score-0.416]

52 (b) For a pair of residues (i, j), the temporal inputs into NN^{k+1}_{ij} consist of the contact probabilities produced by the network at the previous level over a neighborhood of (i, j). [sent-102, score-0.987]

53 In contrast, in the proposed model, the learning capabilities are not directly degraded by the depth of the stack, since each level of the stack can be trained in a supervised fashion using true contact maps to provide the targets. [sent-105, score-1.024]

54 The weights of NN^1_{ij} are then used to initialize the weights of NN^2_{ij}, and the predictions obtained with NN^1_{ij} are used to set up the temporal feature vector of NN^2_{ij}. [sent-109, score-0.389]

55 The network NN^2_{ij} is then trained for one epoch on the same set of examples used for NN^1_{ij}, and this procedure is repeated up to a certain depth. [sent-110, score-0.26]
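Putting the pieces together, the level-by-level training strategy could look roughly like the sketch below, which reuses the `LevelNet` and `temporal_features` sketches above; `spatial_feat` is assumed to be a dict mapping residue pairs to their spatial feature vectors, and `train_one_epoch` / `predict_full_map` are hypothetical helpers (one pass of supervised gradient descent against the true contact map, and filling in a full predicted map), not the authors' code.

```python
import copy
import numpy as np

def train_stack(spatial_feat, true_labels, pairs, depth, n_in, n_hidden):
    """Greedy DST-NN training: each new level starts from the previous level's
    weights and is trained for a single epoch on the same target contacts,
    with the previous level's predictions supplying the temporal inputs."""
    levels, prev_map = [], None
    for k in range(depth):
        # Level 1 starts from random weights; deeper levels copy the previous level.
        net = LevelNet(n_in, n_hidden) if k == 0 else copy.deepcopy(levels[-1])
        x = np.array([np.concatenate([spatial_feat[(i, j)],                 # spatial part
                                      temporal_features(prev_map, i, j)])  # temporal part
                      for (i, j) in pairs])
        train_one_epoch(net, x, true_labels)            # hypothetical helper: one epoch of SGD
        levels.append(net)
        prev_map = predict_full_map(net, spatial_feat)  # hypothetical helper: full predicted map
    return levels
```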

56 typically the hidden layer size, which affects the total number of connections in the network). [sent-125, score-0.226]

57 The learning time and the generalization capabilities of the particular neural network model are highly affected by the network size parameter. [sent-126, score-0.255]

58 In order to take into account the intrinsically incomparable capabilities of the different DST-NN, NN, and RNN architectures, we perform our tests by considering a range of exponentially increasing hidden layer sizes (4, 8, 16, 32, 64, and 128 units) for each architecture. [sent-127, score-0.327]

59 The total number of connection weights for each architecture as a function of the hidden layer size, as well as the time needed to perform one training epoch, are shown in Table 1. [sent-128, score-0.392]

60 Figure 2 shows the learning curves of the three methods as a function of the training epoch and the different hidden layer sizes. [sent-129, score-0.342]

61 In particular, for hidden layer sizes larger than or equal to 8, the DST-NN performance is superior to that of the NN and RNN, regardless of their sizes. [sent-135, score-0.233]

62 Moreover, note that hidden layer sizes larger than 32 do not increase the generalization capabilities of any one of the three methods (Table 2). [sent-136, score-0.317]

63 The counterintuitive learning curves of the RNN for hidden layer sizes larger than 8 can be explained by considering the structure of the RNN architecture. [sent-137, score-0.282]

64 In our experiments, we use the same boundaries for all the tested hidden layer sizes and, apparently, these proved to be ineffective for hidden layer sizes larger than or equal to 16. [sent-141, score-0.466]

65 For instance, we notice some small overfitting for the DST-NN starting with hidden layer size 32, while the NN starts to show some small overfitting only at hidden layer size 128. [sent-144, score-0.46]

66 On the contrary, the RNN does not show any sign of overfitting in 100 epochs of training, regardless of the hidden layer size in the tested range, and the performance in training is roughly equivalent to the performance in testing. [sent-145, score-0.37]

67 As a final consideration, from Table 2, the best NN and RNN performance on L/5 long-range contacts reflects the state-of-the-art in contact prediction [9, 15] quite well, with an accuracy in the 21-23% range. [sent-146, score-1.16]

68 6) on the top-scored L/5 long range contacts, it is evident that the DST-NN provides an overall better prediction of the contact map topology. [sent-150, score-0.762]

69 Since the training time for the DST-NN increases substantially with the size of the hidden layers, in these tests we consider only hidden layers of size 16 and 32. [sent-251, score-0.336]

70 On the other hand, as shown in Table 2, a hidden layer of size 32 does not limit the generalization performance of our method in comparison to larger sizes. [sent-252, score-0.254]

71 Recall that, according to our general training strategy, when a new network is added to the stack its initial connection weights are copied from the previous-level network in the stack. [sent-254, score-0.429]

72 In all three figures, the lower triangle shows the native contacts (black dots). [sent-258, score-0.428]

73 The blue and red dots in the upper triangle represent the correctly (blue) and incorrectly (red) predicted contacts among the N topscored residue pairs, where N is the number of native contacts at sequence separation ≥ 6. [sent-259, score-1.172]

74 In these tests, according to our general training strategy, each network in the stack has been trained for one single epoch. [sent-281, score-0.37]

75 The approach of training each network for more than one single epoch leads to slightly better accuracy (< 1% of improvement) at the cost of a larger training time (data not shown). [sent-282, score-0.279]

76 Another natural issue concerning DST-NNs is whether the depth of the stack affects the generalization capabilities of the model. [sent-283, score-0.337]

77 To assess this issue, we train a new DST-NN by limiting the depth of the stack to a fixed number of networks and then repeating the training procedure up to 100 epochs (DST-NN3 ). [sent-284, score-0.443]

78 For this test, we use a limit size of 20 networks, which roughly corresponds to the interval with highest learning peaks for hidden layer size 16 (see Figure 2). [sent-285, score-0.251]

79 Due to the increased training time for this model (20 times slower), testing different stack depths is not practical. [sent-286, score-0.264]

80 As shown in Figure 4 and Table 3, although more time-consuming, this training technique allows an improvement of approximately 2 percentage points of accuracy with respect to our general training approach (at least for a hidden layer of size 16). [sent-288, score-0.39]

81 For this reason, restarting the training on a fixed size stack is more advantageous in terms of prediction performance than having a very deep stack. [sent-289, score-0.48]

82 Unfortunately, the optimal stack depth is very likely related to the specific classification problem and it cannot be inferred a priori from the architecture topology. [sent-290, score-0.329]

83 Concluding remarks. We have presented a novel and general deep machine-learning architecture for contact prediction, implemented as a stack of Neural Networks NN^k_{ij} with two spatial dimensions and one temporal dimension. [sent-315, score-1.231]

84 The stack architecture is used to organize the prediction in such a way that each level in the stack receives as input, through the temporal feature vectors, the predictions produced by the previous stages in the stack, and refines them. [sent-316, score-0.834]

85 While our architecture is not meant to simulate the folding process, the idea of modeling contact prediction in a multi-level fashion seems more natural than the traditional single-shot approach. [sent-318, score-0.943]

86 The proposed architecture is somewhat general and can be adopted as a starting point for more sophisticated methods for contact prediction or other problems. [sent-320, score-0.806]

87 Moreover, here we considered a simple square neighborhood for encoding the contact predictions in the temporal feature vector; more complex relationships could be discovered by exploiting different topologies for such a feature vector. [sent-322, score-0.808]

88 While we have used the true contact map as the target for all the levels in the architecture, it is clear that different targets could be used at different levels [3]. [sent-325, score-0.649]

89 For instance, experimental or simulation data on protein folding could be used to generate contact maps at different stages of folding and use those as targets. [sent-326, score-1.012]

90 DST-NNs of the form NN^k_i, with one spatial and one temporal coordinate, could be applied to sequence problems, for instance to the prediction of secondary structure or relative solvent accessibility. [sent-330, score-0.447]

91 Likewise, DST-NNs of the form NN^l_{ijk}, with three spatial and one temporal coordinate, could be applied, for instance, to problems in weather forecasting [13] or trajectory prediction in robot movements [14]. [sent-331, score-0.284]

92 (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. [sent-343, score-0.247]

93 (2009) Using multidata hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. [sent-365, score-0.386]

94 (2007) Improved residue contact prediction using support vector machines and a large feature set, BMC Bioinformatics, 8, 113. [sent-382, score-0.976]

95 (2009) Assessment of domain boundary predictions and the prediction of intramolecular contacts in CASP8, Proteins, 77(suppl 9), 196-209 [10] Farabet,C. [sent-393, score-0.514]

96 (2001) Progress in predicting inter-residue contacts of proteins with neural networks and correlated mutations. [sent-403, score-0.55]

97 (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles, Proteins, 47(2), 228-235. [sent-441, score-0.477]

98 (2004) Reconstruction of protein structures from a vectorial representation, Phys. [sent-452, score-0.257]

99 (2010) Predicted residue-residue contacts can help the scoring of 3D models. [sent-480, score-0.382]

100 (2008) FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps. [sent-488, score-0.869]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('contact', 0.615), ('contacts', 0.382), ('residue', 0.246), ('protein', 0.229), ('stack', 0.209), ('rnn', 0.193), ('nnk', 0.137), ('nn', 0.135), ('dst', 0.12), ('layer', 0.118), ('proteins', 0.105), ('temporal', 0.098), ('prediction', 0.098), ('deep', 0.097), ('residues', 0.094), ('architecture', 0.093), ('hidden', 0.091), ('epochs', 0.085), ('folding', 0.084), ('nni', 0.082), ('spatial', 0.069), ('casp', 0.068), ('solvent', 0.068), ('network', 0.065), ('secondary', 0.062), ('capabilities', 0.06), ('training', 0.055), ('accessibility', 0.055), ('contacting', 0.055), ('scop', 0.055), ('epoch', 0.054), ('ij', 0.05), ('accuracy', 0.05), ('predicted', 0.046), ('native', 0.046), ('bioinformatics', 0.045), ('separation', 0.043), ('networks', 0.043), ('acids', 0.041), ('astral', 0.041), ('initio', 0.041), ('trained', 0.041), ('randomization', 0.041), ('architectures', 0.04), ('inputs', 0.039), ('layers', 0.038), ('domains', 0.036), ('predictions', 0.034), ('map', 0.034), ('conn', 0.033), ('hl', 0.032), ('assessment', 0.032), ('features', 0.03), ('amino', 0.03), ('level', 0.029), ('structures', 0.028), ('acc', 0.027), ('les', 0.027), ('purely', 0.027), ('neighborhood', 0.027), ('suppl', 0.027), ('depth', 0.027), ('sequence', 0.027), ('organize', 0.027), ('meant', 0.027), ('fashion', 0.026), ('structure', 0.025), ('generalization', 0.024), ('sizes', 0.024), ('proximity', 0.024), ('homology', 0.024), ('curves', 0.024), ('train', 0.024), ('pro', 0.023), ('predictors', 0.023), ('vanishing', 0.023), ('ought', 0.022), ('attempt', 0.022), ('size', 0.021), ('modules', 0.021), ('pairs', 0.02), ('produced', 0.02), ('neural', 0.02), ('weights', 0.02), ('tests', 0.019), ('fp', 0.019), ('ijk', 0.019), ('ab', 0.019), ('coordinates', 0.019), ('database', 0.018), ('table', 0.018), ('affects', 0.017), ('feature', 0.017), ('re', 0.017), ('supervised', 0.017), ('evolutionary', 0.016), ('tting', 0.016), ('strategies', 0.015), ('connection', 0.015), ('range', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

2 0.12088549 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

3 0.096064128 197 nips-2012-Learning with Recursive Perceptual Representations

Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell

Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous –often more complicated– methods on several vision and speech benchmarks. 1

4 0.095238827 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1

5 0.092209913 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

6 0.088550612 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

7 0.084280498 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

8 0.062760748 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

9 0.059316512 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

10 0.059263725 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

11 0.058752023 193 nips-2012-Learning to Align from Scratch

12 0.056005698 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

13 0.054032758 65 nips-2012-Cardinality Restricted Boltzmann Machines

14 0.049468994 170 nips-2012-Large Scale Distributed Deep Networks

15 0.046733618 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

16 0.046375647 148 nips-2012-Hamming Distance Metric Learning

17 0.046090007 100 nips-2012-Discriminative Learning of Sum-Product Networks

18 0.045939036 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

19 0.044190403 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

20 0.041807998 62 nips-2012-Burn-in, bias, and the rationality of anchoring


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.113), (1, 0.027), (2, -0.116), (3, 0.014), (4, 0.045), (5, 0.001), (6, -0.012), (7, -0.037), (8, -0.036), (9, -0.016), (10, 0.008), (11, 0.057), (12, -0.05), (13, 0.109), (14, -0.064), (15, -0.051), (16, -0.009), (17, -0.025), (18, 0.08), (19, -0.093), (20, -0.029), (21, -0.068), (22, 0.018), (23, 0.013), (24, 0.049), (25, -0.057), (26, -0.018), (27, 0.03), (28, 0.032), (29, -0.037), (30, 0.068), (31, 0.071), (32, -0.058), (33, 0.024), (34, 0.025), (35, -0.037), (36, 0.004), (37, 0.04), (38, 0.001), (39, 0.036), (40, -0.026), (41, -0.079), (42, -0.005), (43, -0.031), (44, -0.027), (45, -0.004), (46, 0.012), (47, -0.019), (48, 0.053), (49, 0.083)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92753881 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

2 0.73867899 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

3 0.72048193 170 nips-2012-Large Scale Distributed Deep Networks

Author: Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly- sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm. 1

4 0.69155788 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

Author: Dan Ciresan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber

Abstract: We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or nonmembrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific postprocessing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer. 1

5 0.67586887 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

6 0.66089261 193 nips-2012-Learning to Align from Scratch

7 0.63990051 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

8 0.59165406 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

9 0.58035976 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

10 0.57626224 65 nips-2012-Cardinality Restricted Boltzmann Machines

11 0.55156475 72 nips-2012-Cocktail Party Processing via Structured Prediction

12 0.54404885 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

13 0.50615555 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

14 0.49854666 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

15 0.49407661 197 nips-2012-Learning with Recursive Perceptual Representations

16 0.44084746 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

17 0.41965091 8 nips-2012-A Generative Model for Parts-based Object Segmentation

18 0.41707113 100 nips-2012-Discriminative Learning of Sum-Product Networks

19 0.40632752 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

20 0.39103159 39 nips-2012-Analog readout for optical reservoir computers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (21, 0.037), (29, 0.316), (38, 0.091), (42, 0.052), (54, 0.017), (55, 0.05), (74, 0.045), (76, 0.099), (80, 0.091), (92, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71925968 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

2 0.62830633 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

Author: Yanping Huang, Timothy Hanks, Mike Shadlen, Abram L. Friesen, Rajesh P. Rao

Abstract: How does the brain combine prior knowledge with sensory evidence when making decisions under uncertainty? Two competing descriptive models have been proposed based on experimental data. The first posits an additive offset to a decision variable, implying a static effect of the prior. However, this model is inconsistent with recent data from a motion discrimination task involving temporal integration of uncertain sensory evidence. To explain this data, a second model has been proposed which assumes a time-varying influence of the prior. Here we present a normative model of decision making that incorporates prior knowledge in a principled way. We show that the additive offset model and the time-varying prior model emerge naturally when decision making is viewed within the framework of partially observable Markov decision processes (POMDPs). Decision making in the model reduces to (1) computing beliefs given observations and prior information in a Bayesian manner, and (2) selecting actions based on these beliefs to maximize the expected sum of future rewards. We show that the model can explain both data previously explained using the additive offset model as well as more recent data on the time-varying influence of prior knowledge on decision making. 1

3 0.5569905 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

4 0.55642331 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

Author: Claudio Gentile, Francesco Orabona

Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T 1/2 log T ) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance. 1

5 0.49993268 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

Author: Nitish Srivastava, Ruslan Salakhutdinov

Abstract: A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains. 1

6 0.49939951 197 nips-2012-Learning with Recursive Perceptual Representations

7 0.49723408 65 nips-2012-Cardinality Restricted Boltzmann Machines

8 0.49543408 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

9 0.49342611 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

10 0.49235946 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

11 0.49224749 168 nips-2012-Kernel Latent SVM for Visual Recognition

12 0.49201873 188 nips-2012-Learning from Distributions via Support Measure Machines

13 0.49198723 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

14 0.49147946 200 nips-2012-Local Supervised Learning through Space Partitioning

15 0.49138406 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

16 0.49044558 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

17 0.49014837 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

18 0.48949188 292 nips-2012-Regularized Off-Policy TD-Learning

19 0.48889166 193 nips-2012-Learning to Align from Scratch

20 0.48878449 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs