nips nips2012 nips2012-177 knowledge-graph by maker-knowledge-mining

177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction


Source: pdf

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. [sent-8, score-0.516]

2 The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. [sent-9, score-0.132]

3 In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. [sent-10, score-0.364]

4 Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. [sent-12, score-0.209]

5 1 Introduction The accurate prediction of molecular energetics in chemical compound space (CCS) is a crucial ingredient for compound design efforts in chemical and pharmaceutical industries. [sent-13, score-0.748]

6 One of the major challenges consists of making quantitative estimates in CCS at moderate computational cost (milliseconds per compound or faster). [sent-14, score-0.066]

7 Currently only high level quantum-chemistry calculations, which can take days per molecule depending on property and system, yield the desired “chemical accuracy” of 1 kcal/mol required for computational molecular design. [sent-15, score-0.439]

8 The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. [sent-18, score-0.132]

9 A central question is how to represent molecules in a way that makes prediction of molecular properties feasible and accurate (Von Lilienfeld and Tuckerman, 2006). [sent-19, score-0.3]

10 This question has already been extensively discussed in the cheminformatics literature, and many so-called molecular descriptors exist (Todeschini and Consonni, 2009). [sent-20, score-0.132]

11 Furthermore, they are not necessarily transferable across the whole chemical compound space. [sent-22, score-0.275]

12 We learn the mapping between the molecule and its atomization energy from scratch using the “Coulomb matrix” as a low-level molecular descriptor (Rupp et al. [sent-25, score-0.66]

13 An inherent problem of the Coulomb matrix descriptor is that it lacks invariance with respect to permutation of atom indices, thus leading to an exponential blow-up of the problem’s dimensionality. [sent-34, score-0.107]

14 We center the discussion around the two following questions: How to inject permutation invariance optimally into the machine learning model? [sent-35, score-0.068]

15 These three representations are then compared in the light of several models such as Gaussian kernel ridge regression or multilayer neural networks where the Gaussian prior is traded against more flexibility and the ability to learn the representation directly from the data. [sent-39, score-0.384]

16 Related Work In atomic-scale physics and in material sciences, neural networks have been used to model the potential energy surface of single systems (e. [sent-40, score-0.152]

17 , the dynamics of a single molecule over time) since the early 1990s (Lorenz et al. [sent-42, score-0.333]

18 The major difference to the problem presented here is that previous work in modeling quantum mechanical energies looked mostly at the dynamics of one molecule, whereas we use data from different molecules simultaneously (“learning across chemical compound space”). [sent-46, score-0.55]

19 2 Representing Molecules Electronic structure methods based on quantum-mechanical first principles only require a set of nuclear charges Zi and the corresponding Cartesian coordinates of the atomic positions in 3D space Ri as an input for the calculation of molecular energetics. [sent-49, score-0.169]

20 Specifically, for each molecule, we construct the so-called Coulomb matrix C, which contains information about Zi and Ri in a way that preserves many of the required properties of a good descriptor (Rupp et al. [sent-51, score-0.065]

21 (1) The diagonal elements of the Coulomb matrix correspond to a polynomial fit of the potential energies of the free atoms, while the off-diagonal elements encode the Coulomb repulsion between all possible pairs of nuclei in the molecule. [sent-55, score-0.118]

22 As such, the Coulomb matrix is invariant to translations and rotations of the molecule in 3D space; both transformations must keep the potential energy of the molecule constant by definition. [sent-56, score-0.719]

23 These invisible atoms do not influence the physics of the molecule of interest and make the total number of atoms in the molecule sum to a constant d. [sent-59, score-0.789]

24 In practice, this corresponds to padding the Coulomb matrix with zero-valued entries so that it has size d × d, as has been done by Rupp et al. [sent-60, score-0.07]
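
As a minimal illustration of this construction, the sketch below builds a zero-padded Coulomb matrix in NumPy. The diagonal form 0.5·Zi^2.4 (the polynomial fit of free-atom energies mentioned above, following Rupp et al., 2012) and the default d = 23 are taken from the surrounding text; function and variable names are illustrative only.

```python
import numpy as np

def coulomb_matrix(Z, R, d=23):
    """Coulomb matrix of one molecule, zero-padded to d x d.

    Z : (n,) nuclear charges; R : (n, 3) atomic positions.
    Diagonal: 0.5 * Z_i**2.4 (fit to free-atom energies, Rupp et al. 2012);
    off-diagonal: Z_i * Z_j / |R_i - R_j| (Coulomb repulsion between nuclei).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    C = np.zeros((d, d))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return C
```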

25 1 Eigenspectrum Representation The eigenspectrum representation (Rupp et al. [sent-65, score-0.211]

26 It is easy to see that this representation is invariant to permutation of atoms in the Coulomb matrix. [sent-71, score-0.15]

27 On the other hand, the dimensionality of the eigenspectrum d is low compared to the initial 3d − 6 degrees of freedom of most molecules. [sent-72, score-0.148]
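
A sketch of the eigenspectrum representation under these assumptions; sorting by decreasing absolute eigenvalue is one common convention and is an illustrative choice here.

```python
import numpy as np

def eigenspectrum(C):
    # Eigenvalues of the symmetric (padded) Coulomb matrix, sorted so that
    # the d-dimensional vector is invariant to permutations of atom indices.
    eigvals = np.linalg.eigvalsh(C)
    return eigvals[np.argsort(-np.abs(eigvals))]
```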

28 2 Sorted Coulomb Matrices Another solution to the ordering problem is to choose the permutation of atoms whose associated Coulomb matrix C satisfies ||Ci|| ≥ ||Ci+1|| ∀ i, where Ci denotes the ith row of the Coulomb matrix. [sent-75, score-0.118]

29 Unlike the eigenspectrum representation, two different molecules have necessarily different associated sorted Coulomb matrices. [sent-76, score-0.391]
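
A corresponding sketch of the sorted Coulomb matrix: the same permutation is applied to rows and columns so that row norms are non-increasing.

```python
import numpy as np

def sorted_coulomb(C):
    # Permutation that sorts rows by decreasing norm, applied jointly to rows
    # and columns so the result remains a valid Coulomb matrix of the molecule.
    order = np.argsort(-np.linalg.norm(C, axis=1))
    return C[np.ix_(order, order)]
```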

30 3 Random(-ly sorted) Coulomb Matrices A way to deal with the larger dimensionality that results from taking the whole Coulomb matrix instead of the eigenspectrum is to extend the dataset with Coulomb matrices that are randomly sorted. [sent-78, score-0.23]

31 This is achieved by associating a conditional distribution over Coulomb matrices p(C|M ) to each molecule M . [sent-79, score-0.35]

32 Let C(M ) define the set of matrices that are valid Coulomb matrices of the molecule M . [sent-80, score-0.393]

33 Take any Coulomb matrix C among the set of matrices that are valid Coulomb matrices of M and compute its row norm ||C|| = (||C1 ||, . [sent-83, score-0.108]

34 Draw n ∼ N(0, σI) and find the permutation P that sorts ||C|| + n, that is, find the permutation that satisfies permute_P(||C|| + n) = sort(||C|| + n). [sent-88, score-0.074]
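
The sampling step just described can be sketched as follows; `sigma` is the noise scale of n ∼ N(0, σI), and its default value is illustrative only.

```python
import numpy as np

def random_coulomb(C, sigma=1.0, rng=None):
    # One draw from p(C|M): perturb the row norms with Gaussian noise and
    # apply the permutation that sorts the noisy norms in decreasing order.
    rng = np.random.default_rng() if rng is None else rng
    noisy_norms = np.linalg.norm(C, axis=1) + sigma * rng.standard_normal(len(C))
    order = np.argsort(-noisy_norms)
    return C[np.ix_(order, order)]
```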

35 Molecules with low atomization energies are depicted in red and molecules with high atomization energies are depicted in blue. [sent-95, score-0.6]

36 3 Predicting Atomization Energies The atomization energy E quantifies the potential energy stored in all chemical bonds. [sent-98, score-0.453]

37 As such, it is defined as the difference between the potential energy of a molecule and the sum of potential energies of its composing isolated atoms. [sent-99, score-0.469]

38 The potential energy of a molecule is the solution to the electronic Schrödinger equation HΦ = EΦ, where H is the Hamiltonian of the molecule and Φ is the state of the system. [sent-100, score-0.703]

39 Obtaining atomization energies from the Schrödinger equation solver is computationally expensive and, as a consequence, only a fraction of the molecules in the chemical compound space can be labeled. [sent-107, score-0.631]

40 In this section, we show how two algorithms of study, kernel ridge regression and the multilayer neural network, are applied to this problem. [sent-109, score-0.31]

41 In kernel ridge regression, the measure of similarity is encoded in the kernel. [sent-111, score-0.121]

42 On the other hand, in multilayer neural networks, the measure of similarity is learned essentially from data and implicitly given by the mapping onto increasingly many layers. [sent-112, score-0.15]

43 1 Kernel Ridge Regression The most basic algorithm to solve the nonlinear regression problem at hand is kernel ridge regression (cf. [sent-116, score-0.199]

44 As is well known, the solution of the minimization problem min_α Σ_i (E_est(x_i) − E_i^ref)² + λ Σ_i α_i² reads α = (K + λI)⁻¹ E^ref, where K is the empirical kernel and the input data x_i is either the eigenspectrum of the Coulomb matrix or the vectorized sorted Coulomb matrix. [sent-120, score-0.368]
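
A minimal sketch of this closed-form solution with the Gaussian kernel used later in the text; the hyperparameters `sigma` (kernel width) and `lam` (regularization strength) would come from the grid search with inner cross-validation mentioned below, and the names are illustrative.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def krr_fit(X, E_ref, sigma, lam):
    # alpha = (K + lam * I)^{-1} E_ref
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), E_ref)

def krr_predict(X_train, alpha, X_new, sigma):
    # E_est(x) = sum_j alpha_j * k(x, x_j)
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```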

45 Expanding the dataset with the randomly generated Coulomb matrices described in Section 2. [sent-121, score-0.06]

46 3 yields a huge dataset that is difficult to handle with standard kernel ridge regression algorithms. [sent-122, score-0.177]

47 Although approximations of the kernel can improve its scalability, random Coulomb matrices can be handled more easily by encoding permutations directly into the kernel. [sent-123, score-0.112]

48 The molecule (a) is converted to its randomly sorted Coulomb matrix representation (b). [sent-125, score-0.461]

49 The Coulomb matrix is then converted into a suitable sensory input (c) that is fed to the neural network (d). [sent-126, score-0.124]

50 The output of the neural network is then rescaled to the original energy unit (e). [sent-127, score-0.115]

51 a sum over permutations: K̃(xi, xj) = 1/(2L) Σ_{l=1}^{L} [K(xi, Pl(xj)) + K(Pl(xi), xj)] (3), where Pl is the l-th permutation of atoms corresponding to the l-th realization of the random Coulomb matrix and L is the total number of permutations. [sent-128, score-0.118]

52 Note that the summation can be replaced by a “max” operator in order to focus on correct alignments of molecules and ignore poor alignments. [sent-130, score-0.165]
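
A sketch of the permutation kernel of Equation 3 and its "max" variant. Here `xi` and `xj` are reference (e.g. sorted) representations of two molecules and `Xi_perms`, `Xj_perms` hold L randomly sorted realizations of each; the normalization of the max variant is an assumption.

```python
import numpy as np

def perm_kernel(xi, xj, Xi_perms, Xj_perms, sigma, mode="sum"):
    # Base Gaussian kernel between two vectorized Coulomb matrices.
    gauss = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))
    # One term per permutation: K(x_i, P_l(x_j)) + K(P_l(x_i), x_j).
    terms = np.array([gauss(xi, pxj) + gauss(pxi, xj)
                      for pxi, pxj in zip(Xi_perms, Xj_perms)])
    if mode == "sum":              # Equation 3: average over the L permutations
        return terms.sum() / (2 * len(terms))
    return terms.max() / 2         # "max": keep only the best alignment
```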

53 2 Multilayer Neural Networks A main feature of multilayer neural networks is their ability to learn internal representations that potentially make models statistically and computationally more efficient. [sent-132, score-0.187]

54 Often, a crucial factor for training neural networks successfully is to start with a favorable initial conditioning of the learning problem, that is, a good sensory input representation and a proper weight initialization. [sent-134, score-0.102]

55 x = [ ..., tanh((C − θ)/θ), tanh(C/θ), tanh((C + θ)/θ), ... ] (4) The new representation x is fed as input to the neural network. [sent-143, score-0.099]
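
A sketch of this expansion, assuming the three shifted tanh steps shown in the reconstructed Equation 4; the number of steps and the flattening order are illustrative choices rather than the paper's exact setup.

```python
import numpy as np

def sensory_input(C, theta=1.0):
    # Expand each Coulomb matrix entry into shifted, saturating tanh steps,
    # giving a redundant, roughly binary "sensory" input for the network.
    steps = [np.tanh((C - theta) / theta),
             np.tanh(C / theta),
             np.tanh((C + theta) / theta)]
    return np.concatenate([s.ravel() for s in steps])
```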

56 The full data flow from the raw molecule to the predicted atomization energy is depicted in Figure 3. [sent-148, score-0.526]

57 (2012), we select a subset of 7165 small molecules extracted from a huge database of nearly one billion small molecules collected by Blum and Reymond (2009). [sent-150, score-0.296]

58 These molecules are composed of a maximum of 23 atoms, of which at most 7 are heavy atoms. [sent-151, score-0.148]

59 Molecules are converted to a suitable Cartesian coordinate representation using the universal force field method (Rappé et al. [sent-152, score-0.063]

60 Atomization energies are calculated for each molecule and range from −800 to −2000 kcal/mol. [sent-156, score-0.384]

61 As a result, we have a dataset of 7165 Coulomb matrices of size 23 × 23 with their associated one-dimensional labels. [sent-157, score-0.06]

62 Model validation For each learning method we used stratified 5-fold cross validation with identical cross validation folds, where the stratification was done by grouping molecules into groups of five by their energies and then randomly assigning one molecule to each fold, as in Rupp et al. [sent-159, score-0.558]
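
The stratification can be sketched as follows; fold indexing and the handling of a last incomplete group are illustrative choices.

```python
import numpy as np

def stratified_folds(energies, n_folds=5, rng=None):
    # Sort molecules by energy, cut the sorted list into consecutive groups
    # of n_folds molecules, and send one molecule of each group to each fold.
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(energies)
    fold_of = np.empty(len(energies), dtype=int)
    for start in range(0, len(order), n_folds):
        group = order[start:start + n_folds]
        fold_of[group] = rng.permutation(n_folds)[:len(group)]
    return fold_of
```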

63 Choice of parameters for kernel ridge regression The kernel ridge regression model was trained using a Gaussian kernel (Kij = exp[−||xi − xj||² / (2σ²)]) where σ is the kernel width. [sent-164, score-0.416]

64 No further scaling or normalization of the data was done, as the meaningfulness of the data in chemical compound space was to be preserved. [sent-165, score-0.275]

65 A grid search with an inner cross validation was used to determine the hyperparameters for each of the five cross validation folds for each method, namely kernel width σ and regularization strength λ. [sent-166, score-0.073]

66 For the eigenspectrum representation the individual folds showed lower regularization parameters (λeig = 2. [sent-169, score-0.21]

67 00) as compared to the sorted Coulomb representation (λsorted = 1. [sent-171, score-0.132]

68 When the algorithm is trained on random Coulomb matrices, we set the number of permutations involved in the kernel to L = 250 (see Equation 3) and grid-search hyperparameters over both the “sum” and “max” kernels. [sent-180, score-0.069]

69 Choice of parameters for the neural network We choose a binarization step θ = 1 (see Equation 4). [sent-185, score-0.085]

70 As a result, the neural network takes approximately 1800 inputs. [sent-186, score-0.068]

71 The error derivative is backpropagated from layer l to layer l − 1 by multiplying it by η = m/n where m and n are the number of input and output units of layer l. [sent-190, score-0.074]

72 We use averaged stochastic gradient descent (ASGD) with minibatches of size 25 for a maximum of 250000 iterations and with ASGD coefficients set so that the neural network remembers approximately 10% of its training history. [sent-193, score-0.068]

73 Training the neural network takes between one hour and one day on a CPU depending on the sample complexity. [sent-195, score-0.068]

74 When using the random Coulomb matrix representation, the prediction for a new molecule is averaged over 10 different realizations of its associated random Coulomb matrix. [sent-196, score-0.349]
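
A sketch of this test-time averaging, reusing the `random_coulomb` sketch above; `model_predict` is a placeholder for any trained predictor operating on a vectorized Coulomb matrix, and the flattening is illustrative.

```python
import numpy as np

def predict_energy(model_predict, C, n_realizations=10, sigma=1.0, rng=None):
    # Average the prediction over several random realizations of the
    # molecule's Coulomb matrix (the same distribution used during training).
    rng = np.random.default_rng() if rng is None else rng
    preds = [model_predict(random_coulomb(C, sigma, rng).ravel())
             for _ in range(n_realizations)]
    return float(np.mean(preds))
```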

75 Linear regression and k-nearest neighbors are inaccurate compared to the more refined kernel methods and multilayer neural network. [sent-262, score-0.253]

76 The multilayer neural network performance varies considerably depending on the type of representation but sets the lowest error in our study on the random Coulomb representation. [sent-263, score-0.227]

77 , 2011) and kernel support vector regression (Smola and Schölkopf, 2004). [sent-265, score-0.087]

78 Linear regression and k-nearest neighbors are clearly off-the-mark compared to the other more sophisticated models such as mixed effects models, kernel methods and multilayer neural networks. [sent-266, score-0.253]

79 While results for kernel algorithms are similar, they all differ considerably from those obtained with the multilayer neural network. [sent-267, score-0.198]

80 In particular, we can observe that they are performing reasonably well with all types of representation while the multilayer neural network performance is highly dependent on the representation fed as input. [sent-268, score-0.281]

81 The neural network performs best with random Coulomb matrices that are intrinsically the richest representation as a whole distribution over Coulomb matrices is associated to each molecule. [sent-270, score-0.191]

82 As the training data increases, the error for Gaussian kernel ridge regression decreases slowly while the neural network can take greater advantage of this additional data. [sent-272, score-0.228]

83 6 Conclusion Predicting molecular energies quickly and accurately across the chemical compound space (CCS) is an important problem, as quantum-mechanical calculations typically take days and do not scale well to more complex systems. [sent-273, score-0.484]

84 (2012) and provided a deeper understanding of some of the ingredients for learning a successful mapping between raw molecular geometries and atomization energies. [sent-276, score-0.286]

85 Our results suggest the importance of having flexible priors (in our case, a multilayer network) and lots of data (generated artificially by exploiting symmetries of the Coulomb matrix). [sent-277, score-0.122]

86 Results for kernel ridge regression are more invariant to the representation and to the number of samples than for the multilayer neural network. [sent-282, score-0.364]

87 51 kcal/mol, which is considerably closer to the 1 kcal/mol required for chemical accuracy. [sent-285, score-0.209]

88 Many open problems remain that make quantum chemistry an attractive challenge for Machine Learning: (1) Are there fundamental modeling limits of the statistical learning approach for quantum chemistry applications or is it rather a matter of producing more training data? [sent-286, score-0.222]

89 (3) Can better representations be devised with inbuilt invariance properties (e. [sent-289, score-0.071]

90 (4) How can we extract physics insights on quantum mechanics from the trained nonlinear ML prediction models? [sent-293, score-0.13]

91 Neural network approach to quantum-chemistry data: Accurate prediction of density functional theory energies. [sent-304, score-0.06]

92 Support vector machine regression (LS-SVM)— an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data? [sent-309, score-0.198]

93 Editorial: Charting chemical space: Challenges and opportunities for artificial intelligence and machine learning. [sent-312, score-0.209]

94 Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. [sent-323, score-0.06]

95 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. [sent-327, score-0.357]

96 Representing high-dimensional potential-energy surfaces for reactions at surfaces by neural networks. [sent-364, score-0.068]

97 A random-sampling high dimensional model representation neural network for building potential energy surfaces. [sent-367, score-0.171]

98 UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. [sent-385, score-0.286]

99 Fast and accurate modeling of molecular atomization energies with machine learning. [sent-389, score-0.34]

100 Molecular grand-canonical ensemble density functional theory and exploration of chemical space. [sent-407, score-0.209]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('coulomb', 0.797), ('molecule', 0.307), ('chemical', 0.209), ('molecules', 0.148), ('eigenspectrum', 0.148), ('molecular', 0.132), ('atomization', 0.131), ('rupp', 0.125), ('multilayer', 0.122), ('sorted', 0.095), ('energies', 0.077), ('ridge', 0.073), ('compound', 0.066), ('chemistry', 0.061), ('atoms', 0.059), ('quantum', 0.05), ('kernel', 0.048), ('energy', 0.047), ('balabin', 0.046), ('dinger', 0.046), ('schr', 0.046), ('matrices', 0.043), ('network', 0.04), ('argonne', 0.04), ('regression', 0.039), ('physics', 0.038), ('mae', 0.037), ('representation', 0.037), ('permutation', 0.037), ('anatole', 0.034), ('ccs', 0.034), ('invariance', 0.031), ('cartesian', 0.031), ('lorenz', 0.028), ('neural', 0.028), ('et', 0.026), ('folds', 0.025), ('von', 0.023), ('rg', 0.023), ('electronic', 0.023), ('asgd', 0.023), ('czy', 0.023), ('ekaterina', 0.023), ('energetics', 0.023), ('fazli', 0.023), ('hautier', 0.023), ('inbuilt', 0.023), ('kcal', 0.023), ('lilienfeld', 0.023), ('lomakina', 0.023), ('manzhos', 0.023), ('pharmaceutical', 0.023), ('pinheiro', 0.023), ('rapp', 0.023), ('rton', 0.023), ('siamac', 0.023), ('todeschini', 0.023), ('raw', 0.023), ('matrix', 0.022), ('mechanics', 0.022), ('permutations', 0.021), ('tanh', 0.021), ('prediction', 0.02), ('surfaces', 0.02), ('ci', 0.02), ('charges', 0.02), ('eig', 0.02), ('baldi', 0.02), ('leadership', 0.02), ('mol', 0.02), ('networks', 0.02), ('matthias', 0.02), ('sorting', 0.019), ('layer', 0.019), ('dan', 0.019), ('potential', 0.019), ('ref', 0.019), ('guha', 0.019), ('invisible', 0.019), ('decoste', 0.019), ('facility', 0.019), ('jaitly', 0.019), ('lecun', 0.018), ('depicted', 0.018), ('pl', 0.018), ('tangent', 0.018), ('fed', 0.017), ('invariant', 0.017), ('binarization', 0.017), ('simard', 0.017), ('representations', 0.017), ('descriptor', 0.017), ('dataset', 0.017), ('ciresan', 0.017), ('alignments', 0.017), ('input', 0.017), ('neighbors', 0.016), ('strati', 0.016), ('korea', 0.016), ('hamiltonian', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

2 0.035601445 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

3 0.031779822 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

4 0.031326279 197 nips-2012-Learning with Recursive Perceptual Representations

Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell

Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous –often more complicated– methods on several vision and speech benchmarks. 1

5 0.031059587 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

Author: Suvrit Sra

Abstract: Symmetric positive definite (spd) matrices pervade numerous scientific disciplines, including machine learning and optimization. We consider the key task of measuring distances between two spd matrices; a task that is often nontrivial whenever the distance function must respect the non-Euclidean geometry of spd matrices. Typical non-Euclidean distance measures such as the Riemannian metric δR (X, Y ) = log(Y −1/2 XY −1/2 ) F , are computationally demanding and also complicated to use. To allay some of these difficulties, we introduce a new metric on spd matrices, which not only respects non-Euclidean geometry but also offers faster computation than δR while being less complicated to use. We support our claims theoretically by listing a set of theorems that relate our metric to δR (X, Y ), and experimentally by studying the nonconvex problem of computing matrix geometric means based on squared distances. 1

6 0.030400876 231 nips-2012-Multiple Operator-valued Kernel Learning

7 0.028539186 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

8 0.028373731 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

9 0.028181223 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

10 0.027294789 330 nips-2012-Supervised Learning with Similarity Functions

11 0.026816769 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

12 0.02643222 188 nips-2012-Learning from Distributions via Support Measure Machines

13 0.025647005 187 nips-2012-Learning curves for multi-task Gaussian process regression

14 0.025125189 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

15 0.024740251 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

16 0.024701312 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

17 0.024652855 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

18 0.023813477 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

19 0.023547605 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

20 0.022890298 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.079), (1, 0.023), (2, -0.021), (3, -0.006), (4, 0.005), (5, 0.013), (6, -0.002), (7, 0.023), (8, -0.006), (9, -0.006), (10, -0.014), (11, -0.023), (12, -0.015), (13, 0.002), (14, 0.001), (15, -0.044), (16, 0.016), (17, 0.016), (18, 0.058), (19, -0.036), (20, 0.023), (21, -0.013), (22, -0.003), (23, -0.018), (24, -0.015), (25, -0.012), (26, 0.008), (27, 0.03), (28, -0.009), (29, 0.023), (30, -0.046), (31, 0.014), (32, 0.012), (33, 0.023), (34, 0.046), (35, -0.006), (36, 0.01), (37, -0.015), (38, 0.011), (39, 0.009), (40, 0.018), (41, -0.054), (42, -0.055), (43, 0.0), (44, -0.038), (45, -0.012), (46, 0.05), (47, -0.027), (48, -0.043), (49, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84600896 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

2 0.52815419 167 nips-2012-Kernel Hyperalignment

Author: Alexander Lorbert, Peter J. Ramadge

Abstract: We offer a regularized, kernel extension of the multi-set, orthogonal Procrustes problem, or hyperalignment. Our new method, called Kernel Hyperalignment, expands the scope of hyperalignment to include nonlinear measures of similarity and enables the alignment of multiple datasets with a large number of base features. With direct application to fMRI data analysis, kernel hyperalignment is well-suited for multi-subject alignment of large ROIs, including the entire cortex. We report experiments using real-world, multi-subject fMRI data. 1

3 0.52031481 231 nips-2012-Multiple Operator-valued Kernel Learning

Author: Hachem Kadri, Alain Rakotomamonjy, Philippe Preux, Francis R. Bach

Abstract: Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an r -norm constraint on the combination coefficients (r ≥ 1). The resulting optimization problem is more involved than those of multiple scalar-valued kernel learning since operator-valued kernels pose more technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm based on solving a system of linear operator equations by using a block coordinate-descent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces. 1

4 0.5128336 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

Author: Kenji Fukumizu, Chenlei Leng

Abstract: We propose a novel kernel approach to dimension reduction for supervised learning: feature extraction and variable selection; the former constructs a small number of features from predictors, and the latter finds a subset of predictors. First, a method of linear feature extraction is proposed using the gradient of regression function, based on the recent development of the kernel method. In comparison with other existing methods, the proposed one has wide applicability without strong assumptions on the regressor or type of variables, and uses computationally simple eigendecomposition, thus applicable to large data sets. Second, in combination of a sparse penalty, the method is extended to variable selection, following the approach by Chen et al. [2]. Experimental results show that the proposed methods successfully find effective features and variables without parametric models. 1

5 0.51164103 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

6 0.50872463 39 nips-2012-Analog readout for optical reservoir computers

7 0.5025906 225 nips-2012-Multi-task Vector Field Learning

8 0.48572481 358 nips-2012-Value Pursuit Iteration

9 0.48518208 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

10 0.48190987 188 nips-2012-Learning from Distributions via Support Measure Machines

11 0.46640927 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

12 0.46197882 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

13 0.46099567 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

14 0.44478077 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

15 0.44200167 338 nips-2012-The Perturbed Variation

16 0.44175068 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

17 0.43839896 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

18 0.43718019 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

19 0.4365536 34 nips-2012-Active Learning of Multi-Index Function Models

20 0.43557748 111 nips-2012-Efficient Sampling for Bipartite Matching Problems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.025), (21, 0.022), (38, 0.072), (42, 0.022), (54, 0.424), (55, 0.034), (74, 0.049), (76, 0.091), (80, 0.069), (92, 0.063)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89535457 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1

same-paper 2 0.76375329 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

3 0.71622694 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

4 0.69535422 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

5 0.68610013 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

6 0.66934663 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

7 0.56289452 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

8 0.55917966 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

9 0.52839488 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

10 0.52163774 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

11 0.5009554 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

12 0.49556336 38 nips-2012-Algorithms for Learning Markov Field Policies

13 0.48999116 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

14 0.48547745 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

15 0.47996148 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

16 0.47967929 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

17 0.47739494 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

18 0.47597176 348 nips-2012-Tractable Objectives for Robust Policy Optimization

19 0.47590753 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

20 0.47569585 160 nips-2012-Imitation Learning by Coaching