nips nips2011 nips2011-7 nips2011-7-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top-ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom-level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top-ranked lists with at most four non-productive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions that the rule-based expert system does not handle.
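The abstract reports two quantitative targets without saying how they are computed. The first is a filter that prunes 94.0% of non-productive reactions at under a 0.1% false negative rate. The second is the fraction of reaction systems ranked perfectly, or with at most four non-productive reactions in the top list. The following is a minimal Python sketch of how such numbers could be computed. It assumes the filter prunes candidates scoring below a threshold chosen from the productive scores. It also reads "at most four non-productive reactions" as the count of non-productive reactions in the shortest top list that still contains every productive mechanism. The function names, the threshold rule, and that reading are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def choose_filter_threshold(pos_scores: Sequence[float], max_fnr: float = 0.001) -> float:
    """Pick a score threshold for a (hypothetical) reactivity filter.

    Candidates scoring strictly below the threshold are pruned. Returning the
    floor(max_fnr * n)-th smallest productive score guarantees that at most
    that many productive candidates are pruned (false negatives).
    """
    if not pos_scores:
        raise ValueError("need at least one productive (positive) score")
    ranked = sorted(pos_scores)
    allowed = int(max_fnr * len(ranked))  # productive candidates we may lose
    return ranked[allowed]


def pruned_fraction(neg_scores: Sequence[float], threshold: float) -> float:
    """Fraction of non-productive (negative) candidates the filter removes."""
    return sum(s < threshold for s in neg_scores) / len(neg_scores)


def nonproductive_in_top_list(scores: Sequence[float], labels: Sequence[bool]) -> int:
    """Count non-productive reactions scored at or above the lowest-scoring
    productive one. 0 means every productive mechanism is ranked at the top.
    Ties are counted against the ranker (pessimistic). Assumes the system
    contains at least one productive mechanism."""
    worst_productive = min(s for s, y in zip(scores, labels) if y)
    return sum(1 for s, y in zip(scores, labels) if not y and s >= worst_productive)


def fraction_within(systems: List[Tuple[Sequence[float], Sequence[bool]]], k: int) -> float:
    """Fraction of systems whose shortest top list containing all productive
    mechanisms includes at most k non-productive ones (k=0: perfect ranking)."""
    counts = [nonproductive_in_top_list(s, y) for s, y in systems]
    return sum(c <= k for c in counts) / len(counts)


if __name__ == "__main__":
    # Toy atom-level filter scores, invented for illustration (not paper data).
    t = choose_filter_threshold([0.9, 0.8, 0.95, 0.7], max_fnr=0.001)
    print(t, pruned_fraction([0.1, 0.2, 0.75, 0.05], t))  # 0.7 0.75

    # Toy ranking scores for two systems; True marks a productive mechanism.
    systems = [
        ([0.9, 0.2, 0.1], [True, False, False]),
        ([0.5, 0.8, 0.6, 0.3], [True, False, False, False]),
    ]
    print(fraction_within(systems, 0), fraction_within(systems, 4))  # 0.5 1.0
```

Counting ties as errors makes the ranking metric conservative. A tie-breaking rule, such as random order, would give slightly higher numbers on scores with many ties.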
[1] E.J. Corey and W.T. Wipke. Computer-assisted design of complex organic syntheses. Science, 166(3902):178–192, 1969.
[2] M.H. Todd. Computer-aided organic synthesis. Chem. Soc. Rev., 34(3):247–266, 2005.
[3] P. Rydberg, D.E. Gloriam, J. Zaretzki, C. Breneman, and L. Olsen. SMARTCyp: A 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Med. Chem. Lett., 1(3):96–100, 2010.
[4] G. Henkelman, B.P. Uberuaga, and H. Jónsson. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys., 113(22):9901–9904, 2000.
[5] B. Peters, A. Heyden, A.T. Bell, and A. Chakraborty. A growing string method for determining transition states: comparison to the nudged elastic band and string methods. J. Chem. Phys., 120(17):7877–7886, 2004.
[6] C.J. Cramer. Essentials of Computational Chemistry: Theories and Models. Wiley, West Sussex, England, 2nd edition, 2004.
[7] W.L. Jorgensen, E.R. Laird, A.J. Gushurst, J.M. Fleischer, S.A. Gothe, H.E. Helson, G.D. Paderes, and S. Sinclair. CAMEO: a program for the logical prediction of the products of organic reactions. Pure Appl. Chem., 62:1921–1932, 1990.
[8] R. Hollering, J. Gasteiger, L. Steinhauer, K.-P. Schulz, and A. Herwig. Simulation of organic reactions: from the degradation of chemicals to combinatorial synthesis. J. Chem. Inf. Model., 40(2):482–494, 2000.
[9] G. Benkö, C. Flamm, and P.F. Stadler. A graph-based toy model of chemistry. J. Chem. Inf. Model., 43(4):1085–1093, 2003.
[10] I.M. Socorro, K. Taylor, and J.M. Goodman. ROBIA: a reaction prediction program. Org. Lett., 7(16):3541–3544, 2005.
[11] J. Chen and P. Baldi. No electron left behind: a rule-based expert system to predict chemical reactions and reaction mechanisms. J. Chem. Inf. Model., 49(9):2034–2043, 2009.
[12] P. Röse and J. Gasteiger. Automated derivation of reaction rules for the EROS 6.0 system for reaction prediction. Anal. Chim. Acta, 235:163–168, 1990.
[13] B. Wang and Z. Cao. Mechanism of acid-catalyzed hydrolysis of formamide from cluster-continuum model calculations: concerted versus stepwise pathway. J. Phys. Chem. A, 114(49):12918–12927, 2010.
[14] C.A. James, D. Weininger, and J. Delany. Daylight theory manual. http://www.daylight.com/dayhtml/doc/theory/index.html, 2008. Last accessed January 2011.
[15] C.K. Ingold. Structure and Mechanism in Organic Chemistry. Cornell University Press, Ithaca, NY, 1953.
[16] R. Grossman. The Art of Writing Reasonable Organic Reaction Mechanisms. Springer-Verlag, New York, NY, 2nd edition, 2003.
[17] G. Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation: Volume I. Foundations. World Scientific Publishing, River Edge, NJ, 1997.
[18] D.L. Banville. Mining chemical structural information from the drug literature. Drug Discovery Today, 11:35–42, 2006.
[19] J. Park, G.R. Rosania, and K. Saitou. Tunable machine vision-based strategy for automated annotation of chemical databases. J. Chem. Inf. Model., 49(8):1993–2001, 2009.
[20] D.D. Ridley. Searching for chemical reaction information. In S.R. Heller, editor, The Beilstein Online Database, volume 436 of ACS Symposium Series, pages 88–112. American Chemical Society, Washington, DC, 1990.
[21] D.L. Roth. SPRESIweb 2.1, a selective chemical synthesis and reaction database. J. Chem. Inf. Model., 45(5):1470–1473, 2005.
[22] J. Gasteiger and T. Engel, editors. Chemoinformatics: A Textbook. Wiley-VCH, Weinheim, Germany, 2003.
[23] V. Hähnke, B. Hofmann, T. Grgat, E. Proschak, D. Steinhilber, and G. Schneider. PhAST: pharmacophore alignment search tool. J. Comput. Chem., 30(5):761–771, 2009.
[24] R. Neuneier and H.-G. Zimmermann. How to train neural networks. In G.B. Orr and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, pages 373–423. Springer-Verlag, Heidelberg, Germany, 1998.
[25] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML05), pages 89–96. ACM Press, Bonn, Germany, 2005.
[26] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422–446, 2002.