nips nips2001 nips2001-25 nips2001-25-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manfred K. Warmuth, Gunnar Rätsch, Michael Mathieson, Jun Liao, Christian Lemmen
Abstract: We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques for selecting the successive batches. One selection strategy picks unlabeled examples closest to the maximum margin hyperplane. Another produces many weight vectors by running perceptrons over multiple permutations of the data. Each weight vector votes with its prediction and we pick the unlabeled examples for which the prediction is most evenly split between +1 and -1. For a third selection strategy, note that each unlabeled example bisects the version space of consistent weight vectors. We estimate the volume on both sides of the split by bouncing a billiard through the version space and select unlabeled examples that cause the most even split of the version space. We demonstrate on two data sets provided by DuPont Pharmaceuticals that all three selection strategies perform comparably well and are much better than selecting random batches for testing.
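The committee-voting strategy in the abstract can be made concrete with a short sketch. The Python snippet below is illustrative only, not the authors' code: the function names, the perceptron training loop, the committee size, and the toy data are assumptions. It trains many perceptrons on different permutations of the labeled compounds, lets each weight vector vote on the unlabeled pool, and returns the examples whose predictions are most evenly split between +1 and -1.

```python
import numpy as np

def train_perceptron(X, y, n_epochs=10, rng=None):
    """Run the classical perceptron over random permutations of the labeled data."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w) <= 0:   # mistake (or tie): update the weight vector
                w = w + y[i] * X[i]
    return w

def select_most_split(X_labeled, y_labeled, X_unlabeled,
                      n_voters=100, batch_size=5, seed=0):
    """Pick the unlabeled examples whose +1 / -1 votes are most evenly split
    across a committee of perceptrons trained on different permutations."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_unlabeled))
    for _ in range(n_voters):
        w = train_perceptron(X_labeled, y_labeled, rng=rng)
        votes += np.sign(X_unlabeled @ w)
    # |votes| is smallest where the committee disagrees most; select that batch
    return np.argsort(np.abs(votes))[:batch_size]

# Toy usage (hypothetical data): 20 labeled compounds, 200 unlabeled candidates.
rng = np.random.default_rng(1)
X_lab = rng.normal(size=(20, 30))
y_lab = np.where(X_lab[:, 0] > 0, 1, -1)
X_unl = rng.normal(size=(200, 30))
print(select_most_split(X_lab, y_lab, X_unl))
```

A vote total near zero means the committee is maximally uncertain about that compound, which is why those examples would be chosen for the next round of screening.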
[Ang88] D. Angluin. Queries and concept learning. Machine Learning, 2:319-342, 1988.
[BGV92] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144-152, 1992.
[CAL90] D. Cohn, L. Atlas, and R. Ladner. Training connectionist networks with queries and selective sampling. Advances in Neural Information Processing Systems, 2:566-573, 1990.
[CCS00] C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In Proceedings of ICML 2000, page 8, Stanford, CA, 2000.
[FS98] Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. In Proc. 11th Annu. Conf. on Comput. Learning Theory. ACM Press, New York, NY, July 1998.
[HGC99] R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of IJCAI Workshop Support Vector Machines, pages 23-27, 1999.
[Joa99] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 169-184, Cambridge, MA, 1999. MIT Press.
[MGST97] P. Myers, J. Greene, J. Saunders, and S. Teig. Rapid, reliable drug discovery. Today's Chemist at Work, 6:46-53, 1997.
[RM00] P. Ruján and M. Marchand. Computing the Bayes kernel classifier. In Advances in Large Margin Classifiers, volume 12, pages 329-348. MIT Press, 2000.
[Ruj97] P. Ruján. Playing billiard in version space. Neural Computation, 9:99-122, 1997.
[SOS92] H. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the Fifth Workshop on Computational Learning Theory, pages 287-294, 1992.
[TK00] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In Proceedings of the Seventeenth International Conference on Machine Learning, San Francisco, CA, 2000. Morgan Kaufmann.