nips nips2004 nips2004-80 nips2004-80-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Haidong Wang, Eran Segal, Asa Ben-Hur, Daphne Koller, Douglas L. Brutlag
Abstract: Protein interactions typically arise from a physical interaction of one or more small sites on the surface of the two proteins. Identifying these sites is very important for drug and protein design. In this paper, we propose a computational method based on a probabilistic relational model that attempts to address this task using high-throughput protein interaction data and a set of short sequence motifs. We learn the model using the EM algorithm, with a branch-and-bound algorithm as an approximate inference method for the E-step. Our method searches for motifs whose presence in a pair of interacting proteins can explain their observed interaction. It also tries to determine which motif pairs have high affinity, and can therefore lead to an interaction. We show that our method is more accurate than others at predicting new protein-protein interactions. More importantly, by examining solved structures of protein complexes, we find that 2/3 of the predicted active motifs correspond to actual interaction sites.
[1] P. Uetz, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403(6770):623–7, 2000.
[2] H. W. Mewes, et al. MIPS: a database for genomes and protein sequences. Nucleic Acids Res, 2002.
[3] I. Xenarios, et al. DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Research, 30(1):303–305, 2002.
[4] P. Chakrabarti and J. Janin. Dissecting protein-protein recognition sites. PROTEINS: Structure, Function, and Genetics, 47:334–343, 2002.
[5] J. J. Gray, et al. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology, 331:281–299, 2003.
[6] Y. Ofran and B. Rost. Predicted protein-protein interaction sites from local sequence information. FEBS Lett., 544(1-3):236–239, 2003.
[7] R. Jansen, et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302:449–53, 2003.
[8] M. Deng, S. Mehta, F. Sun, and T. Chen. Inferring domain-domain interactions from protein-protein interactions. Genome Res, 12(10):1540–8, 2002.
[9] E. Segal, H. Wang, and D. Koller. Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19 Suppl 1:I264–I272, 2003.
[10] L. Giot, et al. A protein interaction map of Drosophila melanogaster. Science, 302(5651):1727–36, 2003.
[11] L. Falquet, et al. The PROSITE database, its status in 2002. Nucleic Acids Research, 30:235–238, 2002.
[12] D. R. Caffrey, et al. Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Science, 13:190–202, 2003.
[13] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[14] D. Koller and A. Pfeffer. Probabilistic frame-based systems. In Proc. AAAI, pages 580–587, 1998.
[15] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc., B(39):1–39, 1977.
[16] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS, pages 689–695, 2000.
[17] M. Henrion. Search-based methods to bound diagnostic probabilities in very large belief nets. In Uncertainty in Artificial Intelligence, pages 142–150, 1991.
[18] E. Sprinzak and H. Margalit. Correlated sequence-signatures as markers of protein-protein interaction. Journal of Molecular Biology, 311:681–692, 2001.
[19] R. Apweiler, et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res, 29(1):37–40, 2001.
[20] H. M. Berman, et al. The Protein Data Bank. Nucleic Acids Research, 28:235–242, 2000.