103 nips-2005-Kernels for gene regulatory regions


Source: pdf

Author: Jean-Philippe Vert, Robert Thurman, William S. Noble

Abstract: We describe a hierarchy of motif-based kernels for multiple alignments of biological sequences, particularly suitable to process regulatory regions of genes. The kernels incorporate progressively more information, with the most complex kernel accounting for a multiple alignment of orthologous regions, the phylogenetic tree relating the species, and the prior knowledge that relevant sequence patterns occur in conserved motif blocks. These kernels can be used in the presence of a library of known transcription factor binding sites, or de novo by iterating over all k-mers of a given length. In the latter mode, a discriminative classifier built from such a kernel not only recognizes a given class of promoter regions, but as a side effect simultaneously identifies a collection of relevant, discriminative sequence motifs. We demonstrate the utility of the motif-based multiple alignment kernels by using a collection of aligned promoter regions from five yeast species to recognize classes of cell-cycle regulated genes. Supplementary data is available at http://noble.gs.washington.edu/proj/pkernel.
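
The de novo mode mentioned in the abstract iterates over all k-mers of a given length. As a rough illustration only, the sketch below computes the simplest kernel of that kind for two single, unaligned sequences: an inner product of k-mer counts, in the spirit of the spectrum kernel of reference [3]. It is not the authors' multiple-alignment kernel; it ignores alignments, the phylogenetic tree, and conserved motif blocks. The function names `kmer_counts` and `kmer_kernel` and the DNA alphabet `ACGT` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_kernel(seq_a: str, seq_b: str, k: int) -> int:
    """Inner product of k-mer count vectors over all 4**k DNA k-mers.

    Assumption: a plain spectrum-style kernel on single sequences,
    not the motif-based multiple-alignment kernels of the paper.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Iterate explicitly over every k-mer of length k, as in the de novo mode.
    return sum(
        counts_a["".join(kmer)] * counts_b["".join(kmer)]
        for kmer in product("ACGT", repeat=k)
    )


if __name__ == "__main__":
    print(kmer_kernel("ACGTACGT", "TACGTT", k=3))
```

Enumerating all 4**k k-mers keeps the correspondence with the abstract explicit. For larger k, iterating only over the k-mers that actually occur in one of the sequences gives the same value at lower cost.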


reference text

[1] D. Y. Chiang, P. O. Brown, and M. B. Eisen. Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles. Bioinformatics, 17(Supp. 1):S49–S55, 2001.

[2] D. Boffelli, J. McAuliffe, D. Ovcharenko, K. D. Lewis, I. Ovcharenko, L. Pachter, and E. M. Rubin. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science, 299:1391–1394, 2003.

[3] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauderdale, and T. E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing, pages 564–575, New Jersey, 2002. World Scientific.

[4] X. H-F. Zhang, K. A. Heller, I. Hefter, C. S. Leslie, and L. A. Chasin. Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Research, 13:2637–2650, 2003.

[5] A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9):799–807, 2000.

[6] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.

[7] K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences. Bioinformatics, 18:S268–S275, 2002.

[8] M. Eisen, P. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 95:14863–14868, 1998.

[9] P. Cliften, P. Sudarsanam, A. Desikan, L. Fulton, B. Fulton, J. Majors, R. Waterston, B. A. Cohen, and M. Johnston. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science, 301(5629):71–76, 2003.

[10] M. Kellis, N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423(6937):241–254, 2003.

[11] G. J. Olsen, H. Matsuda, R. Hagstrom, and R. Overbeek. fastDNAmL: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci., 10(1):41–48, 1994.