nips nips2003 nips2003-79 nips2003-79-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Darya Chudova, Christopher Hart, Eric Mjolsness, Padhraic Smyth
Abstract: We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous-time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within-cluster variance compared to regular Gaussian mixtures.
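The building block named in the abstract, a linear differential equation with constant synthesis and decay rates, can be illustrated with a minimal sketch. It assumes the standard form dx/dt = s - d*x, whose closed-form solution is x(t) = s/d + (x0 - s/d) * exp(-d*t). The function names, parameter names, and the choice to hold the curve at x0 before the shifted start time are illustrative assumptions, not the paper's parameterization; the paper's cluster centers are nonlinear combinations of such solutions, which this sketch does not reproduce.

```python
import math


def mrna_level(t, synthesis, decay, x0):
    """Closed-form solution of dx/dt = synthesis - decay * x with x(0) = x0.

    Assumes constant, positive synthesis and decay rates (hypothetical names).
    """
    steady_state = synthesis / decay
    return steady_state + (x0 - steady_state) * math.exp(-decay * t)


def shifted_profile(times, synthesis, decay, x0, shift):
    """Evaluate the curve on a discrete time grid after a real-valued shift.

    Assumption for illustration: before the shifted start time the level
    stays at x0, so negative elapsed times are clamped to zero.
    """
    return [mrna_level(max(t - shift, 0.0), synthesis, decay, x0) for t in times]


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0, 4.0, 8.0]
    print(shifted_profile(grid, synthesis=2.0, decay=0.5, x0=1.0, shift=1.0))
```

With a positive decay rate the curve relaxes toward the steady state s/d, and the shift parameter plays the role of the real-valued time alignment mentioned in item (b).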
References:
Bar-Joseph, Z., Gerber, G., Gifford, D., Jaakkola, T., and Simon, I. (2002). A new approach to analyzing gene expression time series data. In The Sixth Annual International Conference on (Research in) Computational (Molecular) Biology (RECOMB), pages 39–48, N.Y. ACM Press.
Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J., and Davis, R. W. (1998). A genomewide transcriptional analysis of the mitotic cell cycle. Mol Cell, 2(1):65–73.
Chudova, D., Gaffney, S., Mjolsness, E., and Smyth, P. (2003). Mixture models for translation-invariant clustering of sets of multi-dimensional curves. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 79–88, Washington, DC.
DeSarbo, W. S. and Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(1):249–282.
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A, 95(25):14863–8.
Gaffney, S. J. and Smyth, P. (2003). Curve clustering with random effects regression mixtures. In Bishop, C. M. and Frey, B. J., editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL.
Gibson, M. and Mjolsness, E. (2001). Modeling the activity of single genes. In Bower, J. M. and Bolouri, H., editors, Computational Methods in Molecular Biology. MIT Press.
James, G. M. and Sugar, C. A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98:397–408.
Mestl, T., Lemay, C., and Glass, L. (1996). Chaos in high-dimensional neural and gene networks. Physica, 98:33.
Ramsay, J. and Silverman, B. W. (1997). Functional Data Analysis. Springer-Verlag, New York, NY.
Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001). Model-based clustering and data transformations for gene expression data. Bioinformatics, 17(10):977–987.