
62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures


Source: pdf

Author: Babak Shahbaba, Radford Neal

Abstract: We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data.

Keywords: mixture models, Dirichlet process, classification
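To make the abstract's model concrete: within component k the covariates follow a Gaussian, x ~ N(mu_k, Sigma_k), and the response follows that component's own linear model, so the predictive distribution is a responsibility-weighted average of per-component experts, p(y | x) proportional to sum_k pi_k N(x | mu_k, Sigma_k) p(y | x, beta_k). The sketch below is ours, not the authors' code: it substitutes a truncated stick-breaking approximation for the paper's MCMC inference over the full Dirichlet process, uses binary y with per-component logistic regressions in place of the paper's multinomial logit components, assumes isotropic component covariances, and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, n_components):
    """Truncated stick-breaking weights approximating a DP with concentration alpha."""
    b = rng.beta(1.0, alpha, size=n_components)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    w = b * remaining
    return w / w.sum()  # fold the truncation's leftover mass back in

K, alpha, p = 20, 1.0, 2            # truncation level, DP concentration, covariate dim
weights = stick_breaking(alpha, K)

# Per-component parameters: a Gaussian for x and logistic-regression
# coefficients (intercept + slopes) for y given x.
mus    = rng.normal(0.0, 3.0, size=(K, p))
sigmas = np.full(K, 1.0)            # isotropic std devs (a simplification)
betas  = rng.normal(0.0, 2.0, size=(K, p + 1))

def sample_joint(n):
    """Generate (x, y): pick a component, draw x ~ N(mu_k, sigma_k^2 I),
    then draw y | x from that component's own logistic regression."""
    z = rng.choice(K, size=n, p=weights)
    x = mus[z] + sigmas[z, None] * rng.normal(size=(n, p))
    logits = betas[z, 0] + np.einsum("np,np->n", betas[z, 1:], x)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return x, y, z

def predict_proba(x):
    """p(y=1 | x): responsibility-weighted average of the component experts.
    Mixing experts with different coefficients is what makes the overall
    relationship between y and x nonlinear."""
    sq = ((x - mus) ** 2).sum(axis=1)                      # x broadcasts against (K, p)
    resp = weights * np.exp(-0.5 * sq / sigmas**2) / sigmas**p
    resp = resp / resp.sum()
    logits = betas[:, 0] + betas[:, 1:] @ x
    return resp @ (1.0 / (1.0 + np.exp(-logits)))

x, y, z = sample_joint(500)
print("components used:", np.unique(z).size, " p(y=1|x_0):", predict_proba(x[0]))
```

Note how the prediction rule recovers the abstract's claim: if only one component carries appreciable weight, predict_proba reduces to a single logistic regression; nonlinearity appears only when several components with different coefficients share responsibility for x.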


Reference text

E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.

C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2(6):1152–1174, 1974.

D. Blackwell and J. B. MacQueen. Ferguson distributions via Pólya urn schemes. Annals of Statistics, 1:353–355, 1973.

D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1:121–144, 2005.

D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

L. Breiman, J. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman and Hall, Boca Raton, 1993.

C. A. Bush and S. N. MacEachern. A semi-parametric Bayesian model for randomized block designs. Biometrika, 83:275–286, 1996.

B. Cai and D. B. Dunson. Bayesian covariance selection in generalized linear mixed models. Biometrics, 62:446–457, 2006.

L. Cnockaert, J. Schoentgen, P. Auzou, C. Ozsancak, L. Defebvre, and F. Grenez. Low-frequency vocal modulations in vowels produced by Parkinsonian subjects. Speech Communication, 50(4):288–300, 2008.

M. J. Daniels and R. E. Kass. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. Journal of the American Statistical Association, 94(448):1254–1263, 1999.

C. H. Q. Ding and I. Dubchak. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17(4):349–358, 2001.

J. R. Duffy. Motor Speech Disorders: Substrates, Differential Diagnosis and Management. Elsevier Mosby, St. Louis, MO, 2nd edition, 2005.

M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.

T. S. Ferguson. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1:209–230, 1973.

T. S. Ferguson. Bayesian density estimation by mixtures of normal distributions. In H. Rizvi and J. Rustagi, editors, Recent Advances in Statistics, pages 287–302. Academic Press, New York, 1983.

J. Fürnkranz. Round robin classification. Journal of Machine Learning Research, 2:721–747, 2002.

Z. Ghahramani and G. E. Hinton. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, Department of Computer Science, University of Toronto, 1996.

A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw, and S. Gates. Speech impairment in a large sample of patients with Parkinson’s disease. Behavioural Neurology, 11:131–137, 1998.

U. Hobohm and C. Sander. Enlarged representative set of protein structures. Protein Science, 3:522–524, 1994.

U. Hobohm, M. Scharf, R. Schneider, and C. Sander. Selection of a representative set of structures from the Brookhaven Protein Data Bank. Protein Science, 1:409–417, 1992.

C. W. Hsu and C. J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13:415–425, 2002.

R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.

S. Jain and R. M. Neal. Splitting and merging components of a nonconjugate Dirichlet process mixture model (with discussion). Bayesian Analysis, 2:445–472, 2007.

M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Transactions on Biomedical Engineering, (in press), 2008.

L. Lo Conte, B. Ailey, T. J. P. Hubbard, S. E. Brenner, A. G. Murzin, and C. Chothia. SCOP: a structural classification of proteins database. Nucleic Acids Research, 28:257–259, 2000.

S. N. MacEachern and P. Müller. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, 7:223–238, 1998.

E. Meeds and S. Osindero. An alternative infinite mixture of Gaussian process experts. In Advances in Neural Information Processing Systems 18, page 883. MIT Press, 2006.

P. Müller, A. Erkanli, and M. West. Bayesian curve fitting using multivariate normal mixtures. Biometrika, 83(1):67–79, 1996.

R. M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.

R. M. Neal. Bayesian Learning for Neural Networks. Lecture Notes in Statistics No. 118, Springer-Verlag, New York, 1996.

R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9:249–265, 2000.

R. M. Neal. Slice sampling. Annals of Statistics, 31(3):705–767, 2003.

D. A. Rahn, M. Chou, J. J. Jiang, and Y. Zhang. Phonatory impairment in Parkinson’s disease: evidence from nonlinear dynamic analysis and perturbation analysis. Journal of Voice, 21:64–71, 2007.

C. E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In Advances in Neural Information Processing Systems 14, page 881. MIT Press, 2002.

D. Rubin and D. Thayer. EM algorithms for ML factor analysis. Psychometrika, 47(1):69–76, 1982.

S. Sapir, J. L. Spielman, L. O. Ramig, B. H. Story, and C. Fox. Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson disease: acoustic and perceptual findings. Journal of Speech, Language, and Hearing Research, 50(4):899–912, 2007.

B. Shahbaba. Improving Classification Models When a Class Hierarchy is Available. PhD thesis, Biostatistics, Public Health Sciences, University of Toronto, 2007.

B. Shahbaba. Discovering hidden structures using mixture models: application to nonlinear time series processes. Studies in Nonlinear Dynamics & Econometrics, 13(2):Article 5, 2009.

B. Shahbaba and R. M. Neal. Gene function classification using Bayesian models with hierarchy-based priors. BMC Bioinformatics, 7:448, 2006.

B. Shahbaba and R. M. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2(1):221–238, 2007.

N. Singh, V. Pillay, and Y. E. Choonara. Advances in the treatment of Parkinson’s disease. Progress in Neurobiology, 81:29–44, 2007.

I. Ulusoy and C. M. Bishop. Generative versus discriminative methods for object recognition. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 258–265, Washington, DC, USA, 2005. IEEE Computer Society.

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, USA, 1995.

S. Waterhouse, D. MacKay, and T. Robinson. Bayesian methods for mixtures of experts. In Advances in Neural Information Processing Systems 8, page 351. MIT Press, 1996.