jmlr jmlr2007 jmlr2007-5 jmlr2007-5-reference knowledge-graph by maker-knowledge-mining

5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification

Source: pdf

Author: Jia Li, Surajit Ray, Bruce G. Lindsay

Abstract: A new clustering approach based on mode identiﬁcation is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efﬁciently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model ﬁtting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Speciﬁcally, a pairwise separability measure for clusters is deﬁned using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difﬁculty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density

reference text

A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6:1345-1382, 2005. J. D. Banﬁeld and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49:803-821, 1993. C. Blake, E. Keogh, and C. J. Merz. UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html, Dept. of Information and Computer Science, UCI, 1998. G. Celeux and G. Govaert. Comparison of the mixture and the classiﬁcation maximum likelihood in cluster analysis. Journal of Statistical Computation & Simulation, 47:127-146, 1993. S. V. Chakravarthy and J. Ghosh. Scale-based clustering using the radial basis function network. IEEE Trans. Neural Networks, 7(5):1250-1261, 1996. M.-Y. Cheng, P. Hall, and J. A. Hartigan. Estimating gradient trees. A Festschrift for Herman Rubin, IMS Lecture Notes Monogr. Ser., 45:237-49, Inst. Math. Statist., Beachwood, OH, 2004. 1721 L I , R AY AND L INDSAY H. Chipman and R. Tibshirani. Hybrid hierarchical clustering with applications to microarray data. Biostatistics, 7(2):286-301, 2006. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., 1991. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal Royal Statistics Society, 39(1):1-21, 1977. B. S. Everitt, S. Landau, and M. Leese. Cluster Analysis. Oxford University Press US, 2001. C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97:611-631, 2002. C. Fraley and A. E. Raftery. MCLUST Version 3 for R: Normal mixture modeling and model-based clustering. Technical Report, no. 504, Department of Statistics, University of Washington, 2006. T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoret. Comp. Sci., 38(22):293-306, 1985. J. C. Gower and G. J. S. Ross. Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18(1):54-64, 1969. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001. A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., NJ, USA, 1988. A. K. Jain , M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, 1999. D. Joshi, J. Li, and J. Z. Wang. A computationally efﬁcient approach to the estimation of two- and three-dimensional hidden Markov models. IEEE Transactions on Image Processing, 15(7):18711886, 2006. J. R. Kettenring. The practice of cluster analysis. Journal of Classiﬁcation, 23(1):3-30, 2006. K. Lang. NewsWeeder: Learning to ﬁlter netnews. In Proceeding of the International Conference on Machine Learning, pages 331-339, 1995. Y. Leung, J.-S. Zhang, and Z.-B. Xu. Clustering by scale-space ﬁltering. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12):1396-1410, 2000. J. Li. Two-scale image retrieval with signiﬁcant meta-information feedback. In Proc. ACM Multimedia, pages 499-502, Singapore, November 2005. J. Li. Clustering based on a multi-layer mixture model. Journal of Computational and Graphical Statistics, 14(3):547-568, 2005. J. Li and R. M. Gray. Image Segmentation and Compression Using Hidden Markov Models. Springer, 2000. 1722 N ONPARAMETRIC M ODAL C LUSTERING J. Li and J. Z. Wang. Real-time computerized annotation of pictures. In Proc. ACM Multimedia Conference, pages 911-920, ACM, Santa Barbara, CA, October 2006. J. Li and H. Zha. Two-way Poisson mixture models for simultaneous document classiﬁcation and word clustering. Computational Statistics and Data Analysis, 50(1):163-180, 2006. G. J. McLachlan and D. Peel. Finite Mixture Models. New York: Wiley, 2000. M. C. Minnotte and D. W. Scott. The mode tree: A tool for visualization of nonparametric density features. Journal of Computational and Graphical Statistics, 2(1):51-68, 1993. M. C. Minnotte, D. J. Marchette, and E. J. Wegman. The bumpy road to the mode forest. Journal of Computational and Graphical Statistics, 7(2):239-51, 1998. N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern Recognition, 26:127794, 1993. A. Pothen, H. D. Simon, and K. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 11(3):430-452, 1990. S. Ray and B. G. Lindsay. The topography of multivariate normal mixtures. Annals of Statistics, 33(5):2042-2065, 2005. S. J. Roberts. Parametric and nonparametric unsupervised clustering analysis. Pattern Recognition, 30(2):261-272, 1997. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000. J. Z. Wang, J. Li, and G. Wiederhold. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):947963, 2001. R. Wilson and M. Spann. A new approach to clustering. Pattern Recognition, 23(12):1413-25, 1990. C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95103, 1983. 1723