jmlr jmlr2011 jmlr2011-60 jmlr2011-60-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Umut Ozertem, Deniz Erdogmus
Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented in notional data sets as well as real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems. Keywords: unsupervised learning, dimensionality reduction, principal curves, principal surfaces, subspace constrained mean-shift
F. Auger and P. Flandrin. Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Transactions on Signal Processing, 43(5):1068–1089, May 1995. J. D. Banfield and A. E. Raftery. Ice floe identification in satellite images using mathematical morphology and clustering about principal curves. Journal of the American Statistical Association, 87(417):7–16, 1992. G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12(10):2385–2404, 2000. M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, June 2003. R. E. Bellman. Adaptive Control Processes. Princeton University Press, Princeton, NJ, 1961. Y. Bengio, H. Larochelle, and P. Vincent. Non-local manifold parzen windows. In Advances in Neural Information Processing Systems 18, pages 115–122. MIT Press, 2006. C. M. Bishop. Neural Networks for Pattern Recognition, 1st Ed. Clarendon Press, Oxford, 1997. M. A. Carreira-Perpinan. Fast nonparametric clustering with gaussian blurring mean-shift. In ICML ’06: Proceedings of the 23rd International Conference on Machine learning, pages 153–160, New York, NY, USA, 2006. ACM. ISBN 1595933832. J. E. Chacon, T. Duong, and M. P. Wand. Asymptotics for general multivariate kernel density derivative estimators. Statistica Sinica, in press, 2011. K. Chang and J. Grosh. A unified model for probabilistic principal surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):59–74, 2002. O. Chapelle and A. Zien. Semi-supervised classification by low density separation. In R. G. Cowell and Z. Ghahramani, editors, Proc. of the Tenth Int. Workshop on Artificial Intelligence and Statistics (AISTATS 2005), pages 57–64, Barbados, January 6–8 2005. O. Chapelle, B. Sch¨ lkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, o MA, 2006. URL http://www.kyb.tuebingen.mpg.de/ssl-book. S. Chen, B. Mulgrew, and P. M. Grant. A clustering technique for digital communications channel equalization using radial basis function networks. IEEE Transactions on Neural Networks, 4(4): 570–590, Jul 1993. 1282 L OCALLY D EFINED P RINCIPAL C URVES AND S URFACES Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Analysis and Machine Intelligence,, 17(8):790–799, 1995. J. Choi, H. Yu, and Y. H. Lee. Adaptive mimo decision feedback equalization for receivers with time-varying channels. Signal Processing, IEEE Transactions on [see also Acoustics, Speech, and Signal Processing, IEEE Transactions on], 53(11):4295–4303, Nov. 2005. D. Comaniciu. An algorithm for data-driven bandwidth selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):281–288, 2003. D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002. B. N. Delaunay. Sur la sph` re vide. Bulletin of Academy of Sciences of the USSR, (6):793–800, e 1934. P. Delicado. Principal curves and principal oriented points. 1998. URL http://www.econ.upf. es/deehome/what/wpapers/postscripts/309.pdf. T. Duchamp and W. Stuetzle. Geometric properties of principal curves in the plane. Robust Statistics, Data Analysis, and Computer Intensive Methods: In Honor of Peter Huber’s 60th Birthday, 109:135–152, 1996a. T. Duchamp and W. Stuetzle. Extremal properties of principal curves in the plane. The Annals of Statistics, 24(4):1511–1520, 1996b. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience Publication, 2000. D. Erdogmus and U. Ozertem. Nonlinear coordinate unfolding via principal curve projections with application to bss. In 14th International Conference on Neural Information Processing, 2007. L. Fahrmeir and G. Tutz. Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, New York, 1994. T. S. Ferguson. A bayesian analysis of some nonparametric problems. The Annals of Statistics, 1 (2):209–230, 1973. G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky. Simplified processing for high spectral efficiency wireless communication employing multi-element arrays. IEEE Journal on Selected Areas in Communications, 17(11):1841–1852, Nov 1999. K. Fukunaga and D.R. Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 20(2):176–183, 1971. S. A. Fulop and K. Fitz. Separation of components from impulses in reassigned spectrograms. Acoustical Society of America Journal, 121:1510–1517, 2007. J. Ham, D. Lee, S. Mika, and B. Scholkopf. A kernel view of the dimensionality reduction of manifolds. In Proceedings of the Twenty First International Conference on Machine Learning (ICML-04), pages 369–376, 2004. 1283 O ZERTEM AND E RDOGMUS B. E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(03):726–748, June 2008. URL http://ideas.repec.org/a/cup/etheor/ v24y2008i03p726-748_08.html. T. Hastie. Principal Curves and Surfaces. PhD thesis, Stanford University, 1984. T. Hastie and W. Stuetzle. Principal curves. Journal of American Statistical Association, 84:502– 516, 1989. A. Hegde, J. C. Principe, D. Erdogmus, U. Ozertem, Y. N. Rao, and H. Peddaneni. Perturbationbased eigenvector updates for on-line principal components analysis and canonical correlation analysis. Journal of VLSI Signal Processing Systems, 45(1-2):85–95, 2006. H. Hotelling. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol., 24, 1933. J. E. Jackson. A User’s Guide to Principal Components. John Wiley and Sons, New York, 1991. I. T. Jolliffe. Principal Components Analysis. Springer-Verlag, Berlin, 1986. M. C. Jones, J. S. Marron, and S. J. Sheather. A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433):401–407, 1996. K. Fitz and L. Haken and P. Christensen. Transient preservation under transformation in an additive sound model. Proceedings of International Computer Music Conference, pages 392–395, 2000. N. Kambhatla and T. K. Leen. Fast non-linear dimension reduction. In Neural Information Processing Systems, pages 152–159, 1994. N. Kambhatla and T. K. Leen. Dimension reduction by local principal component analysis. Neural Computation, 9(7):1493–1516, 1997. E. Karami and M. Shiva. Decision-directed recursive least squares mimo channels tracking. EURASIP Journal of Wireless Communication Networks, 2006(2):7–7, 2006. B. Kegl. Principal Curves: Learning, Design, And Applications. PhD thesis, Concordia University, Montreal, Canada, 1999. B. Kegl and A. Kryzak. Piecewise linear skeletonization using principal curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):59–74, 2002. B. Kegl, A. Kryzak, T. Linder, and K. Zeger. Learning and design of principal curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(3):281–297, 2000. S. Y. Kung, K. I. Diamantaras, and J. S. Taur. Adaptive principal component extraction (apex) and applications. IEEE Transactions on Signal Processing, 42(5):1202–1217, May 1994. P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall/CRC, London, 1989. P. Meinicke and H. Ritter. Local pca learning with resolution-dependent mixtures of gaussians. Proceedings of 9th International Conference on Artificial Neural Networks, pages 497–502, 1999. 1284 L OCALLY D EFINED P RINCIPAL C URVES AND S URFACES A. K. Ozdemir and O. Arikan. A high resolution time frequency representation with significantly reduced cross-terms. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2:693–696, 2000. A. K. Ozdemir, L. Durak, and O. Arikan. High resolution time-frequency analysis by fractional domain warping. IEEE International Conference on Acoustics, Speech, and Signal Processing, 6:3553–3556, 2001. U. Ozertem and D. Erdogmus. Principal curve time warping. IEEE Transactions on Signal Processing, 57(6):2041–2049, 2009. U. Ozertem, D. Erdogmus, and O. Arikan. Piecewise smooth signal denoising via principal curve projections. In IEEE Int. Conf. on Machine Learning for Signal Processing, pages 426 – 431, 2008. E. Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962. V. C. Raykar and R. Duraiswami. Fast optimal bandwidth selection for kernel density estimation. In J. Ghosh, D. Lambert, D. Skillicorn, and J. Srivastava, editors, Proceedings of the sixth SIAM International Conference on Data Mining, pages 524–528, 2006. A.A. Rontogiannis, V. Kekatos, and K. Berberidis. A square-root adaptive v-blast algorithm for fast time-varying mimo channels. IEEE Signal Processing Letters, 13(5):265–268, May 2006. S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, December 2000. S. Sandilya and S. R. Kulkarni. Principal curves with bounded turn. IEEE Transactions on Information Theory, 48(10):2789–2793, 2002. B. Sch¨ lkopf, A. Smola, and K. M¨ ller. Nonlinear component analysis as a kernel eigenvalue o u problem. Neural Computation, 10(5):1299–1319, 1998. John Shawe-Taylor and Yoram Singer, editors. Regularization and Semi-supervised Learning on Large Graphs, volume 3120 of Lecture Notes in Computer Science, 2004. Springer. S. J. Sheather and M. C. Jones. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological), 53(3):683–690, 1991. B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, April 1986. ISBN 0412246201. D. C. Stanford and A. E. Raftery. Finding curvilinear features in spatial point patterns: Principal curve clustering with noise. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6):601–609, 2000. J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, December 2000. 1285 O ZERTEM AND E RDOGMUS R. Tibshirani. Principal curves revisited. Statistics and Computation, 2:183–190, 1992. S. Van Vaerenbergh and I. Santamar´a. A spectral clustering approach to underdetermined postı nonlinear blind source separation of sparse sources. IEEE Transactions on Neural Networks, 17 (3):811–814, May 2006. S. Van Vaerenbergh and I. Santamaria. A spectral clustering approach for blind decoding of mimo transmissions over time-correlated fading channels. In E. Hines and M. Martinez, editors, Intelligent Systems: Techniques and Applications. Shaker Publishing, 2008. S. Van Vaerenbergh, E. Est´ banez, and I. Santamar´a. A spectral clustering algorithm for decode ı ing fast time-varying BPSK MIMO channels. In 15th European Signal Processing Conference, Poznan, Poland, September 2007. J. J. Verbeek, N. A. Vlassis, and B. Kr¨ se. A soft k-segments algorithm for principal curves. In o ICANN ’01: Proceedings of the International Conference on Artificial Neural Networks, pages 450–456, London, UK, 2001. Springer-Verlag. ISBN 3-540-42486-5. J. J. Verbeek, N. Vlassis, and B. Kr¨ se. A k-segments algorithm for finding principal curves. Pattern o Recognition Letters, 23(8):1009–1017, 2002. P. Vincent and Y. Bengio. Manifold parzen windows. In Advances in Neural Information Processing Systems 15, pages 825–832, 2003. K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision, 70(1):77–90, 2006. A. S. Wong, K. Wong, and C. Wong. A practical sequential method for principal component analysis. Neural Processing Letters, 11(2):107–112, 2000. C. Yang, R. Duraiswami, D. Dementhon, and L. Davis. Mean-shift analysis using quasi-newton methods. In Proceedings of the International Conference on Image Processing, pages 447–450, 2003a. C. Yang, R. Duraiswami, N. A. Gumerov, and L. Davis. Improved fast gauss transform and efficient kernel density estimation. In Ninth IEEE International Conference on Computer Vision, pages 664–671 vol.1, 2003b. 1286