
Tapkee: An Efficient Dimension Reduction Library



Author: Sergey Lisitsyn, Christian Widmer, Fernando J. Iglesias Garcia

Abstract: We present Tapkee, a C++ template library that provides efficient implementations of more than 20 widely used dimensionality reduction techniques, ranging from Locally Linear Embedding (Roweis and Saul, 2000) and Isomap (de Silva and Tenenbaum, 2002) to the recently introduced Barnes-Hut-SNE (van der Maaten, 2013). Our library was designed with a focus on performance and flexibility. For performance, we combine efficient multi-core algorithms, modern data structures and state-of-the-art low-level libraries. For flexibility, we designed a clean interface for applying methods to user data and provide a callback API that facilitates integration with the library. The library is freely available as open-source software and is distributed under the permissive BSD 3-clause license. We encourage the integration of Tapkee into other open-source toolboxes and libraries. For example, Tapkee has been integrated into the codebase of the Shogun toolbox (Sonnenburg et al., 2010), giving us access to a rich set of kernels, distance measures and bindings to common programming languages including Python, Octave, Matlab, R, Java, C#, Ruby, Perl and Lua. Source code, examples and documentation are available at http://tapkee.lisitsyn.me.

Keywords: dimensionality reduction, machine learning, C++, open source software
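As a conceptual illustration of the kind of technique such a library implements (this is classical PCA in NumPy, not Tapkee's C++ API), a minimal sketch of projecting data to a lower-dimensional space:

```python
import numpy as np

def pca_embed(X, target_dim):
    """Project data onto its top `target_dim` principal components.

    Classical PCA via an eigendecomposition of the sample covariance;
    a teaching-sized stand-in for the randomized solvers a library
    like Tapkee would use on large data (cf. Halko et al., 2011).
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :target_dim]   # top principal directions
    return Xc @ top                          # low-dimensional embedding

# Toy usage: embed 100 five-dimensional points into the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca_embed(X, 2)
print(Y.shape)  # (100, 2)
```

The function names and shapes here are illustrative only; Tapkee exposes such methods through its C++ callback interface rather than a Python function.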


References

D. K. Agrafiotis. Stochastic proximity embedding. Journal of Computational Chemistry, 24(10):1215–1221, 2003.

M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14:585–591, 2002.

A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 97–104, 2006.

R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.

V. de Silva and J. B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. Advances in Neural Information Processing Systems, 15:705–712, 2002.

D. L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences of the United States of America, 100(10):5591–5596, 2003.

M. S. Gashler. Waffles: A machine learning toolkit. Journal of Machine Learning Research, 12:2383–2387, 2011.

N. Halko, P.-G. Martinsson, Y. Shkolnisky, and M. Tygert. An algorithm for the principal component analysis of large data sets. SIAM Journal on Scientific Computing, 33(5):2580–2594, 2011.

X. He and P. Niyogi. Locality preserving projections. Advances in Neural Information Processing Systems, 16:153–160, 2003.

X. He, D. Cai, S. Yan, and H.-J. Zhang. Neighborhood preserving embedding. In Tenth IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1208–1213, 2005.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

S. Sonnenburg, G. Raetsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. De Bona, A. Binder, C. Gehl, and V. Franc. The SHOGUN machine learning toolbox. Journal of Machine Learning Research, 11:1799–1802, 2010.

L. van der Maaten. Barnes-Hut-SNE. arXiv preprint arXiv:1301.3342, 2013.

T. Zhang, J. Yang, D. Zhao, and X. Ge. Linear local tangent space alignment and application to face recognition. Neurocomputing, 70(7-9):1547–1553, 2007.

Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing, 26(1):313–338, 2004.