nips2011-287: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Salah Rifai, Yann N. Dauphin, Pascal Vincent, Yoshua Bengio, Xavier Muller
Abstract: We combine three important ideas present in previous work for building classifiers: the semi-supervised learning hypothesis (the input distribution contains information about the classifier), the unsupervised manifold hypothesis (data density concentrates near low-dimensional manifolds), and the manifold hypothesis for classification (different classes correspond to disjoint manifolds separated by regions of low density). We exploit a novel algorithm for capturing manifold structure (higher-order contractive auto-encoders) and we show how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping. This representation learning algorithm can be stacked to yield a deep architecture, and we combine it with a domain-knowledge-free version of the TangentProp algorithm to encourage the classifier to be insensitive to local directions of variation along the manifold. Record-breaking classification results are obtained.
References:
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127. Also published as a book, Now Publishers, 2009.
Bengio, Y. and Monperrus, M. (2005). Non-local manifold tangent learning. In NIPS'04, pages 129–136. MIT Press.
Bengio, Y., Larochelle, H., and Vincent, P. (2006). Non-local manifold Parzen windows. In NIPS'05, pages 115–122. MIT Press.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in NIPS 19.
Brand, M. (2003). Charting a manifold. In NIPS'02, pages 961–968. MIT Press.
Cayton, L. (2005). Algorithms for manifold learning. Technical Report CS2008-0923, UCSD.
Drucker, H. and LeCun, Y. (1992). Improving generalisation performance using double back-propagation. IEEE Transactions on Neural Networks, 3(6), 991–997.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., and Bengio, S. (2010). Why does unsupervised pre-training help deep learning? JMLR, 11, 625–660.
Goodfellow, I., Le, Q., Saxe, A., and Ng, A. (2009). Measuring invariances in deep networks. In NIPS'09, pages 646–654.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Kavukcuoglu, K., Ranzato, M., Fergus, R., and LeCun, Y. (2009). Learning invariant features through topographic filter maps. In CVPR'09, pages 1605–1612. IEEE.
Lasserre, J. A., Bishop, C. M., and Minka, T. P. (2006). Principled hybrids of generative and discriminative models. In CVPR'06, pages 87–94, Washington, DC, USA. IEEE Computer Society.
LeCun, Y., Haffner, P., Bottou, L., and Bengio, Y. (1999). Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pages 319–345. Springer.
Narayanan, H. and Mitter, S. (2010). Sample complexity of testing the manifold hypothesis. In Advances in Neural Information Processing Systems 23 (NIPS'10), pages 1786–1794.
Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y. (2007a). Efficient learning of sparse representations with an energy-based model. In NIPS'06.
Ranzato, M., Huang, F., Boureau, Y., and LeCun, Y. (2007b). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In CVPR'07. IEEE Press.
Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. (2011a). Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML'11).
Rifai, S., Mesnil, G., Vincent, P., Muller, X., Bengio, Y., Dauphin, Y., and Glorot, X. (2011b). Higher order contractive auto-encoder. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD).
Roweis, S. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
Salakhutdinov, R. and Hinton, G. E. (2007). Learning a nonlinear embedding by preserving class neighbourhood structure. In AISTATS'2007, San Juan, Puerto Rico. Omnipress.
Salakhutdinov, R. and Hinton, G. E. (2009). Deep Boltzmann machines. In AISTATS'2009, volume 5, pages 448–455.
Simard, P., Victorri, B., LeCun, Y., and Denker, J. (1992). Tangent Prop - a formalism for specifying selected invariances in an adaptive network. In NIPS'91, pages 895–903, San Mateo, CA. Morgan Kaufmann.
Simard, P. Y., LeCun, Y., and Denker, J. (1993). Efficient pattern recognition using a new transformation distance. In NIPS'92, pages 50–58. Morgan Kaufmann, San Mateo.
Tenenbaum, J., de Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.
Trebar, M. and Steele, N. (2008). Application of distributed SVM architectures in classifying forest data cover types. Computers and Electronics in Agriculture, 63(2), 119–130.
Vincent, P. and Bengio, Y. (2003). Manifold Parzen windows. In NIPS'02. MIT Press.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. JMLR, 11, 3371–3408.
Weston, J., Ratle, F., and Collobert, R. (2008). Deep learning via semi-supervised embedding. In ICML 2008, pages 1168–1175, New York, NY, USA.
Yu, K. and Zhang, T. (2010). Improved local coordinate coding using local tangents. In ICML 2010.
Yu, K., Zhang, T., and Gong, Y. (2009). Nonlinear learning using local coordinate coding. In Advances in Neural Information Processing Systems 22 (NIPS'09), pages 2223–2231.