78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information


Source: pdf

Author: Jasper Snoek, Ryan P. Adams, Hugo Larochelle

Abstract: While unsupervised learning has long been useful for density modeling, exploratory data analysis and visualization, it has become increasingly important for discovering features that will later be used for discriminative tasks. Discriminative algorithms often work best with highly-informative features; remarkably, such features can often be learned without the labels. One particularly effective way to perform such unsupervised learning has been to use autoencoder neural networks, which find latent representations that are constrained but nevertheless informative for reconstruction. However, pure unsupervised learning with autoencoders can find representations that may or may not be useful for the ultimate discriminative task. It is a continuing challenge to guide the training of an autoencoder so that it finds features which will be useful for predicting labels. Similarly, we often have a priori information regarding what statistical variation will be irrelevant to the ultimate discriminative task, and we would like to be able to use this for guidance as well. Although a typical strategy would be to include a parametric discriminative model as part of the autoencoder training, here we propose a nonparametric approach that uses a Gaussian process to guide the representation. By using a nonparametric model, we can ensure that a useful discriminative function exists for a given set of features, without explicitly instantiating it. We demonstrate the superiority of this guidance mechanism on four data sets, including a real-world application to rehabilitation research. We also show how our proposed approach can learn to explicitly ignore statistically significant covariate information that is label-irrelevant, by evaluating on the small NORB image recognition problem in which pose and lighting labels are available.

Keywords: autoencoder, Gaussian process, Gaussian process latent variable model, representation learning, unsupervised learning
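
One way to picture the guidance described in the abstract is as a weighted sum of two terms: an autoencoder reconstruction loss, and the negative log marginal likelihood of the labels under a Gaussian process defined on the latent codes. The sketch below is only an illustration under stated assumptions. The one-layer sigmoid encoder, the linear decoder, the squared-exponential kernel, the mixing weight `alpha`, and every function name are hypothetical choices made here, not details given in the abstract.

```python
import numpy as np

def encode(X, W, b):
    """Hypothetical one-layer sigmoid encoder mapping inputs to latent codes."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def reconstruction_loss(X, H, V, c):
    """Mean (over examples) of half the squared error of a linear decoder."""
    X_hat = H @ V + c
    return 0.5 * np.mean(np.sum((X - X_hat) ** 2, axis=1))

def gp_neg_log_marginal_likelihood(H, y, lengthscale=1.0, noise=0.1):
    """Negative log marginal likelihood of labels y under a zero-mean GP
    with a squared-exponential kernel evaluated on the latent codes H.
    No discriminative function is instantiated; it is integrated out."""
    n = H.shape[0]
    sq = np.sum(H ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (H @ H.T), 0.0)
    K = np.exp(-0.5 * d2 / lengthscale ** 2) + noise ** 2 * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # a = K^{-1} y
    return 0.5 * (y @ a) + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)

def guided_objective(X, y, W, b, V, c, alpha=0.5):
    """Assumed combination: (1 - alpha) * reconstruction + alpha * GP term."""
    H = encode(X, W, b)
    return ((1.0 - alpha) * reconstruction_loss(X, H, V, c)
            + alpha * gp_neg_log_marginal_likelihood(H, y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))     # toy inputs
    y = rng.normal(size=20)          # toy real-valued labels
    W, b = rng.normal(size=(5, 2)), np.zeros(2)
    V, c = rng.normal(size=(2, 5)), np.zeros(5)
    print(guided_objective(X, y, W, b, V, c, alpha=0.5))
```

In this sketch, setting `alpha=0` recovers a plain autoencoder objective and `alpha=1` uses only the label-guided term; in practice the encoder and decoder parameters would be optimized by gradient descent on the combined value.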


reference text

Ryan P. Adams, Zoubin Ghahramani, and Michael I. Jordan. Tree-structured stick breaking for hierarchical data. In Advances in Neural Information Processing Systems, 2010.
Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, 2007.
Christopher M. Bishop and G. D. James. Analysis of multiphase flows using dual-energy gamma densitometry and neural networks. Nuclear Instruments and Methods in Physics Research, 1993.
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. pre-print, 2010. arXiv:1012.2599.
Adam Coates, Honglak Lee, and Andrew Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In Conference on Artificial Intelligence and Statistics, 2011.
Garrison W. Cottrell, Paul Munro, and David Zipser. Learning internal representations from grayscale images: An example of extensional programming. In Conference of the Cognitive Science Society, 1987.
Li Deng, Mike Seltzer, Dong Yu, Alex Acero, Abdel-Rahman Mohamed, and Geoffrey E. Hinton. Binary coding of speech spectrograms using a deep autoencoder. In Interspeech, 2010.
Jacob Goldberger, Sam Roweis, Geoff Hinton, and Ruslan Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems, 2004.
Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In IEEE Conference on Computer Vision and Pattern Recognition, 2006.
Rajibul Huq, Patricia Kan, Robby Goetschalckx, Debbie Hébert, Jesse Hoey, and Alex Mihailidis. A decision-theoretic approach in the design of an adaptive upper-limb stroke rehabilitation robot. In International Conference of Rehabilitation Robotics (ICORR), 2011.
Patricia Kan, Rajibul Huq, Jesse Hoey, Robby Goetschalckx, and Alex Mihailidis. The development of an adaptive upper-limb stroke rehabilitation robotic system. Neuroengineering and Rehabilitation, 2011.
Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In International Conference on Machine Learning, 2007.
Neil D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, 2005.
Neil D. Lawrence and Joaquin Quiñonero-Candela. Local distance preservation in the GP-LVM through back constraints. In International Conference on Machine Learning, 2006.
Neil D. Lawrence and Raquel Urtasun. Non-linear matrix factorization with Gaussian processes. In International Conference on Machine Learning, 2009.
Yann LeCun, Fu Jie Huang, and Léon Bottou. Learning methods for generic object recognition with invariance to pose and lighting. IEEE Conference on Computer Vision and Pattern Recognition, 2004.
David Lowe and Michael E. Tipping. Neuroscale: Novel topographic feature extraction using RBF networks. In Advances in Neural Information Processing Systems, 1997.
Elaine Lu, Rosalie Wang, Rajibul Huq, Don Gardner, Paul Karam, Karl Zabjek, Debbie Hébert, Jennifer Boger, and Alex Mihailidis. Development of a robotic device for upper limb stroke rehabilitation: A user-centered design approach. Journal of Behavioral Robotics, 2(4):176–184, 2011.
David J. C. MacKay. Introduction to Gaussian processes. Neural Networks and Machine Learning, 1998.
David J. C. MacKay. Bayesian neural networks and density networks. In Nuclear Instruments and Methods in Physics Research, A, pages 73–80, 1994.
Jonas Mockus, Vytautas Tiešis, and Antanas Žilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2:117–129, 1978.
Vinod Nair and Geoffrey E. Hinton. 3D object recognition with deep belief nets. In Advances in Neural Information Processing Systems, 2009.
Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning, 2010.
Radford Neal. Bayesian learning for neural networks. Lecture Notes in Statistics, 118, 1996.
Marc’Aurelio Ranzato and Martin Szummer. Semi-supervised learning of compact document representations with deep networks. In International Conference on Machine Learning, 2008.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
Ruslan Salakhutdinov and Geoffrey Hinton. Learning a nonlinear embedding by preserving class neighbourhood structure. In Conference on Artificial Intelligence and Statistics, 2007.
Ruslan Salakhutdinov and Geoffrey Hinton. Using deep belief nets to learn covariance kernels for Gaussian processes. In Advances in Neural Information Processing Systems, 2008.
Ruslan Salakhutdinov and Hugo Larochelle. Efficient learning of deep Boltzmann machines. In Conference on Artificial Intelligence and Statistics, 2010.
Aaron P. Shon, Keith Grochow, Aaron Hertzmann, and Rajesh P. N. Rao. Learning shared latent structure for image synthesis and robotic imitation. In Advances in Neural Information Processing Systems, 2005.
Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, 2012.
Babak Taati, Rosalie Wang, Rajibul Huq, Jasper Snoek, and Alex Mihailidis. Vision-based posture assessment to detect and categorize compensation during robotic rehabilitation therapy. In International Conference on Biomedical Robotics and Biomechatronics, 2012.
Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.
Raquel Urtasun and Trevor Darrell. Discriminative Gaussian process latent variable model for classification. In International Conference on Machine Learning, 2007.
Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In International Conference on Machine Learning, 2008.
Francesco Vivarelli and Christopher K. I. Williams. Discovering hidden features with Gaussian process regression. In Advances in Neural Information Processing Systems, 1999.
Jack M. Wang, David J. Fleet, and Aaron Hertzmann. Multifactor Gaussian process models for style-content separation. In International Conference on Machine Learning, 2007.
Jack M. Wang, David J. Fleet, and Aaron Hertzmann. Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Christopher K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203–1216, 1998.
Richard S. Zemel, Christopher K. I. Williams, and Michael C. Mozer. Lending direction to neural networks. Neural Networks, 8:503–512, 1995.