
Discriminative Transfer Learning with Tree-based Priors (NIPS 2013)



Authors: Nitish Srivastava, Ruslan Salakhutdinov

Abstract: High-capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier’s parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, so that deep networks also work well on infrequent classes. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set.
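
As a concrete illustration of the prior sketched in the abstract, consider the following minimal NumPy example. Each class's last-layer weight vector w_k is modelled as a Gaussian draw centred on the parameter vector theta_j of its parent node in the tree, w_k ~ N(theta_parent(k), sigma^2 I), with theta_j ~ N(0, sigma0^2 I), so MAP training adds a squared-distance penalty that pulls sibling classes toward a shared mean. This is a sketch under stated assumptions, not the paper's implementation: the function names, the two-level tree encoded as one parent index per class, and the closed-form parent update are all illustrative.

import numpy as np

def tree_prior_penalty(W, theta, parent, sigma2=1.0, sigma02=1.0):
    """Negative log-prior (up to constants), used as a regularizer.

    W      : (K, D) array, last-layer weights, one row per class k
    theta  : (J, D) array, parent-node parameters, one row per node j
    parent : (K,) integer array, index of each class's parent node

    Illustrative names and shapes; not from the paper's code.
    """
    # Penalty pulling each class weight toward its parent's vector.
    child_term = np.sum((W - theta[parent]) ** 2) / (2.0 * sigma2)
    # Penalty shrinking the parent vectors toward the origin.
    root_term = np.sum(theta ** 2) / (2.0 * sigma02)
    return child_term + root_term

def update_theta(W, parent, num_nodes, sigma2=1.0, sigma02=1.0):
    """Closed-form MAP update of the parent vectors with W held fixed:
    theta_j = (sum of children's weight vectors) / (n_j + sigma2/sigma02),
    where n_j is the number of classes under node j."""
    theta = np.zeros((num_nodes, W.shape[1]))
    counts = np.zeros(num_nodes)
    for k, j in enumerate(parent):
        theta[j] += W[k]
        counts[j] += 1.0
    return theta / (counts + sigma2 / sigma02)[:, None]

During training one would alternate gradient steps on W (classification loss plus tree_prior_penalty) with calls to update_theta. Classes sharing a parent are thereby pulled toward a common mean, which is how infrequent classes borrow statistical strength from better-represented siblings.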


References

[1] E. Bart, I. Porteous, P. Perona, and M. Welling. Unsupervised learning of visual taxonomies. In CVPR, pages 1–8, 2008.

[2] Y. Bengio and Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 2007.

[3] Hal Daumé III. Bayesian multitask learning with latent hierarchies. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 135–142, Arlington, Virginia, United States, 2009. AUAI Press.

[4] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In ACM SIGKDD, 2004.

[5] Li Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594–611, April 2006.

[6] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1319–1327, 2013.

[7] M. Guillaumin, J. Verbeek, and C. Schmid. Multimodal semi-supervised learning for image classification. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 902–909, June 2010.

[8] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[9] Mark J. Huiskes and Michael S. Lew. The MIR Flickr retrieval evaluation. In MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.

[10] Mark J. Huiskes, Bart Thomee, and Michael S. Lew. New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In Multimedia Information Retrieval, pages 527–536, 2010.

[11] Zhuoliang Kang, Kristen Grauman, and Fei Sha. Learning with whom to share in multi-task feature learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, pages 521–528, New York, NY, USA, June 2011. ACM.

[12] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543–550, 2010.

[13] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. MIT Press, 2012.

[15] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning, pages 609–616, 2009.

[16] George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39–41, November 1995.

[17] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS. MIT Press, 2011.

[18] R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011.

[19] R. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 12, 2009.

[20] Babak Shahbaba and Radford M. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2(1):221–238, 2007.

[21] Nitish Srivastava and Ruslan Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems 25, pages 2231–2239. MIT Press, 2012.

[22] Jakob Verbeek, Matthieu Guillaumin, Thomas Mensink, and Cordelia Schmid. Image Annotation with TagProp on the MIRFLICKR set. In 11th ACM International Conference on Multimedia Information Retrieval (MIR ’10), pages 537–546. ACM Press, 2010.

[23] Ya Xue, Xuejun Liao, Lawrence Carin, and Balaji Krishnapuram. Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res., 8:35–63, May 2007.

[24] Matthew D. Zeiler and Rob Fergus. Stochastic pooling for regularization of deep convolutional neural networks. CoRR, abs/1301.3557, 2013.

[25] Alon Zweig and Daphna Weinshall. Hierarchical regularization cascade for joint learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 37–45, May 2013.