
Discriminative Transfer Learning with Tree-based Priors (NIPS 2013)



Authors: Nitish Srivastava, Ruslan Salakhutdinov

Abstract: High-capacity classifiers, such as deep neural networks, often struggle on classes that have very few training examples. We propose a method for improving classification performance for such classes by discovering similar classes and transferring knowledge among them. Our method learns to organize the classes into a tree hierarchy. This tree structure imposes a prior over the classifier’s parameters. We show that the performance of deep neural networks can be improved by applying these priors to the weights in the last layer. Our method combines the strength of discriminatively trained deep neural networks, which typically require large amounts of training data, with tree-based priors, so that deep networks also work well on infrequent classes. We also propose an algorithm for learning the underlying tree structure. Starting from an initial pre-specified tree, this algorithm modifies the tree to make it more pertinent to the task being solved, for example, removing semantic relationships in favour of visual ones for an image classification task. Our method achieves state-of-the-art classification results on the CIFAR-100 image data set and the MIR Flickr image-text data set.
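
As a concrete illustration of the prior sketched in the abstract, consider the following minimal NumPy example. Each class's last-layer weight vector w_k is modelled as a Gaussian draw centred on the parameter vector theta_j of its parent node in the tree, w_k ~ N(theta_parent(k), sigma^2 I), with theta_j ~ N(0, sigma0^2 I), so MAP training adds a squared-distance penalty that pulls sibling classes toward a shared mean. This is a sketch under stated assumptions, not the paper's implementation: the function names, the two-level tree encoded as one parent index per class, and the closed-form parent update are all illustrative.

import numpy as np

def tree_prior_penalty(W, theta, parent, sigma2=1.0, sigma02=1.0):
    """Negative log-prior (up to constants), used as a regularizer.

    W      : (K, D) array, last-layer weights, one row per class k
    theta  : (J, D) array, parent-node parameters, one row per node j
    parent : (K,) integer array, index of each class's parent node

    Illustrative names and shapes; not from the paper's code.
    """
    # Penalty pulling each class weight toward its parent's vector.
    child_term = np.sum((W - theta[parent]) ** 2) / (2.0 * sigma2)
    # Penalty shrinking the parent vectors toward the origin.
    root_term = np.sum(theta ** 2) / (2.0 * sigma02)
    return child_term + root_term

def update_theta(W, parent, num_nodes, sigma2=1.0, sigma02=1.0):
    """Closed-form MAP update of the parent vectors with W held fixed:
    theta_j = (sum of children's weight vectors) / (n_j + sigma2/sigma02),
    where n_j is the number of classes under node j."""
    theta = np.zeros((num_nodes, W.shape[1]))
    counts = np.zeros(num_nodes)
    for k, j in enumerate(parent):
        theta[j] += W[k]
        counts[j] += 1.0
    return theta / (counts + sigma2 / sigma02)[:, None]

During training one would alternate gradient steps on W (classification loss plus tree_prior_penalty) with calls to update_theta. Classes sharing a parent are thereby pulled toward a common mean, which is how infrequent classes borrow statistical strength from better-represented siblings.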


References

[1] E. Bart, I. Porteous, P. Perona, and M. Welling. Unsupervised learning of visual taxonomies. In CVPR, pages 1–8, 2008.

[2] Y. Bengio and Y. LeCun. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, 2007.

[3] Hal Daumé III. Bayesian multitask learning with latent hierarchies. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 135–142, Arlington, Virginia, United States, 2009. AUAI Press.

[4] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In ACM SIGKDD, 2004.

[5] Li Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594–611, April 2006.

[6] Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1319–1327, 2013.

[7] M. Guillaumin, J. Verbeek, and C. Schmid. Multimodal semi-supervised learning for image classification. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 902–909, June 2010.

[8] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[9] Mark J. Huiskes and Michael S. Lew. The MIR Flickr retrieval evaluation. In MIR ’08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.

[10] Mark J. Huiskes, Bart Thomee, and Michael S. Lew. New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In Multimedia Information Retrieval, pages 527–536, 2010.

[11] Zhuoliang Kang, Kristen Grauman, and Fei Sha. Learning with whom to share in multi-task feature learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML ’11, pages 521–528, New York, NY, USA, June 2011. ACM.

[12] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, pages 543–550, 2010.

[13] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. MIT Press, 2012.

[15] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning, pages 609–616, 2009.

[16] George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38(11):39–41, November 1995.

[17] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS. MIT Press, 2011.

[18] R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011.

[19] R. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 12, 2009.

[20] Babak Shahbaba and Radford M. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2(1):221–238, 2007.

[21] Nitish Srivastava and Ruslan Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems 25, pages 2231–2239. MIT Press, 2012.

[22] Jakob Verbeek, Matthieu Guillaumin, Thomas Mensink, and Cordelia Schmid. Image Annotation with TagProp on the MIRFLICKR set. In 11th ACM International Conference on Multimedia Information Retrieval (MIR ’10), pages 537–546. ACM Press, 2010.

[23] Ya Xue, Xuejun Liao, Lawrence Carin, and Balaji Krishnapuram. Multi-task learning for classification with Dirichlet process priors. J. Mach. Learn. Res., 8:35–63, May 2007.

[24] Matthew D. Zeiler and Rob Fergus. Stochastic pooling for regularization of deep convolutional neural networks. CoRR, abs/1301.3557, 2013.

[25] Alon Zweig and Daphna Weinshall. Hierarchical regularization cascade for joint learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 37–45, May 2013.