
193 nips-2012-Learning to Align from Scratch

Source: pdf

Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-Miller

Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method.
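
To illustrate the group sparsity idea mentioned in the abstract, here is a minimal sketch. It is not the authors' implementation. The abstract does not give the exact form of the penalty, so this assumes a common mixed-norm form: the L2 norm of hidden-unit activations within each group, summed over groups. The function name, the contiguous (possibly overlapping) grouping scheme, and the parameters `group_size` and `stride` are hypothetical.

```python
import numpy as np

def group_sparsity_penalty(activations, group_size, stride):
    """Mean over samples of the summed per-group L2 norms (assumed penalty form).

    activations: array of shape (n_samples, n_hidden).
    Groups are contiguous windows of `group_size` hidden units taken every
    `stride` units; stride < group_size yields overlapping groups.
    """
    n_samples, n_hidden = activations.shape
    penalty = 0.0
    for start in range(0, n_hidden - group_size + 1, stride):
        group = activations[:, start:start + group_size]
        # L2 norm within the group, summed over samples.
        penalty += np.sqrt((group ** 2).sum(axis=1)).sum()
    return penalty / n_samples

# Same total activation energy, concentrated in one group vs. spread over two.
concentrated = np.array([[3.0, 4.0, 0.0, 0.0]])
spread = np.array([[3.0, 0.0, 4.0, 0.0]])
print(group_sparsity_penalty(concentrated, group_size=2, stride=2))
print(group_sparsity_penalty(spread, group_size=2, stride=2))
```

Under this assumed form, activity concentrated in a single group is penalized less than the same activity spread across groups, which is the kind of pressure that can encourage neighboring units to respond together.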

reference text

[1] L. Wolf, T. Hassner, and Y. Taigman. Similarity scores based on background samples. In ACCV, 2009.

[2] Y. Taigman, L. Wolf, and T. Hassner. Multiple one-shots for utilizing class label information. In BMVC, 2009.

[3] M. Everingham, J. Sivic, and A. Zisserman. “Hello! My name is... Buffy” - automatic naming of characters in TV video. In BMVC, 2006.

[4] T. L. Berg, A. C. Berg, M. Maire, R. White, Y. W. Teh, E. Learned-Miller, and D. A. Forsyth. Names and faces in the news. In CVPR, 2004.

[5] Y. Zhou, L. Gu, and H.-J. Zhang. Bayesian tangent shape model: Estimating shape and pose parameters via Bayesian inference. In CVPR, 2003.

[6] E. Learned-Miller. Data driven image models through continuous joint alignment. PAMI, 2005.

[7] E. Miller, N. Matsakis, and P. Viola. Learning from one example through shared densities on transforms. In CVPR, 2000.

[8] L. Zollei, E. Learned-Miller, E. Grimson, and W. Wells. Efficient population registration of 3D data. In Workshop on Computer Vision for Biomedical Image Applications: Current Techniques and Future Trends, at ICCV, 2005.

[9] E. Learned-Miller and V. Jain. Many heads are better than one: Jointly removing bias from multiple MRIs using nonparametric maximum likelihood. In Proceedings of Information Processing in Medical Imaging, pages 615–626, 2005.

[10] G. B. Huang, V. Jain, and E. Learned-Miller. Unsupervised joint alignment of complex images. In ICCV, 2007.

[11] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[12] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[13] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2007.

[14] M. Ranzato, Y.-L. Boureau, and Y. LeCun. Sparse feature learning for deep belief networks. In NIPS, 2007.

[15] M. Cox, S. Lucey, S. Sridharan, and J. Cohn. Least squares congealing for unsupervised alignment of images. In CVPR, 2008.

[16] M. Cox, S. Sridharan, S. Lucey, and J. Cohn. Least squares congealing for large numbers of images. In ICCV, 2009.

[17] X. Liu, Y. Tong, and F. W. Wheeler. Simultaneous alignment and clustering for an image ensemble. In ICCV, 2009.

[18] M. A. Mattar, A. R. Hanson, and E. G. Learned-Miller. Unsupervised joint alignment and clustering using Bayesian nonparametrics. In UAI, 2012.

[19] J. Zhu, L. V. Gool, and S. C. Hoi. Unsupervised face alignment by nonrigid mapping. In ICCV, 2009.

[20] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[21] H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In NIPS, 2007.

[22] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus. Deconvolutional networks. In CVPR, 2010.

[23] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, 54(10):95–103, 2011.

[24] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, pages 1794–1801, 2009.

[25] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009.

[26] G. B. Huang, H. Lee, and E. Learned-Miller. Learning hierarchical representations for face verification with convolutional deep belief networks. In CVPR, 2012.

[27] H. Lee, Y. Largman, P. Pham, and A. Y. Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In NIPS, 2009.

[28] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML, 2008.

[29] R. Salakhutdinov and G. E. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50:969–978, 2009.

[30] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: Transfer learning from unlabeled data. In ICML, 2007.

[31] O. Chapelle, B. Schölkopf, and A. Zien. Semi-supervised learning. MIT Press, 2006.

[32] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Technical report, University of Wisconsin, 2004.

[33] A. Hyvärinen, P. O. Hoyer, and M. Inki. Topographic independent component analysis. Neural Computation, 13(7):1527–1558, 2001.

[34] K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun. Learning invariant features through topographic filter maps. In CVPR, 2009.

[35] M. Norouzi, M. Ranjbar, and G. Mori. Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning. In CVPR, pages 2735–2742, 2009.

[36] Y.-L. Boureau, F. R. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In CVPR, 2010.

[37] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.

[38] H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief net model for visual area V2. In NIPS, 2008.

[39] K. Sohn, D. Y. Jung, H. Lee, and A. O. Hero III. Efficient learning of sparse, distributed, convolutional feature representations for object recognition. In ICCV, 2011.

[40] H. V. Nguyen and L. Bai. Cosine similarity metric learning for face verification. In ACCV, 2010.

[41] T. Ojala, M. Pietikäinen, and D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29(1):51–59, 1996.