15 jmlr-2013-Bayesian Canonical Correlation Analysis


Author: Arto Klami, Seppo Virtanen, Samuel Kaski

Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In the statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment.

Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation
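
As background for the abstract, the classical (non-Bayesian) CCA it starts from can be sketched in a few lines of NumPy. This is only an illustration of the textbook whitening-plus-SVD formulation, not the Bayesian or IBFA method the paper introduces. The function name `classical_cca`, the ridge term `reg`, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def classical_cca(X, Y, reg=1e-8):
    """Classical CCA via whitening and SVD (illustrative sketch).

    X: (n, dx) array, Y: (n, dy) array with paired rows.
    reg: small ridge term (an assumption) to keep the covariances invertible.
    Returns canonical correlations s and projection matrices Wx, Wy.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance M = Lx^{-1} Cxy Ly^{-T}
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of M are the canonical correlations
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the singular vectors back to the original coordinates
    Wx = np.linalg.solve(Lx.T, U)
    Wy = np.linalg.solve(Ly.T, Vt.T)
    return s, Wx, Wy

# Synthetic example: two views sharing one latent source z plus noise
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = z @ np.ones((1, 4)) + 0.1 * rng.standard_normal((500, 4))
Y = z @ np.ones((1, 3)) + 0.1 * rng.standard_normal((500, 3))
s, Wx, Wy = classical_cca(X, Y)
print(s)  # the first canonical correlation should dominate the rest
```

With one shared source, only the first canonical correlation is large. The remaining ones reflect sampling noise, which is the small-sample issue that motivates the regularized and Bayesian treatments reviewed in the paper.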


References

Deepak Agarwal, Bee-Chung Chen, and Bo Long. Localized factor models for multi-context recommendation. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), pages 609–617. ACM, New York, NY, USA, 2011.
Cédric Archambeau and Francis Bach. Sparse probabilistic projections. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 73–80. MIT Press, 2009.
Cédric Archambeau, Nicolas Delannay, and Michel Verleysen. Robust probabilistic projections. In W.W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, pages 33–40. ACM, 2006.
Francis R. Bach and Michael I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
Francis R. Bach and Michael I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.
Christopher M. Bishop. Bayesian PCA. In M. S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 382–388. MIT Press, 1999.
Leo Breiman and Jerome H. Friedman. Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society B, 59(3), 1997.
Michael W. Browne. The maximum-likelihood solution in inter-battery factor analysis. British Journal of Mathematical and Statistical Psychology, 32:75–86, 1979.
Andreas Damianou, Carl Ek, Michalis Titsias, and Neil Lawrence. Manifold relevance determination. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
Tijl De Bie and Bart De Moor. On the regularization of canonical correlation analysis. In Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Source Separation (ICA2003), pages 785–790, 2003.
Filip Deleus and Marc M. Van Hulle. Functional connectivity analysis of fMRI data based on regularized multiset canonical correlation analysis. Journal of Neuroscience Methods, 197(1):143–157, 2011.
Carl H. Ek, Jon Rihan, Philip H.S. Torr, Grégory Rogez, and Neil D. Lawrence. Ambiguity modelling in latent spaces. In Proceedings of the International Workshop on Machine Learning for Multimodal Interaction (MLMI'08), pages 62–73, 2008.
Yusuke Fujiwara, Yoichi Miyawaki, and Yukiyasu Kamitani. Estimating image bases for visual image reconstruction from human brain activity. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 576–584, 2009.
Zoubin Ghahramani and Matthew J. Beal. Variational inference for Bayesian mixtures of factor analyzers. In S.A. Solla, T.K. Leen, and K-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 449–455. MIT Press, 2000.
Zoubin Ghahramani, Thomas L. Griffiths, and Peter Sollich. Bayesian nonparametric latent feature models. Bayesian Statistics, 8, 2007.
Ignacio Gonzales, Sebastien Dejean, Pascal G.P. Martin, and Alain Baccini. CCA: An R package to extend canonical correlation analysis. Journal of Statistical Software, 23(12):1–14, 2008.
Yue Guan and Jennifer G. Dy. Sparse probabilistic principal component analysis. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Volume 5 of JMLR:W&CP, pages 185–192, 2009.
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL-08: HLT, pages 771–779, Columbus, Ohio, June 2008. Association for Computational Linguistics.
David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.
Harold Hotelling. Relations between two sets of variates. Biometrika, 28:321–377, 1936.
William W. Hsieh. Nonlinear canonical correlation analysis by neural networks. Neural Networks, 13:1095–1105, 2000.
Ilkka Huopaniemi, Tommi Suvitaival, Janne Nikkilä, Matej Orešič, and Samuel Kaski. Two-way analysis of high-dimensional collinear data. Data Mining and Knowledge Discovery, 19:261–276, 2009.
Ilkka Huopaniemi, Tommi Suvitaival, Janne Nikkilä, Matej Orešič, and Samuel Kaski. Multivariate multi-way analysis of multi-source data. Bioinformatics, 26:i391–i398, 2010. (ISMB 2010).
Alexander Ilin and Tapani Raiko. Practical approaches to principal component analysis in the presence of missing data. Journal of Machine Learning Research, 11:1957–2000, 2010.
Shuiwang Ji, Lei Tang, Shipeng Yu, and Jieping Ye. Extracting shared subspace for multi-label classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 381–389, 2008.
Yangqing Jia, Mathieu Salzmann, and Trevor Darrell. Factorized latent spaces with structured sparsity. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 982–990. MIT Press, 2010.
Alfredo A. Kalaitzis and Neil D. Lawrence. Residual Component Analysis: Generalising PCA for more flexible inference in linear-Gaussian models. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning, pages 209–216. Omnipress, 2012.
Arto Klami. Variational Bayesian matching. In S. C. H. Hoi and W. Buntine, editors, Proceedings of the 4th Asian Conference on Machine Learning (ACML), Volume 25 of JMLR:C&WP, pages 205–220, 2012.
Arto Klami and Samuel Kaski. Generative models that discover dependencies between data sets. In Proceedings of MLSP'06, IEEE International Workshop on Machine Learning for Signal Processing, pages 123–128. IEEE, 2006.
Arto Klami and Samuel Kaski. Local dependent components. In Zoubin Ghahramani, editor, Proceedings of ICML 2007, the 24th International Conference on Machine Learning, pages 425–432. Omnipress, 2007.
Arto Klami and Samuel Kaski. Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72:39–46, 2008.
Arto Klami, Seppo Virtanen, and Samuel Kaski. Bayesian exponential family projections for coupled data sources. In P. Grunwald and P. Spirtes, editors, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (2010), pages 286–293. AUAI Press, 2010.
David Knowles and Zoubin Ghahramani. Nonparametric Bayesian sparse factor models with application to gene expression modeling. Annals of Applied Statistics, 5:B 1534-1552, 2011.
Miika Koskinen, Jaakko Viinikanoja, Mikko Kurimo, Arto Klami, Samuel Kaski, and Riitta Hari. Identifying fragments of natural speech from the listener's MEG signals. Human Brain Mapping, 2012.
Leo Lahti, Samuel Myllykangas, Sakari Knuutila, and Samuel Kaski. Dependency detection with similarity constraints. In Proceedings of MLSP 2009, IEEE International Workshop on Machine Learning for Signal Processing, pages 89–94. IEEE, 2009.
Leo Lahti, Martin Schäfer, Hans-Ulrich Klein, Silvio Bicciato, and Martin Dugas. Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Briefings in Bioinformatics, March 2012.
Pei Ling Lai and Colin Fyfe. Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(5):365–377, 2000.
Gayle Leen and Colin Fyfe. A Gaussian process latent variable model formulation of canonical correlation analysis. In Proceedings of 14th European Symposium on Artificial Neural Networks, pages 418–418, 2006.
Jaakko Luttinen and Alexander Ilin. Transformations in variational Bayesian factor analysis to speed up learning. Neurocomputing, 73:1093–1102, 2010.
Thomas Melzer, Michael Reiter, and Horst Bischof. Nonlinear feature extraction using generalized canonical correlation analysis. In Artificial Neural Networks – ICANN 2001, pages 353–360. Springer, 2001.
Shakir Mohamed, Katherine A. Heller, and Zoubin Ghahramani. Bayesian and L1 approaches for sparse unsupervised learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning, pages 751–758. Omnipress, 2012.
Takuho Nakano, Akisato Kimura, Hirokazu Kameoka, Shigeki Miyabe, Shigeki Sagayama, Nobutaka Ono, Kunio Kashino, and Takuya Nishimoto. Automatic video annotation via hierarchical topic trajectory model considering cross-modal correlations. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2380–2383, 2011.
Radford M. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, 1996.
James Petterson and Tiberio Caetano. Reverse multi-label learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1912–1920. MIT Press, 2010.
Yuan Qi and Tommi S. Jaakkola. Parameter expanded variational Bayesian methods. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1097–1104. MIT Press, 2007.
Piyush Rai and Hal Daumé III. Multi-label prediction via sparse infinite CCA. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1518–1526. MIT Press, 2009.
Mélanie Rey and Volker Roth. Copula mixture model for dependency-seeking clustering. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
Simon Rogers, Arto Klami, Janne Sinkkonen, Mark Girolami, and Samuel Kaski. Infinite factorization of multiple non-parametric views. Machine Learning, 79(1-2):201–226, 2010.
Indrayana Rustandi, Marcel A. Just, and Tom M. Mitchell. Integrating multiple-study multiple-subject fMRI datasets using canonical correlation analysis. In Proceedings of the MICCAI 2009 Workshop: Statistical modeling and detection issues in intra- and inter-subject functional MRI data analysis, 2009.
Aaron Shon, Keith Grochow, Aaron Hertzmann, and Rajesh Rao. Learning shared latent structure for image synthesis and robotic imitation. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1233–1240. MIT Press, 2010.
Ajit P. Singh and Geoffrey J. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), pages 650–658. ACM, New York, NY, USA, 2008.
Liang Sun, Shuiwang Ji, and Jieping Ye. Canonical correlation analysis for multilabel classification: A least-squares formulation, extension, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):194–200, 2011.
Yee Whye Teh, Michael I Jordan, Matthew J Beal, and David M Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
Abhishek Tripathi, Arto Klami, Matej Orešič, and Samuel Kaski. Matching samples of multiple views. Data Mining and Knowledge Discovery, 23:300–321, 2011.
Grigorios Tsoumakas and Ioannis Vlahavas. Random k-labelsets: An ensemble method for multilabel classification. In Proceedings of the 18th European Conference on Machine Learning (ECML 2007), pages 406–417, 2007.
Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Mining multi-label data. In O. Maimon and L. Rokach, editors, Data Mining and Knowledge Discovery Handbook. Springer, 2nd edition, 2010.
Ledyard R. Tucker. An inter-battery method of factor analysis. Psychometrika, 23:111–136, 1958.
Angelika van der Linde. Reduced rank regression models with latent variables in Bayesian functional data analysis. Bayesian Analysis, 6(1):77–126, 2011.
Jaakko Viinikanoja, Arto Klami, and Samuel Kaski. Variational Bayesian mixture of robust CCA models. In A. Gionis, J. Luis Balcázar, F. Bonchi, and M. Sebag, editors, Machine Learning and Knowledge Discovery in Databases. Proceedings of European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, volume III, pages 370–385. Springer, 2010.
Alexei Vinokourov, John Shawe-Taylor, and Nello Cristianini. Inferring a semantic representation of text via cross-language correlation analysis. In S. Thrun, S. Becker, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1473–1480. MIT Press, 2003.
Seppo Virtanen, Arto Klami, and Samuel Kaski. Bayesian CCA via group sparsity. In L. Getoor and T. Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, pages 457–464. ACM, 2011.
Seppo Virtanen, Arto Klami, Suleiman A. Khan, and Samuel Kaski. Bayesian group factor analysis. In N. Lawrence and M. Girolami, editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of JMLR:W&CP, pages 1269–1277, 2012.
Seppo Virtanen, Yangqing Jia, Arto Klami, and Trevor Darrell. Factorized multi-modal topic model. In N. de Freitas and K. Murphy, editors, Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), pages 843–851. AUAI Press, 2012.
Chong Wang. Variational Bayesian approach to canonical correlation analysis. IEEE Transactions on Neural Networks, 18:905–910, 2007.
David Wipf and Srikantan Nagarajan. A new view on automatic relevance determination. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1625–1632. MIT Press, 2008.
Jarkko Ylipaavalniemi, Eerika Savia, Sanna Malinen, Riitta Hari, Ricardo Vigário, and Samuel Kaski. Dependencies between stimuli and spatially independent fMRI sources: Towards brain correlates of natural stimuli. NeuroImage, 48:176–185, 2009.
Min-Ling Zhang and Zhi-Hua Zhou. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.