
158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA


Source: pdf

Author: Piyush Rai, Hal Daumé III

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.
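As a point of orientation for the abstract, the sketch below runs classical (non-Bayesian) CCA on synthetic two-view data using scikit-learn. It is a minimal illustration of what CCA computes, not the sparse, nonparametric Bayesian model proposed in the paper; the toy dimensions and noise level are arbitrary assumptions.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.RandomState(0)

# Two views generated from a shared low-dimensional latent signal z.
n, d1, d2, k = 200, 10, 8, 2
z = rng.randn(n, k)
X = z @ rng.randn(k, d1) + 0.1 * rng.randn(n, d1)  # view 1
Y = z @ rng.randn(k, d2) + 0.1 * rng.randn(n, d2)  # view 2

# Classical CCA finds projection pairs (u, v) maximizing corr(Xu, Yv);
# the paper's model instead infers sparse projections and selects the
# number of components nonparametrically, rather than fixing k up front.
cca = CCA(n_components=k)
X_c, Y_c = cca.fit_transform(X, Y)

for i in range(k):
    r = np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]
    print("component %d: canonical correlation = %.3f" % (i, r))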


reference text

[1] C. Archambeau and F. Bach. Sparse probabilistic projections. In Neural Information Processing Systems 21, 2008.

[2] J. Arenas-García, K. B. Petersen, and L. K. Hansen. Sparse kernel orthonormalized PLS for feature extraction in large data sets. In Neural Information Processing Systems 19, 2006.

[3] F. R. Bach and M. I. Jordan. A Probabilistic Interpretation of Canonical Correlation Analysis. Technical Report 688, Department of Statistics, University of California, Berkeley, 2005.

[4] J. Baxter. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.

[5] C. M. Bishop. Bayesian PCA. In Neural Information Processing Systems 11, Cambridge, MA, USA, 1999. MIT Press.

[6] R. Caruana. Multitask Learning. Machine Learning, 28(1):41–75, 1997.

[7] H. Daumé III. Bayesian Multitask Learning with Latent Hierarchies. In Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, 2009.

[8] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res., 5:73–99, 2004.

[9] Z. Ghahramani, T. L. Griffiths, and P. Sollich. Bayesian Nonparametric Latent Feature Models. In Bayesian Statistics 8. Oxford University Press, 2007.

[10] A. Globerson and N. Tishby. Sufficient dimensionality reduction. J. Mach. Learn. Res., 3:1307–1331, 2003.

[11] H. Hotelling. Relations Between Two Sets of Variables. Biometrika, 28(3/4):321–377, 1936.

[12] S. Ji, L. Tang, S. Yu, and J. Ye. Extracting Shared Subspace for Multi-label Classification. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

[13] S. Ji and J. Ye. Linear dimensionality reduction for multi-label classification. In Twenty-first International Joint Conference on Artificial Intelligence, 2009.

[14] M. Kim and V. Pavlovic. Covariance operator based dimensionality reduction with extension to semi-supervised settings. In Twelfth International Conference on Artificial Intelligence and Statistics, Florida, USA, 2009.

[15] A. Klami and S. Kaski. Local dependent components. In ICML ’07: Proceedings of the 24th international conference on Machine learning, 2007.

[16] P. Rai and H. Daumé III. The infinite hierarchical factor regression model. In Neural Information Processing Systems 21, 2008.

[17] D. Hardoon and J. Shawe-Taylor. The Double-Barrelled LASSO (Sparse Canonical Correlation Analysis). In Workshop on Learning from Multiple Sources (NIPS), 2008.

[18] B. Sriperumbudur, D. Torres, and G. Lanckriet. The Sparse Eigenvalue Problem. arXiv:0901.1504v1, 2009.

[19] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proc. of the 37th Annual Allerton Conference on Communication, Control and Computing, pages 368–377, 1999.

[20] N. Ueda and K. Saito. Parametric Mixture Models for Multi-labeled Text. Advances in Neural Information Processing Systems, pages 737–744, 2003.

[21] C. Wang. Variational Bayesian approach to Canonical Correlation Analysis. IEEE Transactions on Neural Networks, 2007.

[22] A. Wiesel, M. Kliger, and A. Hero. A Greedy Approach to Sparse Canonical Correlation Analysis. arXiv:0801.2748, 2008.

[23] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task Learning for Classification with Dirichlet Process Priors. The Journal of Machine Learning Research, 8:35–63, 2007.

[24] K. Yu, S. Yu, and V. Tresp. Multi-label Informed Latent Semantic Indexing. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 258–265. ACM, 2005.

[25] S. Yu, K. Yu, V. Tresp, H. Kriegel, and M. Wu. Supervised Probabilistic Principal Component Analysis. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006.

[26] Y. Zhang and Z.-H. Zhou. Multi-Label Dimensionality Reduction via Dependence Maximization. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI 2008), pages 1503–1505, 2008.