Title: Multilabel Classification using Bayesian Compressed Sensing
Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain
Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over the compression as well as the learning tasks. We then derive an efficient variational inference scheme that provides a joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model enables active learning paradigms in which an oracle provides information about the labels that promise to be maximally informative for the prediction task. Our experiments show a significant boost over prior methods in terms of prediction performance on benchmark datasets, in both the fully labeled and the missing-labels cases. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model.
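Since the abstract compresses the whole pipeline into a few sentences, a small sketch may help. The following Python snippet is a minimal, hypothetical illustration of the two-stage compressed-sensing baseline described above (random projection of the label vector, per-coordinate regression, sparse decoding), in the spirit of Hsu et al. [1]; it is not the paper's joint Bayesian model, and all concrete choices (Ridge regression, OMP as the decoder, the function name predict_labels, and the synthetic problem sizes) are illustrative assumptions.

```python
# Hypothetical two-stage compressed-sensing pipeline for multilabel
# classification (illustrative only; not the paper's joint Bayesian model).
import numpy as np
from sklearn.linear_model import Ridge, OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, d, L, m, k = 200, 20, 50, 15, 3  # samples, features, labels, compressed dim, label sparsity

# Synthetic data: each instance carries exactly k active labels (an assumption).
X = rng.normal(size=(n, d))
Y = np.zeros((n, L))
for i in range(n):
    Y[i, rng.choice(L, size=k, replace=False)] = 1.0

# Step 1: compress the L-dimensional label vectors with a random projection A.
A = rng.normal(size=(m, L)) / np.sqrt(m)
Z = Y @ A.T  # z_i = A y_i, now only m-dimensional

# Step 2: learn one regression function per compressed coordinate.
regressors = [Ridge(alpha=1.0).fit(X, Z[:, j]) for j in range(m)]

def predict_labels(x):
    """Predict the compressed code, then decode a k-sparse label vector."""
    z_hat = np.array([r.predict(x[None, :])[0] for r in regressors])
    # Sparse recovery of y from z_hat ~= A y; OMP stands in for any CS decoder.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
    omp.fit(A, z_hat)
    return (omp.coef_ > 0.5).astype(int)

print("predicted:", predict_labels(X[0]))
print("true:     ", Y[0].astype(int))
```

In the paper's Bayesian treatment, this hard sparse decoding step is replaced by variational posterior inference over the label vector, which is what yields the per-label uncertainty estimates used for active learning.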
[1] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, pages 772–780, 2009.
[2] B. Hariharan, L. Zelnik-Manor, S. V. N. Vishwanathan, and M. Varma. Large scale max-margin multilabel classification with priors. In ICML, pages 423–430, 2010.
[3] G. Tsoumakas and I. Katakis. Multi-label classification: An overview. IJDWM, 3(3):1–13, 2007.
[4] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.
[5] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[6] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.
[7] R. M. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101–141, 2004.
[8] D. Needell and J. A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009.
[9] S. Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. Preprint, 2010.
[10] D. Baron, S. Sarvotham, and R. G. Baraniuk. Bayesian compressive sensing via belief propagation. IEEE Transactions on Signal Processing, 58(1), 2010.
[11] S. Ji, Y. Xue, and L. Carin. Bayesian compressive sensing. IEEE Transactions on Signal Processing, 56(6), 2008.
[12] N. Cesa-Bianchi, A. Conconi, and C. Gentile. Learning probabilistic linear-threshold classifiers via selective sampling. In COLT, 2003.
[13] N. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine. In NIPS, 2002.
[14] D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4), 1992.
[15] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In ICML, 2000.
[16] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3), 1997.
[17] B. Yang, J.-T. Sun, T. Wang, and Z. Chen. Effective multi-label active learning for text classification. In KDD, pages 917–926, 2009.
[18] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1):21–35, 2010.
[19] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2006.
[20] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.