
157 nips-2009-Multi-Label Prediction via Compressed Sensing


Source: pdf

Author: John Langford, Tong Zhang, Daniel J. Hsu, Sham M. Kakade

Abstract: We consider multi-label prediction problems with large output spaces under the assumption of output sparsity: that the target (label) vectors have small support. We develop a general theory for a variant of the popular error-correcting output code scheme, using ideas from compressed sensing to exploit this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. We show that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others. We also state and prove robustness guarantees for this method in the form of regret transform bounds (in general), and also provide a more detailed analysis for the linear prediction setting.
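As a concrete illustration of the reduction described above, here is a minimal sketch in Python. All specifics are assumptions made for the example, not the paper's prescribed instantiation: a random Gaussian sensing matrix, ridge regression as the base regressor, a greedy sparse-reconstruction decoder in the spirit of [18]–[20], and toy problem sizes.

```python
# A sketch of multi-label prediction via compressed sensing, assuming:
#   - a random Gaussian sensing matrix A (m = O(k log d) rows),
#   - ridge regression as the base subproblem learner,
#   - orthogonal matching pursuit as the sparse decoder.
# All sizes below are toy values chosen for illustration only.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit, Ridge

rng = np.random.default_rng(0)

n, p, d, k = 200, 20, 100, 3   # examples, features, labels, sparsity
m = 30                         # number of regression subproblems, m << d

X = rng.normal(size=(n, p))
Y = np.zeros((n, d))
for i in range(n):             # k-sparse 0/1 label vectors
    Y[i, rng.choice(d, size=k, replace=False)] = 1.0

A = rng.normal(size=(m, d)) / np.sqrt(m)   # sensing matrix
Z = Y @ A.T                                # compressed labels, shape (n, m)

# Train one regressor per measurement instead of one per label.
regressors = [Ridge(alpha=1.0).fit(X, Z[:, j]) for j in range(m)]

def predict_labels(x):
    """Predict the m measurements, then decode a k-sparse label vector."""
    z_hat = np.array([reg.predict(x[None, :])[0] for reg in regressors])
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
    omp.fit(A, z_hat)                      # solve z_hat ~ A y for sparse y
    return np.flatnonzero(omp.coef_)       # indices of predicted labels

print(predict_labels(X[0]), np.flatnonzero(Y[0]))
```

Only m regressors are trained, so training cost scales with log d rather than d; the end-to-end accuracy then hinges on the decoder's tolerance to regression error, which is exactly what the paper's regret transform bounds quantify.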


reference text

[1] D. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289–1306, 2006.

[2] T. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.

[3] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101–141, 2004.

[4] M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.

[5] A. Clare and R.D. King. Knowledge discovery in multi-label phenotype data. In European Conference on Principles of Data Mining and Knowledge Discovery, 2001.

[6] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[7] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7:31–54, 2006.

[8] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.

[9] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. Journal of Machine Learning Research, 7:1601–1626, 2006.

[10] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In ICML, 2009.

[11] G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data, 2008.

[12] E. Allwein, R. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.

[13] J. Langford and A. Beygelzimer. Sensitive error correcting output codes. In Proc. Conference on Learning Theory, 2005.

[14] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59:1207–1223, 2006.

[15] R. DeVore. Deterministic constructions of compressed sensing matrices. J. of Complexity, 23:918–925, 2007.

[16] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constructive Approximation, 28(3):277–289, 2008.

[17] M. Rudelson and R. Vershynin. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In Proc. Conference on Information Sciences and Systems, 2006.

[18] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

[19] T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In Proc. Neural Information Processing Systems, 2008.

[20] D. Needell and J.A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 2007.

[21] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.

[22] S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Proc. Neural Information Processing Systems, 2008.

[23] A. Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In ICML, 2004.

[24] L. von Ahn and L. Dabbish. Labeling images with a computer game. In Proc. ACM Conference on Human Factors in Computing Systems, 2004.

[25] M. Marszałek, C. Schmid, H. Harzallah, and J. van de Weijer. Learning object representations for visual object class recognition. In Visual Recognition Challenge Workshop, in conjunction with ICCV, 2007.

[26] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3):346–359, 2008.

[27] D. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Info. Theory, 52(1):6–18, 2006.

[28] S. Dasgupta. Learning Probability Distributions. PhD thesis, University of California, 2000.