Submodular Multi-Label Learning (NIPS 2011, Paper 277)
Authors: James Petterson, Tibério S. Caetano
Abstract: In this paper we present an algorithm to learn a multi-label classifier which attempts to directly optimise the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph-cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.
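The prediction step described in the abstract (exact minimisation of a submodular set function via graph-cuts) can be illustrated with a short sketch. This is not the authors' released code: the energy form E(y) = sum_i theta_i*y_i - sum_{i<j} w_ij*y_i*y_j over binary label indicators, the function name predict_labels, and the use of networkx's min-cut routine are all assumptions made for illustration, not details taken from the paper.

```python
# A minimal sketch of MAP prediction for a pairwise assortative label model
# via s-t min-cut. The energy and its reduction to edge capacities below are
# illustrative assumptions (a Kolmogorov-Zabih style construction):
#
#   E(y) = sum_i theta_i * y_i  -  sum_{i<j} w_ij * y_i * y_j,   w_ij >= 0
#
# The co-occurrence reward -w*y_i*y_j is rewritten as -w*y_i + w*y_i*(1-y_j),
# so the linear part folds into the unaries and the residual term maps to a
# non-negative edge capacity.

import networkx as nx

def predict_labels(theta, pair_weight):
    """theta: dict label -> unary cost of switching the label on.
    pair_weight: dict (i, j) -> w_ij >= 0, reward for label co-occurrence."""
    G = nx.DiGraph()
    s, t = "source", "sink"
    # Fold the linear part of each pairwise reward into the unaries.
    net = dict(theta)
    for (i, j), w in pair_weight.items():
        assert w >= 0, "assortative (submodular) interactions only"
        net[i] = net.get(i, 0.0) - w
        # Residual term w * y_i * (1 - y_j): paid when y_i = 1, y_j = 0,
        # i.e. when edge i -> j crosses the cut.
        G.add_edge(i, j, capacity=w)
    # Unary terms: node i ends on the source side iff y_i = 1.
    for i, c in net.items():
        if c >= 0:
            G.add_edge(i, t, capacity=c)   # paid when y_i = 1
        else:
            G.add_edge(s, i, capacity=-c)  # paid when y_i = 0
    cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
    return {i for i in net if i in source_side}

# Toy usage: label 2 alone is too costly (theta_2 > 0), but its co-occurrence
# reward with label 1 pulls it into the predicted set.
print(predict_labels({1: -2.0, 2: 0.5}, {(1, 2): 1.0}))  # -> {1, 2}
```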
[1] K. Dembczynski, W. Cheng, and E. Hüllermeier, “Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains,” in ICML, 2010.
[2] X. Zhang, T. Graepel, and R. Herbrich, “Bayesian Online Learning for Multi-label and Multi-variate Performance Measures,” in AISTATS, 2010.
[3] P. Rai and H. Daumé III, “Multi-Label Prediction via Sparse Infinite CCA,” in NIPS, 2009.
[4] J. Read, B. Pfahringer, G. Holmes, and E. Frank, “Classifier chains for multi-label classification,” in ECML/PKDD, 2009.
[5] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, “Kernel-based learning of hierarchical multilabel classification models,” JMLR, vol. 7, pp. 1601–1626, December 2006.
[6] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol. 22, pp. 830–836, April 2006.
[7] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, “TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation,” in ICCV, 2009.
[8] M. Jansche, “Maximum expected F-measure training of logistic regression models,” in HLT, 2005.
[9] T. Mensink, J. Verbeek, and G. Csurka, “Learning structured prediction models for interactive image labeling,” in CVPR, 2011.
[10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” JMLR, vol. 6, pp. 1453–1484, 2005.
[11] J. Petterson and T. Caetano, “Reverse multi-label learning,” in NIPS, 2010.
[12] W. Bi and J. Kwok, “Multi-Label Classification on Tree- and DAG-Structured Hierarchies,” in ICML, 2011.
[13] N. Ghamrawi and A. McCallum, “Collective Multi-Label Classification,” in CIKM, 2005.
[14] B. Hariharan, S. V. N. Vishwanathan, and M. Varma, “Large Scale Max-Margin Multi-Label Classification with Prior Knowledge about Densely Correlated Labels,” in ICML, 2010.
[15] G. Tsoumakas, I. Katakis, and I. P. Vlahavas, Mining Multi-label Data. Springer, 2009.
[16] G. Tsoumakas and I. P. Vlahavas, “Random k-labelsets: An ensemble method for multilabel classification,” in ECML, 2007.
[17] S. Virtanen, A. Klami, and S. Kaski, “Bayesian CCA via Group Sparsity,” in ICML, 2011.
[18] C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le, “Bundle methods for regularized risk minimization,” JMLR, vol. 11, pp. 311–365, 2010.
[19] Z. Svitkina and L. Fleischer, “Submodular approximation: Sampling-based algorithms and lower bounds,” in FOCS, 2008.
[20] M.-L. Zhang and Z.-H. Zhou, “ML-KNN: A lazy learning approach to multi-label learning,” Pattern Recognition, vol. 40, pp. 2038–2048, July 2007.
[21] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. PAMI, 2004.