Submodular Multi-Label Learning (NIPS 2011, Paper 277)
Authors: James Petterson, Tibério S. Caetano
Abstract: In this paper we present an algorithm to learn a multi-label classifier which attempts to directly optimise the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph-cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.
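The prediction step described in the abstract (exact minimisation of a submodular set function via graph-cuts) can be illustrated with a short sketch. This is not the authors' released code: the energy form E(y) = sum_i theta_i*y_i - sum_{i<j} w_ij*y_i*y_j over binary label indicators, the function name predict_labels, and the use of networkx's min-cut routine are all assumptions made for illustration, not details taken from the paper.

```python
# A minimal sketch of MAP prediction for a pairwise assortative label model
# via s-t min-cut. The energy and its reduction to edge capacities below are
# illustrative assumptions (a Kolmogorov-Zabih style construction):
#
#   E(y) = sum_i theta_i * y_i  -  sum_{i<j} w_ij * y_i * y_j,   w_ij >= 0
#
# The co-occurrence reward -w*y_i*y_j is rewritten as -w*y_i + w*y_i*(1-y_j),
# so the linear part folds into the unaries and the residual term maps to a
# non-negative edge capacity.

import networkx as nx

def predict_labels(theta, pair_weight):
    """theta: dict label -> unary cost of switching the label on.
    pair_weight: dict (i, j) -> w_ij >= 0, reward for label co-occurrence."""
    G = nx.DiGraph()
    s, t = "source", "sink"
    # Fold the linear part of each pairwise reward into the unaries.
    net = dict(theta)
    for (i, j), w in pair_weight.items():
        assert w >= 0, "assortative (submodular) interactions only"
        net[i] = net.get(i, 0.0) - w
        # Residual term w * y_i * (1 - y_j): paid when y_i = 1, y_j = 0,
        # i.e. when edge i -> j crosses the cut.
        G.add_edge(i, j, capacity=w)
    # Unary terms: node i ends on the source side iff y_i = 1.
    for i, c in net.items():
        if c >= 0:
            G.add_edge(i, t, capacity=c)   # paid when y_i = 1
        else:
            G.add_edge(s, i, capacity=-c)  # paid when y_i = 0
    cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
    return {i for i in net if i in source_side}

# Toy usage: label 2 alone is too costly (theta_2 > 0), but its co-occurrence
# reward with label 1 pulls it into the predicted set.
print(predict_labels({1: -2.0, 2: 0.5}, {(1, 2): 1.0}))  # -> {1, 2}
```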
[1] K. Dembczynski, W. Cheng, and E. Hüllermeier, “Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains,” in ICML, 2010.
[2] X. Zhang, T. Graepel, and R. Herbrich, “Bayesian Online Learning for Multi-label and Multi-variate Performance Measures,” in AISTATS, 2010.
[3] P. Rai and H. Daumé III, “Multi-Label Prediction via Sparse Infinite CCA,” in NIPS, 2009.
[4] J. Read, B. Pfahringer, G. Holmes, and E. Frank, “Classifier chains for multi-label classification,” in ECML/PKDD, 2009.
[5] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, “Kernel-based learning of hierarchical multilabel classification models,” JMLR, vol. 7, pp. 1601–1626, December 2006.
[6] Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya, “Hierarchical multi-label prediction of gene function,” Bioinformatics, vol. 22, pp. 830–836, April 2006.
[7] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, “TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation,” in ICCV, 2009.
[8] M. Jansche, “Maximum expected F-measure training of logistic regression models,” in HLT, 2005.
[9] T. Mensink, J. Verbeek, and G. Csurka, “Learning structured prediction models for interactive image labeling,” in CVPR, 2011.
[10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” JMLR, vol. 6, pp. 1453–1484, 2005.
[11] J. Petterson and T. Caetano, “Reverse multi-label learning,” in NIPS, 2010.
[12] W. Bi and J. Kwok, “Multi-Label Classification on Tree- and DAG-Structured Hierarchies,” in ICML, 2011.
[13] N. Ghamrawi and A. McCallum, “Collective Multi-Label Classification,” in CIKM, 2005.
[14] B. Hariharan, S. V. N. Vishwanathan, and M. Varma, “Large Scale Max-Margin Multi-Label Classification with Prior Knowledge about Densely Correlated Labels,” in ICML, 2010.
[15] G. Tsoumakas, I. Katakis, and I. P. Vlahavas, Mining Multi-label Data. Springer, 2009.
[16] G. Tsoumakas and I. P. Vlahavas, “Random k-labelsets: An ensemble method for multilabel classification,” in ECML, 2007.
[17] S. Virtanen, A. Klami, and S. Kaski, “Bayesian CCA via Group Sparsity,” in ICML, 2011.
[18] C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le, “Bundle methods for regularized risk minimization,” JMLR, vol. 11, pp. 311–365, 2010.
[19] Z. Svitkina and L. Fleischer, “Submodular approximation: Sampling-based algorithms and lower bounds,” in FOCS, 2008.
[20] M.-L. Zhang and Z.-H. Zhou, “ML-KNN: A lazy learning approach to multi-label learning,” Pattern Recognition, vol. 40, pp. 2038–2048, July 2007.
[21] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. PAMI, 2004.