
132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting


Source: pdf

Author: Stuart Andrews, Thomas Hofmann

Abstract: Learning from ambiguous training data is highly relevant in many applications. We present a new learning algorithm for classification problems where labels are associated with sets of patterns instead of individual patterns. This encompasses multiple instance learning as a special case. Our approach is based on a generalization of linear programming boosting and uses results from disjunctive programming to generate successively stronger linear relaxations of a discrete non-convex problem.
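For context, a sketch of the underlying formulation (our reading, consistent with the appendix at the end of this page; the notation $h_k$, $\alpha_k$, $C$, $X_i$ follows the paper). One-norm soft-margin LP boosting [4] solves

\min_{\alpha \ge 0,\ \xi \ge 0} \ \sum_k \alpha_k + C \sum_i \xi_i \quad \text{s.t.} \quad y_i \sum_k \alpha_k h_k(x_i) + \xi_i \ge 1 \ \ \forall i .

When a label $y_i$ is attached to a set of patterns $X_i$ rather than to a single pattern, the margin constraint only has to hold for some instance of the set, yielding the disjunctive constraint

\bigvee_{x \in X_i} \Big[\ y_i \sum_k \alpha_k h_k(x) + \xi_i \ge 1 \ \Big] ,

a discrete non-convex program that is relaxed using the hierarchy of linear relaxations from disjunctive programming [2].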


reference text

[1] Stuart Andrews, Ioannis Tsochantaridis, and Thomas Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.

[2] Egon Balas. Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM Journal on Algebraic and Discrete Methods, 6(3):466–486, July 1985.

[3] A. Demiriz and K. Bennett. Optimization approaches to semi-supervised learning. In M. Ferris, O. Mangasarian, and J. Pang, editors, Applications and Algorithms of Complementarity. Kluwer Academic Publishers, Boston, 2000.

[4] Ayhan Demiriz, Kristin P. Bennett, and John Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1-3):225–254, 2002.

[5] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Perez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.

[6] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In Proc. 19th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, 2002.

[7] A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.

[8] T. Joachims. Transductive inference for text classification using support vector machines. In Proceedings 16th International Conference on Machine Learning, pages 200–209. Morgan Kaufmann, San Francisco, CA, 1999.

[9] Sangbum Lee and Ignacio E. Grossmann. New algorithms for nonlinear generalized disjunctive programming. Computers and Chemical Engineering Journal, 24(9-10):2125–2141, October 2000.

[10] O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In Proc. 15th International Conf. on Machine Learning, pages 341–349. Morgan Kaufmann, San Francisco, CA, 1998.

[11] J. Ramon and L. De Raedt. Multi instance neural networks. In Proceedings of ICML2000, Workshop on Attribute-Value and Relational Learning, 2000.

[12] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Technical Report NC-TR-1998-021, Department of Computer Science, Royal Holloway, University of London, Egham, UK, 1998.

[13] Gunnar Rätsch, Sebastian Mika, Bernhard Schölkopf, and Klaus-Robert Müller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1184–1199, 2002.

[14] Qi Zhang and Sally A. Goldman. EM-DD: An improved multiple-instance learning technique. In Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002.

Appendix

The primal variables are $\alpha_k$, $\tilde\alpha_k^x$, $\xi_i$, $\tilde\xi_i^x$, $\tilde\xi_j^x$, and $\eta_i^x$. The dual variables are $u_i^x$ and $u_j^x$ for the margin constraints, and $\rho_{ik}$, $\sigma_i$, $\sigma_{ij}$, and $\theta_i$ for the equality constraints on $\alpha_k$, $\xi_i$, $\xi_j$, and $\eta$, respectively. The Lagrangian is given by

L = \sum_k \alpha_k + C \Big( \sum_i \xi_i + \sum_j \xi_j \Big)
    - \sum_i \sum_{x \in X_i} u_i^x \Big( y_i \sum_k \tilde\alpha_k^x h_k(x) + \tilde\xi_i^x - \eta_i^x \Big)
    - \sum_{i,j} \sum_{x \in X_i} u_j^x \Big( y_j \sum_k \tilde\alpha_k^x h_k(x_j) + \tilde\xi_j^x - \eta_i^x \Big)
    - \sum_{i,k} \rho_{ik} \Big( \alpha_k - \sum_{x \in X_i} \tilde\alpha_k^x \Big)
    - \sum_i \sigma_i \Big( \xi_i - \sum_{x \in X_i} \tilde\xi_i^x \Big)
    - \sum_{i,j} \sigma_{ij} \Big( \xi_j - \sum_{x \in X_i} \tilde\xi_j^x \Big)
    + \sum_i \theta_i \Big( 1 - \sum_{x \in X_i} \eta_i^x \Big) .

Taking derivatives with respect to the primal variables leads to the following dual (where $\sigma_i$ has been eliminated using $u_i^x \le \sigma_i \le C$):

\max \ \sum_i \theta_i
\quad \text{s.t.} \quad \theta_i \le u_i^x + \sum_j u_j^x ,
\qquad u_i^x \le C , \quad u_j^x \le \sigma_{ij} , \quad \sum_i \sigma_{ij} \le C ,
\qquad y_i u_i^x h_k(x) + \sum_j y_j u_j^x h_k(x_j) \le \rho_{ik} , \quad \sum_i \rho_{ik} = 1 ,
\qquad u_i^x, u_j^x, \sigma_{ij} \ge 0 ,

for all $i$, $j$, $k$, and $x \in X_i$.
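To make the lifted relaxation above concrete, here is a minimal numeric sketch of the primal LP for a single ambiguous bag, written against scipy.optimize.linprog. Everything in it (the toy hypothesis outputs H, the labels, the constant C, and the index helpers) is an illustrative assumption, not code or data from the paper; it simply instantiates the LP whose Lagrangian is given in the appendix.

# Minimal sketch (assumptions, not the paper's code) of the lifted LP relaxation
# for one ambiguous bag X_i, following the primal described in the appendix.
import numpy as np
from scipy.optimize import linprog

K = 2                      # number of weak hypotheses h_k
bag = [0, 1]               # rows of H belonging to the ambiguous bag X_i
y_bag = +1                 # bag label y_i
unamb = [2, 3]             # rows of H for unambiguous examples x_j
y_unamb = [+1, -1]         # their labels y_j
C = 1.0                    # slack penalty (illustrative)

# H[r, k] = h_k evaluated on pattern r (toy values).
H = np.array([[ 1., -1.],
              [-1.,  1.],
              [ 1.,  1.],
              [-1., -1.]])

nd, nj = len(bag), len(unamb)
# Variable layout: [alpha_k | xi_i | xi_j | per-disjunct blocks ...],
# each disjunct block: [alpha~_k^x | xi~_i^x | xi~_j^x | eta_i^x].
blk = K + 1 + nj + 1
nvar = K + 1 + nj + nd * blk
def a(k): return k                           # alpha_k
XI_I = K                                     # xi_i
def xj(j): return K + 1 + j                  # xi_j
def base(d): return K + 1 + nj + d * blk
def at(d, k): return base(d) + k             # alpha~_k^x
def xit(d): return base(d) + K               # xi~_i^x
def xjt(d, j): return base(d) + K + 1 + j    # xi~_j^x
def eta(d): return base(d) + K + 1 + nj      # eta_i^x

c = np.zeros(nvar)                           # objective: sum_k alpha_k + C (xi_i + sum_j xi_j)
c[[a(k) for k in range(K)]] = 1.0
c[XI_I] = C
for j in range(nj): c[xj(j)] = C

A_ub, b_ub = [], []                          # margin constraints, written as -(margin) <= 0
for d in range(nd):
    # bag instance:  y_i sum_k alpha~_k^x h_k(x) + xi~_i^x - eta_i^x >= 0
    row = np.zeros(nvar)
    for k in range(K): row[at(d, k)] = -y_bag * H[bag[d], k]
    row[xit(d)] = -1.0; row[eta(d)] = 1.0
    A_ub.append(row); b_ub.append(0.0)
    # unambiguous examples:  y_j sum_k alpha~_k^x h_k(x_j) + xi~_j^x - eta_i^x >= 0
    for j in range(nj):
        row = np.zeros(nvar)
        for k in range(K): row[at(d, k)] = -y_unamb[j] * H[unamb[j], k]
        row[xjt(d, j)] = -1.0; row[eta(d)] = 1.0
        A_ub.append(row); b_ub.append(0.0)

A_eq, b_eq = [], []                          # linking (aggregation) constraints
for k in range(K):                           # sum_x alpha~_k^x = alpha_k
    row = np.zeros(nvar); row[a(k)] = -1.0
    for d in range(nd): row[at(d, k)] = 1.0
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(nvar); row[XI_I] = -1.0       # sum_x xi~_i^x = xi_i
for d in range(nd): row[xit(d)] = 1.0
A_eq.append(row); b_eq.append(0.0)
for j in range(nj):                          # sum_x xi~_j^x = xi_j
    row = np.zeros(nvar); row[xj(j)] = -1.0
    for d in range(nd): row[xjt(d, j)] = 1.0
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(nvar)                         # sum_x eta_i^x = 1
for d in range(nd): row[eta(d)] = 1.0
A_eq.append(row); b_eq.append(1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * nvar, method="highs")
print("objective:", res.fun)
print("alpha:", res.x[:K], " eta:", [res.x[eta(d)] for d in range(nd)])

The printout shows how the relaxation splits the bag's weight $\eta_i^x$ across the instances of $X_i$; successively stronger relaxations (or branching) would be expected to drive this split toward a single selected instance.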