jmlr jmlr2012 jmlr2012-71 jmlr2012-71-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sivan Sabato, Naftali Tishby
Abstract: In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size. Keywords: multiple-instance learning, learning theory, sample complexity, PAC learning, supervised classification
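The bag-labeling rule described in the abstract (a bag is positive if and only if at least one of its instances is positive, i.e., the Boolean OR) can be made concrete with a minimal sketch. The sketch below is illustrative only and assumes an arbitrary instance-level hypothesis h; the threshold hypothesis in the example is hypothetical and not taken from the paper.

    # Minimal sketch of the OR-MIL bag-labeling rule (illustrative, not the paper's algorithm).
    from typing import Callable, Iterable

    def bag_label(h: Callable[[object], bool], bag: Iterable[object]) -> bool:
        # A bag is labeled positive iff some instance in it is labeled positive by h.
        return any(h(x) for x in bag)

    # Example: instances are numbers, h is a (hypothetical) threshold hypothesis.
    h = lambda x: x > 5
    print(bag_label(h, [1, 2, 9]))   # True: one instance exceeds the threshold
    print(bag_label(h, [1, 2, 3]))   # False: no instance exceeds the threshold

In the MIL setting the learner sees only the bag labels produced this way, never the per-instance labels, and must output a classification rule for bags.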
N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, pages 292–301. IEEE, 1993.
N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4):615–631, 1997.
S. Andrews. Learning from Ambiguous Examples. PhD thesis, Brown University, May 2007.
S. Andrews and T. Hofmann. Multiple-instance learning via disjunctive programming boosting. In Advances in Neural Information Processing Systems 16, 2003.
S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 15, pages 561–568, 2002.
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
P. Auer, P. M. Long, and A. Srinivasan. Approximating hyper-rectangles: Learning and pseudorandom sets. Journal of Computer and System Sciences, 57(3):376–388, 1998.
B. Babenko, N. Verma, P. Dollár, and S. Belongie. Multiple instance learning with manifold bags. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 81–88, 2011.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
P. L. Bartlett, S. R. Kulkarni, and S. E. Posner. Covering numbers for real-valued function classes. IEEE Transactions on Information Theory, 43(5):1721–1724, 1997.
A. Blum and A. Kalai. A note on learning from multiple-instance examples. Machine Learning, 30(1):23–29, 1998.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, September 1995.
T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.
R. M. Dudley. Central limit theorems for empirical measures. The Annals of Probability, 6(6):899–929, 1978.
R. M. Dudley. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1(3):290–330, 1967.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.
T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 179–186, 2002.
D. Haussler and P. M. Long. A generalization of Sauer's lemma. Journal of Combinatorial Theory, Series A, 71(2):219–240, 1995.
P. M. Long and L. Tan. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning, 30(1):7–21, 1998.
O. Maron and T. Lozano-Pérez. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems 10, pages 570–576, 1998.
O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 341–349, 1998.
S. Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Transactions on Information Theory, 48(1):251–263, 2002.
S. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw-Hill, New York, 1996.
L. Pitt and L. G. Valiant. Computational limitations on learning from examples. Technical report, Harvard University Aiken Computation Laboratory, July 1986.
D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.
L. De Raedt. Attribute-value learning versus inductive logic programming: The missing links (extended abstract). In Proceedings of the 8th International Workshop on Inductive Logic Programming, pages 1–8, London, UK, 1998. Springer-Verlag.
G. Rätsch and M. K. Warmuth. Efficient margin maximizing with boosting. Journal of Machine Learning Research, 6:2131–2152, December 2005.
S. Sabato and N. Tishby. Homogeneous multi-instance learning with arbitrary dependence. In Proceedings of the Twenty Second Annual Conference on Computational Learning Theory, 2009.
S. Sabato, N. Srebro, and N. Tishby. Reducing label complexity by learning from bags. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9, pages 685–692, 2010.
N. Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13:145–147, 1972.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):1–40, 1999.
R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, October 1998.
S. Shalev-Shwartz, O. Shamir, and K. Sridharan. Learning kernel-based halfspaces with the zero-one loss. In Proceedings of the Twenty Third Annual Conference on Computational Learning Theory, 2010.
N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low-noise and fast rates. CoRR, abs/1009.3896, 2010.
V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, XVI(2):264–280, 1971.
J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320, 1928.
N. Weidmann, E. Frank, and B. Pfahringer. A two-level learning method for generalized multi-instance problems. In Proceedings of the European Conference on Machine Learning, pages 468–479, 2003.
Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. In Advances in Neural Information Processing Systems 14, 2001.
Z. Zhou, K. Jiang, and M. Li. Multi-instance learning based web mining. Applied Intelligence, 22(2):135–147, 2005.