29 jmlr-2009-Estimating Labels from Label Proportions


Source: pdf

Author: Novi Quadrianto, Alex J. Smola, Tibério S. Caetano, Quoc V. Le

Abstract: Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, possibly with known label proportions. This problem occurs in areas like e-commerce, politics, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.

Keywords: unsupervised learning, Gaussian processes, classification and prediction, probabilistic models, missing variables


reference text

Y. Altun and A.J. Smola. Unifying divergence minimization and statistical inference via convex duality. In H.U. Simon and G. Lugosi, editors, Proc. Annual Conf. Computational Learning Theory, LNCS, pages 139–153. Springer, 2006.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res., 3:463–482, 2002.
E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. Info Theory, 51(12):4203–4215, 2005.
B.C. Chen, L. Chen, R. Ramakrishnan, and D.R. Musicant. Learning from aggregate views. In L. Liu, A. Reuter, K.Y. Whang, and J. Zhang, editors, Proceedings of the 22nd International Conference on Data Engineering (ICDE), pages 3–12, Atlanta, GA, 2006.
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. Technical Report 479, Department of Statistics, Stanford University, May 1995.
A. C. Chiaia, C. Banta-Green, L. Power, D. L. Sudakin, and J. A. Field. Community burdens of methamphetamine and other illicit drugs. In 6th International Conference on Pharmaceuticals and Endocrine Disrupting Chemicals in Water, 2007.
M. Dudík and R. E. Schapire. Maximum entropy distribution estimation with generalized regularization. In Gábor Lugosi and Hans U. Simon, editors, Proc. Annual Conf. Computational Learning Theory. Springer Verlag, June 2006.
T. Gärtner, Q.V. Le, S. Burton, A.J. Smola, and S.V.N. Vishwanathan. Large-scale multiclass transduction. In Neural Information Processing Systems, pages 411–418. MIT Press, 2006.
T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. Technical Report 156, Max-Planck-Institut für biologische Kybernetik, 2006. To appear in the Annals of Statistics.
J. Huang, A. Smola, A. Gretton, K. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
H. Kück and N. de Freitas. Learning about individuals from group statistics. In Uncertainty in Artificial Intelligence (UAI), pages 332–339, Arlington, Virginia, 2005. AUAI Press.
M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, 1991.
G. Mann and A. McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In Zoubin Ghahramani, editor, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), Corvallis, OR, pages 593–600. Omnipress, 2007.
S. Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Trans. Inform. Theory, 48(1):251–263, 2002.
D.R. Musicant, J. Christensen, and J.F. Olson. Supervised learning by training on aggregate outputs. In IEEE International Conference on Data Mining, 2007.
N. Quadrianto, A. Smola, T. Caetano, and Q. Le. Estimating labels from label proportions. In W. Cohen, A. McCallum, and S. Roweis, editors, Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pages 776–783. Omnipress, 2008.
B. Schölkopf. Support Vector Learning. R. Oldenbourg Verlag, Munich, 1997. Download: http://www.kernel-machines.org.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
A.J. Smola, S. V. N. Vishwanathan, and Q.V. Le. Bundle methods for machine learning. In Daphne Koller and Yoram Singer, editors, Advances in Neural Information Processing Systems 20, Cambridge, MA, 2007. MIT Press.
B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In Proceedings of the 21st Annual Conference on Learning Theory, pages 111–122, 2008.
C.H. Teo, Q. Le, A.J. Smola, and S.V.N. Vishwanathan. A scalable modular convex solver for regularized risk minimization. In Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD). ACM, 2007.
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res., 6:1453–1484, 2005.
P.Å. Wedin. Perturbation theory for pseudo-inverses. BIT Numerical Mathematics, 13(2), 1973.