jmlr2011-29: reference knowledge-graph (maker-knowledge-mining)
Source: pdf
Author: Nicolò Cesa-Bianchi, Shai Shalev-Shwartz, Ohad Shamir
Abstract: We investigate three variants of budgeted learning, a setting in which the learner may access only a limited number of attributes of training or test examples. In the “local budget” setting, where a constraint is imposed on the number of available attributes per training example, we design and analyze an efficient algorithm for learning linear predictors that actively samples the attributes of each training instance. Our analysis bounds the number of additional examples sufficient to compensate for the lack of full information on the training set. This result is complemented by a general lower bound for the easier “global budget” setting, where only the overall number of accessible training attributes is constrained. In the third, “prediction on a budget” setting, where the constraint is on the number of available attributes per test example, we show that there are cases in which a linear predictor with zero error exists, yet it is statistically impossible to achieve arbitrary accuracy without full information on test examples. Finally, we run simple experiments on a digit recognition problem showing that our algorithm performs well against both partial-information and full-information baselines.

Keywords: budgeted learning, statistical learning, linear predictors, learning with partial information, learning theory
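The local-budget algorithm described in the abstract rests on a simple idea: a few uniformly sampled attributes, rescaled by the inverse sampling probability, give an unbiased estimate of an example, and two independent such estimates give an unbiased estimate of a linear predictor's loss gradient. The sketch below illustrates that idea for squared loss with an ℓ2 constraint; it is a minimal, hypothetical rendering of the technique (function names and step sizes are our own choices), not the paper's exact procedure or analysis.

```python
import numpy as np

def estimate_x(x, k, rng):
    """Observe k of the d attributes of x uniformly at random and
    rescale by d/k, so the returned sparse vector satisfies E[x_hat] = x."""
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)
    x_hat = np.zeros(d)
    x_hat[idx] = x[idx] * (d / k)
    return x_hat

def budgeted_linear_sgd(X, y, k, radius=1.0, lr=0.1, seed=0):
    """SGD for squared loss, seeing at most 2k attributes per example.

    Two independent attribute samples are drawn per example; using one
    inside the inner product and the other outside keeps the gradient
    estimate unbiased:  E[(w.x1_hat - y) * x2_hat] = (w.x - y) * x.
    (A simplified sketch, not the paper's exact algorithm.)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n):
        x1 = estimate_x(X[t], k, rng)   # used to estimate w.x
        x2 = estimate_x(X[t], k, rng)   # used as the gradient direction
        g = (w @ x1 - y[t]) * x2        # unbiased gradient of (w.x - y)^2 / 2
        w -= (lr / np.sqrt(t + 1)) * g
        norm = np.linalg.norm(w)        # project back onto the l2 ball
        if norm > radius:
            w *= radius / norm
    return w

# Tiny usage example on synthetic data (hypothetical):
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_star = rng.normal(size=20)
    w_star /= np.linalg.norm(w_star)
    X = rng.normal(size=(5000, 20))
    y = X @ w_star
    w = budgeted_linear_sgd(X, y, k=4)
    print("estimation error:", np.linalg.norm(w - w_star))
```

The two independent samples per example are what make the budget trade-off visible: smaller k means noisier gradients, which the paper's analysis offsets with additional training examples.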
References:

P. Auer, N. Cesa-Bianchi, Y. Freund, and R.E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32, 2003.
M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In ICML, 2006.
S. Ben-David and E. Dichterman. Learning with restricted focus of attention. Journal of Computer and System Sciences, 56, 1998.
A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, 2009.
R. Calderbank, S. Jafarpour, and R. Schapire. Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain. Manuscript, 2009.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, September 2004.
N. Cesa-Bianchi, S. Shalev-Shwartz, and O. Shamir. Online learning of noisy data with kernels. In COLT, 2010.
D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15:201–221, 1994.
K. Deng, C. Bourke, S. Scott, J. Sunderman, and Y. Zheng. Bandit-based algorithms for budgeted learning. In ICDM, 2007.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In ICML, 2008.
C. Dwork. Differential privacy: A survey of results. In M. Agrawal, D.-Z. Du, Z. Duan, and A. Li, editors, TAMC, volume 4978 of Lecture Notes in Computer Science, pages 1–19. Springer, 2008.
R. Greiner, A.J. Grove, and D. Roth. Learning cost-sensitive active classifiers. Artificial Intelligence, 139(2):137–174, 2002.
S. Guha and K. Munagala. Approximation algorithms for budgeted learning problems. In STOC, 2007.
S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, 2007.
S. Hanneke. Adaptive rates of convergence in active learning. In COLT, 2009.
D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
E. Hazan and T. Koren. Optimal algorithms for ridge and Lasso regression with partially observed attributes. CoRR, abs/1108.4559, 2011.
E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In COLT, 2006.
S.M. Kakade and S. Shalev-Shwartz. Mind the duality gap: Logarithmic regret algorithms for online optimization. In Advances in Neural Information Processing Systems 22, 2008.
S.M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In NIPS, 2008.
A. Kapoor and R. Greiner. Learning and classifying under hard budgets. In ECML, 2005a.
A. Kapoor and R. Greiner. Budgeted learning of bounded active classifiers. In Proceedings of the ACM SIGKDD Workshop on Utility-Based Data Mining, 2005b.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
O. Madani, D.J. Lizotte, and R. Greiner. Active model selection. In UAI, 2004.
S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML, 2007.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
V.N. Vapnik. Statistical Learning Theory. Wiley, 1998.
S. Zhou, J. Lafferty, and L. Wasserman. Compressed and privacy-sensitive sparse regression. IEEE Transactions on Information Theory, 55(2):846–866, 2009.