nips2010-180: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise-free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near-optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC², a novel greedy active learning algorithm, and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, which generalizes the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non-uniform cost and their noise is correlated. We also propose EffECXtive, a particularly fast approximation of EC², and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty.
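To make the EC² objective in the abstract concrete, the following is a minimal, illustrative Python sketch of one greedy step in the noise-free setting. In EC², hypotheses are grouped into equivalence classes by the decision they imply; every pair of hypotheses in different classes is joined by an "edge" weighted by the product of their prior probabilities, and a test is scored by the expected edge weight it cuts per unit cost. All names and the toy problem here are hypothetical sketches, not the authors' code.

import itertools

def expected_cut(prior, outcomes, test, classes):
    """Expected total weight of edges cut by running `test` (noise-free:
    the true hypothesis, drawn from `prior`, determines the outcome)."""
    # Probability of observing each outcome of the test under the prior.
    p_o = {}
    for h, p in prior.items():
        o = outcomes[h][test]
        p_o[o] = p_o.get(o, 0.0) + p
    total = 0.0
    for h, hp in itertools.combinations(prior, 2):
        if classes[h] == classes[hp]:
            continue  # EC^2 only weights edges across equivalence classes
        w = prior[h] * prior[hp]  # edge weight: product of prior probabilities
        oh, ohp = outcomes[h][test], outcomes[hp][test]
        if oh != ohp:
            total += w  # one endpoint is inconsistent whatever the outcome
        else:
            # Both endpoints are eliminated if the observed outcome differs
            # from their (common) prediction.
            total += w * (1.0 - p_o[oh])
    return total

def ec2_greedy_step(prior, outcomes, costs, classes):
    """Pick the test with the largest expected edge weight cut per unit cost."""
    return max(costs, key=lambda t: expected_cut(prior, outcomes, t, classes) / costs[t])

# Toy usage: three hypotheses, two decisions (classes A/B), two unit-cost binary tests.
prior = {"h1": 1/3, "h2": 1/3, "h3": 1/3}
outcomes = {"h1": {"t1": 0, "t2": 0},
            "h2": {"t1": 1, "t2": 0},
            "h3": {"t1": 1, "t2": 1}}
classes = {"h1": "A", "h2": "B", "h3": "B"}
costs = {"t1": 1.0, "t2": 1.0}
print(ec2_greedy_step(prior, outcomes, costs, classes))  # prints "t1"

In the paper's noisy setting the nodes are, roughly, full noise realizations grouped into classes by the hypothesis they support, so cutting all edges identifies the correct equivalence class rather than a single realization.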
[1] N. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In ICML, 2006.
[2] Gowtham Bellala, Suresh Bhavnani, and Clayton Scott. Extensions of generalized binary search to group identification and exponential costs. In Advances in Neural Information Processing Systems (NIPS), 2010.
[3] Gowtham Bellala, Suresh K. Bhavnani, and Clayton D. Scott. Group-based query learning for rapid diagnosis in time-critical situations. CoRR, abs/0911.4511, 2009.
[4] Colin F. Camerer. An experimental test of several generalized utility theories. The Journal of Risk and Uncertainty, 2(1):61–104, 1989.
[5] V. T. Chakaravarthy, V. Pandit, S. Roy, P. Awasthi, and M. Mohania. Decision trees for entity identification: Approximation algorithms and hardness results. In Proceedings of the ACM-SIGMOD Symposium on Principles of Database Systems, 2007.
[6] K. Chaloner and I. Verdinelli. Bayesian experimental design: A review. Statistical Science, 10(3):273–304, Aug. 1995.
[7] Sanjoy Dasgupta. Analysis of a greedy active learning strategy. In NIPS, 2004.
[8] Sanjoy Dasgupta. Coarse sample complexity bounds for active learning. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 235–242. MIT Press, Cambridge, MA, 2006.
[9] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. CoRR, abs/1003.3967v3, 2010.
[10] Daniel Golovin, Andreas Krause, and Debajyoti Ray. Near-optimal Bayesian active learning with noisy observations. CoRR, abs/1010.3091, 2010.
[11] Andrew Guillory and Jeff Bilmes. Average-case active learning with costs. In Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT), Porto, Portugal, October 2009.
[12] Giora Hanoch and Haim Levy. Efficient portfolio selection with quadratic and cubic utility. The Journal of Business, 43(2):181–189, 1970.
[13] R. A. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics, SSC-2(1):22–26, 1966.
[14] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–292, 1979.
[15] S. Rao Kosaraju, Teresa M. Przytycka, and Ryan S. Borgstrom. On an optimal split tree problem. In WADS ’99: Proceedings of the 6th International Workshop on Algorithms and Data Structures, pages 157–168, London, UK, 1999. Springer-Verlag.
[16] Andreas Krause and Carlos Guestrin. Optimal value of information in graphical models. Journal of Artificial Intelligence Research (JAIR), 35:557–591, 2009.
[17] D. V. Lindley. On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27:986–1005, 1956.
[18] D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590– 604, 1992.
[19] Rob Nowak. Noisy generalized binary search. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1366–1374. 2009.
[20] John W. Pratt. Risk aversion in the small and in the large. Econometrica, 32(1):122–136, 1964.
[21] D. Prelec. The probability weighting function. Econometrica, 66(3):497–527, 1998.
[22] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behaviour. Princeton University Press, 1947.
[23] Alice X. Zheng, Irina Rish, and Alina Beygelzimer. Efficient test selection in active diagnosis via entropy approximation. In UAI '05: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, 2005.