nips nips2013 nips2013-29 nips2013-29-reference knowledge-graph by maker-knowledge-mining

29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

Source: pdf

Author: Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan

Abstract: Maximization of submodular functions has wide applications in machine learning and artiﬁcial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efﬁcient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures the inherent property of submodular maximization, earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem. 1

29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

reference text