nips nips2005 nips2005-33 nips2005-33-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zoubin Ghahramani, Katherine A. Heller
Abstract: Inspired by “Google™ Sets”, we consider the problem of retrieving items from a concept or cluster, given a query consisting of a few items from that cluster. We formulate this as a Bayesian inference problem and describe a very simple algorithm for solving it. Our algorithm uses a model-based concept of a cluster and ranks items using a score which evaluates the marginal probability that each item belongs to a cluster containing the query items. For exponential family models with conjugate priors this marginal probability is a simple function of sufficient statistics. We focus on sparse binary data and show that our score can be evaluated exactly using a single sparse matrix multiplication, making it possible to apply our algorithm to very large datasets. We evaluate our algorithm on three datasets: retrieving movies from EachMovie, finding completions of author sets from the NIPS dataset, and finding completions of sets of words appearing in the Grolier encyclopedia. We compare to Google™ Sets and show that Bayesian Sets gives very reasonable set completions.
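As an illustration of the scoring idea described in the abstract, the following is a minimal sketch for sparse binary data, assuming independent Bernoulli features with Beta priors. The abstract itself gives no formulas, so the expressions below are the standard Beta-Bernoulli conjugate result for the ratio p(x | query) / p(x), not a transcription of the authors' code. The function name `bayesian_sets_scores`, the toy items, the scale `kappa`, and the smoothed feature means used to set the hyperparameters are all assumptions made for this example. Each item is stored as the set of its active feature indices, so the per-item sum over active features plays the role of one row of the sparse matrix-vector product.

```python
import math


def bayesian_sets_scores(items, query, num_features, kappa=2.0):
    """Return a log score for every item given a list of query item names.

    items: dict mapping item name -> set of active (value 1) feature indices.
    query: list of item names that define the query set.
    kappa: assumed scale for the Beta hyperparameters (illustrative choice).
    """
    n_items = len(items)

    # Count how often each feature is active over the whole dataset.
    counts = [0] * num_features
    for feats in items.values():
        for j in feats:
            counts[j] += 1

    # Assumed hyperparameters: alpha_j + beta_j = kappa, split according to a
    # smoothed feature mean so that both values stay strictly positive.
    alpha = [kappa * (cnt + 0.5) / (n_items + 1) for cnt in counts]
    beta = [kappa - a for a in alpha]

    # Sufficient statistics of the query: per-feature counts of ones.
    n_query = len(query)
    q_counts = [0] * num_features
    for name in query:
        for j in items[name]:
            q_counts[j] += 1

    # log score(x) = c + sum_j q_j * x_j for binary x.
    c = 0.0
    q = []
    for j in range(num_features):
        a_post = alpha[j] + q_counts[j]
        b_post = beta[j] + n_query - q_counts[j]
        c += (math.log(alpha[j] + beta[j])
              - math.log(alpha[j] + beta[j] + n_query)
              + math.log(b_post) - math.log(beta[j]))
        q.append(math.log(a_post) - math.log(alpha[j])
                 - math.log(b_post) + math.log(beta[j]))

    # Only active features contribute, which is what makes the computation
    # equivalent to a sparse matrix-vector product over the whole dataset.
    return {name: c + sum(q[j] for j in feats) for name, feats in items.items()}


# Hypothetical toy data: features 0-2 are "fruit-like", 3-5 are "tool-like".
toy_items = {
    "apple": {0, 1},
    "banana": {0, 1, 2},
    "cherry": {0, 1},
    "hammer": {3, 4},
    "wrench": {3, 4, 5},
}
scores = bayesian_sets_scores(toy_items, ["apple", "banana"], num_features=6)
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy data, items sharing features with the query ("cherry") rank above items that share none ("hammer", "wrench"). For a large dataset, the same vector `q` and constant `c` would be computed once per query and applied to all items with a sparse matrix library rather than a Python loop.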
[1] Google™ Sets. http://labs.google.com/sets
[2] Lafferty, J. and Zhai, C. (2002). Probabilistic relevance models based on document and query generation. In Language modeling and information retrieval.
[3] Ponte, J. and Croft, W. (1998). A language modeling approach to information retrieval. SIGIR.
[4] Robertson, S. and Sparck Jones, K. (1976). Relevance weighting of search terms. J Am Soc Info Sci.
[5] Tenenbaum, J. B. and Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641.
[6] Tong, S. (2005). Personal communication.