nips nips2008 nips2008-184 nips2008-184-reference knowledge-graph by maker-knowledge-mining

184 nips-2008-Predictive Indexing for Fast Search

Source: pdf

Author: Sharad Goel, John Langford, Alexander L. Strehl

Abstract: We tackle the computational problem of query-conditioned search. Given a machine-learned scoring rule and a query distribution, we build a predictive index by precomputing lists of potential results sorted based on an expected score of the result over future queries. The predictive index datastructure supports an anytime algorithm for approximate retrieval of the top elements. The general approach is applicable to webpage ranking, internet advertisement, and approximate nearest neighbor search. It is particularly effective in settings where standard techniques (e.g., inverted indices) are intractable. We experimentally ﬁnd substantial improvement over existing methods for internet advertisement and approximate nearest neighbors. 1

reference text

Andoni, A., & Indyk, P. (2006). Near-optimal hashing algorithms for near neighbor problem in high dimensions. Proceedings of the Symposium on Foundations of Computer Science (FOCS’06). Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003). Efﬁcient query evaluation using a two-level retrieval process. CIKM ’03: Proceedings of the twelfth international conference on Information and knowledge management (pp. 426–434). Cheng, C.-S., Chung, C.-P., & Shann, J. J.-J. (2006). Fast query evaluation through document identiﬁer assignment for inverted ﬁle-based information retrieval systems. Inf. Process. Manage., 42, 729–750. Chierichetti, F., Lattanzi, S., Mari, F., & Panconesi, A. (2008). On placing skips optimally in expectation. WSDM ’08: Proceedings of the international conference on Web search and web data mining (pp. 15–24). New York, NY, USA: ACM. Datar, M., Immorlica, N., Indyk, P., & Mirrokni, V. S. (2004). Locality-sensitive hashing scheme based on pstable distributions. SCG ’04: Proceedings of the twentieth annual symposium on Computational geometry (pp. 253–262). New York, NY, USA: ACM. Fagin, R. (2002). Combining fuzzy information: an overview. SIGMOD Rec., 31, 109–118. Fagin, R., Lotem, A., & Naor, M. (2003). Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66, 614–656. Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. The VLDB Journal (pp. 518–529). J¨ rvelin, K., & Kek¨ l¨ inen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions a aa on Information Systems, 20, 422–446. Liu, T., Moore, A., Gray, A., & Yang, K. (2004). An investigation of practical approximate nearest neighbor algorithms. Neural Information Processing Systems. Zobel, J., & Moffat, A. (2006). Inverted ﬁles for text search engines. ACM Comput. Surv., 38, 6. 8