acl acl2011 acl2011-113 acl2011-113-reference knowledge-graph by maker-knowledge-mining

113 acl-2011-Efficient Online Locality Sensitive Hashing via Reservoir Counting


Source: pdf

Author: Benjamin Van Durme ; Ashwin Lall

Abstract: We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. This technique allows for significant savings in the streaming setting, allowing for maintaining a larger number of signatures, or an increased level of approximation accuracy at a similar memory footprint.


reference text

Dimitris Achlioptas. 2003. Database-friendly random projections: Johnson-lindenstrauss with binary coins. J. Comput. Syst. Sci., 66:671–687, June. Shane Bergsma and Benjamin Van Durme. 2011. Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI). Rahul Bhagat and Deepak Ravichandran. 2008. Large Scale Acquisition of Paraphrases for Learning Surface Patterns. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL). Thorsten Brants and Alex Franz. 2006. Web 1T 5-gram version 1. Moses Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of STOC. Michel X. Goemans and David P. Williamson. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. JACM, 42: 1115–1 145. Amit Goyal, Jagadeesh Jagarlamudi, Hal Daum e´ III, and Suresh Venkatasubramanian. 2010. Sketch Techniques for Scaling Distributional Similarity to the Web. In Proceedings of the ACL Workshop on GEometrical Models of Natural Language Semantics. Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of STOC. Ping Li, Trevor J. Hastie, and Kenneth W. Church. 2006. Very sparse random projections. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’06, pages 287–296, New York, NY, USA. ACM. Ping Li, Kenneth W. Church, and Trevor J. Hastie. 2008. One Sketch For All: Theory and Application of Con- ditional Random Sampling. In Proc. of the Conference on Advances in Neural Information Processing Systems (NIPS). Dekang Lin, Kenneth Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani, and Sushant Narsale. 2010. New Tools for Web-Scale N-grams. In Proceedings of LREC. Patrick Pantel, Eric Crestan, Arkady Borkovsky, AnaMaria Popescu, and Vishnu Vyas. 2009. Web-Scale Distributional Similarity and Entity Set Expansion. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Sasa Petrovic, Miles Osborne, and Victor Lavrenko. 2010. Streaming First Story Detection with application to Twitter. In Proceedings of the Annual Meeting of the North American Association of Computational Linguistics (NAACL). 23 Deepak Ravichandran, Patrick Pantel, and Eduard Hovy. 2005. Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL). Benjamin Van Durme and Ashwin Lall. 2009. Streaming Pointwise Mutual Information. In Proc. of the Conference on Advances in Neural Information Processing Systems (NIPS). Benjamin Van Durme and Ashwin Lall. 2010. Online Generation of Locality Sensitive Hash Signatures. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL). Jeffrey S. Vitter. 1985. Random sampling with a reservoir. ACM Trans. Math. Softw., 11:37–57, March.