
Streaming Pointwise Mutual Information (NIPS 2009, paper 233)



Authors: Benjamin Van Durme, Ashwin Lall

Abstract: Recent work has made it possible to perform space-efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of such data structures, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show that our approach gives high accuracy on real-world data.
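PMI compares how often two items co-occur with how often they would co-occur if independent: PMI(x, y) = log( P(x, y) / (P(x) P(y)) ). The sketch below is a hypothetical exact-count baseline, not the authors' algorithm: it assumes the stream is a sequence of (x, y) pairs and keeps every count in an exact Python Counter, whereas the setting described above replaces such counters with space-efficient approximate ones. The class name `StreamingPMI` and its methods are illustrative only.

```python
import math
from collections import Counter


class StreamingPMI:
    """Hypothetical exact-count baseline for PMI over a stream of (x, y) pairs.

    Memory grows with the number of distinct items and pairs; a space-efficient
    approximate counter would be substituted in a large-vocabulary setting.
    """

    def __init__(self):
        self.left = Counter()    # c(x): times x appeared as the first element
        self.right = Counter()   # c(y): times y appeared as the second element
        self.joint = Counter()   # c(x, y): times the pair (x, y) appeared
        self.total = 0           # N: number of pairs seen so far

    def update(self, x, y):
        """Consume one observed pair from the stream."""
        self.left[x] += 1
        self.right[y] += 1
        self.joint[(x, y)] += 1
        self.total += 1

    def pmi(self, x, y, base=2.0):
        """Return log(P(x, y) / (P(x) P(y))) under the current counts.

        With P(x, y) = c(x, y)/N, P(x) = c(x)/N and P(y) = c(y)/N, this equals
        log(c(x, y) * N / (c(x) * c(y))). Unseen pairs get -inf.
        """
        cxy = self.joint[(x, y)]  # Counter returns 0 for missing keys
        if cxy == 0:
            return float("-inf")
        # cxy > 0 implies c(x) > 0 and c(y) > 0, so the division is safe.
        return math.log(cxy * self.total / (self.left[x] * self.right[y]), base)


# Usage on a toy stream of pairs.
s = StreamingPMI()
for x, y in [("new", "york"), ("new", "york"), ("new", "car"), ("old", "car")]:
    s.update(x, y)
print(s.pmi("old", "car"))   # 1.0
print(s.pmi("new", "york"))  # about 0.415
```

Every update to c(x) or c(y) changes the PMI of all stored pairs involving x or y, so scores cannot simply be computed once and cached as the stream grows.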


References

[Bloom, 1970] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13:422–426, 1970.
[Chambers and Jurafsky, 2008] Nathanael Chambers and Dan Jurafsky. Unsupervised Learning of Narrative Event Chains. In Proceedings of ACL, 2008.
[Chklovski and Pantel, 2004] Timothy Chklovski and Patrick Pantel. VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-04), pages 33–40, Barcelona, Spain, 2004.
[Church and Hanks, 1990] Kenneth Church and Patrick Hanks. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics, 16(1):22–29, March 1990.
[Frank et al., 2007] Michael C. Frank, Noah D. Goodman, and Joshua B. Tenenbaum. A Bayesian framework for cross-situational word learning. In Advances in Neural Information Processing Systems, 20, 2007.
[Giménez and Màrquez, 2004] Jesús Giménez and Lluís Màrquez. SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of LREC, 2004.
[Graff, 2003] David Graff. English Gigaword. Linguistic Data Consortium, Philadelphia, 2003.
[Joachims, 1999] Thorsten Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 11, pages 169–184. MIT Press, Cambridge, MA, 1999.
[Li and Church, 2007] Ping Li and Kenneth W. Church. A sketch algorithm for estimating two-way and multiway associations. Computational Linguistics, 33(3):305–354, 2007.
[Lin, 1998] Dekang Lin. Automatic Retrieval and Clustering of Similar Words. In Proceedings of COLING-ACL, 1998.
[Rosenfeld, 1994] Ronald Rosenfeld. Adaptive Statistical Language Modeling: A Maximum Entropy Approach. PhD thesis, Computer Science Department, Carnegie Mellon University, April 1994.
[Schooler and Anderson, 1997] Lael J. Schooler and John R. Anderson. The role of process in the rational analysis of memory. Cognitive Psychology, 32(3):219–250, 1997.
[Talbot and Osborne, 2007] David Talbot and Miles Osborne. Randomised Language Modelling for Statistical Machine Translation. In Proceedings of ACL, 2007.
[Talbot, 2009] David Talbot. Succinct approximate counting of skewed data. In Proceedings of IJCAI, 2009.
[Van Durme and Lall, 2009] Benjamin Van Durme and Ashwin Lall. Probabilistic Counting with Randomized Storage. In Proceedings of IJCAI, 2009.