Source: pdf
Author: Tom De Smedt, Walter Daelemans
Abstract: Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from
Keywords: Python, data mining, natural language processing, machine learning, graph networks
References:
David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, 2007.
Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks. Proceedings of the Third International ICWSM Conference, 2009.
Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O’Reilly Media, 2009.
Ulrik Brandes. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2):163–177, 2001.
Eric Brill. A simple rule-based part of speech tagger. Proceedings of the Third Conference on Applied Natural Language Processing, pages 152–155, 1992.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 2011.
Damian Conway. An algorithmic approach to English pluralization. Proceedings of the Second Annual Perl Conference, 1998.
Tom De Smedt and Walter Daelemans. Vreselijk mooi! (terribly beautiful): A subjectivity lexicon for Dutch adjectives. Proceedings of the 8th Language Resources and Evaluation Conference (LREC’12), pages 3568–3572, 2012.
Tom De Smedt, Vincent Van Asch, and Walter Daelemans. Memory-based shallow parser for Python. CLiPS Technical Report Series, 2, 2010.
Janez Demšar, Blaž Zupan, Gregor Leban, and Tomaz Curk. Orange: From experimental machine learning to interactive data mining. Knowledge Discovery in Databases, 3202:537–539, 2004.
Charles Elkan. Using the triangle inequality to accelerate k-means. Proceedings of the Twentieth International Conference on Machine Learning, pages 147–153, 2003.
Christiane Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, 1998.
Jeroen Geertzen. Jeroen Geertzen :: software & demos : Brill-nl, June 2010. URL http://…_pos/.
Aric Hagberg, Daniel Schult, and Pieter Swart. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference, pages 11–15, 2008.
Adam Kilgarriff and Gregory Grefenstette. Introduction to the special issue on the web as corpus. Computational Linguistics, 29(3):333–347, 2003.
Roeland Ordelman, Franciska de Jong, Arjan van Hessen, and Hendri Hondorp. TwNC: A multifaceted Dutch news corpus. ELRA Newsletter, 12:3–4, 2007.
Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the ACL, pages 271–278, 2004.
Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas Rückstieß, and Jürgen Schmidhuber. PyBrain. Journal of Machine Learning Research, pages 743–746, 2010.