emnlp emnlp2011 emnlp2011-139 emnlp2011-139-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eiji ARAMAKI ; Sachiko MASKAWA ; Mizuki MORITA
Affiliation: Sachiko MASKAWA, The University of Tokyo, Tokyo, Japan (sachiko.maskawa@gmail.com); Mizuki MORITA, National Institute of Biomedical Innovation, Osaka, Japan (morita.mizuki@gmail.com)
Abstract: ... posts more than 5.5 million messages (tweets) every day (reported by Twitter.com in March 2011). With the recent rise in popularity and scale of social media, a growing need exists for systems that can extract useful information from huge amounts of data. We address the issue of detecting influenza epidemics. First, the proposed system extracts influenza-related tweets using the Twitter API. Then, a support vector machine (SVM) based classifier keeps only the tweets that mention actual influenza patients. The experimental results demonstrate the feasibility of the proposed approach (0.89 correlation with the gold standard). In particular, at the outbreak and early spread (the early epidemic stage), the proposed method shows high correlation (0.97), which outperforms the state-of-the-art methods. This paper shows that Twitter texts reflect the real world, and that NLP techniques can be applied to extract only tweets that contain useful information.
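The evaluation in the abstract compares the system's output against a gold standard by correlation. The following is a minimal sketch of that comparison, assuming a Pearson coefficient (the abstract does not say which coefficient is used). The function name `pearson` and both weekly series are hypothetical and invented for illustration; they are not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly series (assumption, not from the paper):
# tweets the classifier kept as mentioning an actual patient,
# and a gold-standard count of reported cases for the same weeks.
positive_tweets = [12, 18, 35, 60, 88, 70, 41]
reported_cases = [1.1, 1.9, 3.2, 6.5, 9.0, 7.4, 4.0]

r = pearson(positive_tweets, reported_cases)
print(f"r = {r:.2f}")
```

Computing the coefficient over only the early weeks of a series would correspond to the separate early-epidemic figure the abstract reports.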
Barbosa, L. and J. Feng. 2010. Robust Sentiment Detection on Twitter from Biased and Noisy Data. In Proc. 23rd Intl. Conf. on Computational Linguistics (COLING).
Boyd, D., S. Golder, and G. Lotan. 2010. Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter. In Proc. HICSS43.
Breiman, L. 2001. Random Forests. Machine Learning, 45(1): 5–32.
Breiman, L. 1996. Bagging predictors. Machine Learning, 24(2): 123–140.
Cortes, C. and V. Vapnik. 1995. Support vector networks. In Machine Learning, pp. 273–297.
Chapman, W., W. Bridewell, P. Hanbury, G.F. Cooper, and B. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 5:301–310.
Chapman, W., J. Dowling, and D. Chu. 2007. ConText: An algorithm for identifying contextual features from clinical text. Biological, translational, and clinical language processing (BioNLP2007), pp. 81–88.
Elkin, P.L., S.H. Brown, B.A. Bauer, C.S. Husser, W. Carruth, L.R. Bergstrom, and D.L. Wahner-Roedler. 2005. A controlled trial of automated classification of negation from clinical notes. BMC Medical Informatics and Decision Making 5:13.
Espino, J., W. Hogan, and M. Wagner. 2003. Telephone triage: A timely data source for surveillance of influenza-like diseases. In Proc. of Annual Symposium of AMIA, pp. 215–219.
Finin, T., W. Murnane, A. Karandikar, N. Keller, J. Martineau, and M. Dredze. 2010. Annotating named entities in Twitter data with crowdsourcing. In Proc. NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk (CSLDAMT '10), pp. 80–88.
Freund, Y. and R. Schapire. 1996. Experiments with a new boosting algorithm. In Machine Learning Intl. Workshop, pp. 148–156.
Ginsberg, J., M.H. Mohebbi, R.S. Patel, and L. Brammer. 2009. Detecting influenza epidemics using search engine query data. Nature Vol. 457 (19).
Huang, Y. and H.J. Lowe. 2007. A novel hybrid approach to automated negation detection in clinical radiology reports. Journal of the American Medical Informatics Association, 14(3):304–311.
Huberman, B. and D. R. F. Wu. 2009. Social networks that matter: Twitter under the microscope. First Monday, Vol. 14.
Hulth, A., G. Rydevik, and A. Linde. 2009. Web Queries as a Source for Syndromic Surveillance. PLoS ONE 4(2).
Johnson, HA., MM. Wagner, WR. Hogan, W. Chapman, RT. Olszewski, J. Dowling, and G. Barnas. 2004. Analysis of Web access logs for surveillance of influenza. Stud. Health Technol. Inform. 107(Pt 2): 1202–1206.
Magruder, S. 2003. Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of human disease. Johns Hopkins University APL Technical Digest 24:349–353.
Milstein, S., A. Chowdhury, G. Hochmuth, B. Lorica, and R. Magoulas. 2008. Twitter and the micromessaging revolution: Communication, connections, and immediacy, 140 characters at a time. O’Reilly Media.
Mutalik, P.G., A. Deshpande, and P.M. Nadkarni. 2001. Use of general purpose negation detection to augment concept indexing of medical documents: A quantitative study using the UMLS. Journal of the American Medical Informatics Association, 8(6):598–609.
Paul, MJ. and M. Dredze. 2011. You Are What You Tweet: Analyzing Twitter for Public Health. In Proc. of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM).
Polgreen, PM., Y. Chen, D.M. Pennock, and F.D. Nelson. 2008. Using Internet Searches for Influenza Surveillance. Clinical Infectious Diseases Vol. 47 (11) pp. 1443–1448.
Quinlan, J. 1993. C4.5: Programs for machine learning. Morgan Kaufmann.
Sakaki, T., M. Okazaki, and Y. Matsuo. 2010. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proc. of Conf. on World Wide Web (WWW).