
165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes

Source: pdf

Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre

Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
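The gap between the micro-averaged and macro-averaged F1 figures in the abstract is what one would expect on unbalanced data: micro-averaging pools all decisions, so the majority class dominates, while macro-averaging weights each class equally, so a weak minority class pulls the score down. The following is a minimal sketch of the two averages on made-up toy data. The function name `f1_scores`, the label names, and the counts are illustrative assumptions, not the paper's data or evaluation code.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1, per_class_f1) for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    def f1(true_pos, false_pos, false_neg):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of per-class F1, so every class counts equally
    macro = sum(per_class.values()) / len(labels)
    # Micro: F1 computed from counts pooled over all classes
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro, per_class


# Toy, hypothetical data: one dominant class and two rare ones
gold = ["present"] * 8 + ["absent"] + ["possible"]
pred = ["present"] * 8 + ["present"] + ["possible"]

micro, macro, per_class = f1_scores(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
print(per_class)
```

In this toy case a single missed minority instance barely moves the micro average but drops the macro average sharply, which is why improving minority-class recall matters more for the macro figure.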

reference text

Apache UIMA. 2008. Available at http://uima.apache.org.
Jason Baldridge, Tom Morton, and Gann Bierner. 2005. OpenNLP Maxent Package in Java. Available at http://incubator.apache.org/opennlp/.
Berry de Bruijn, Colin Cherry, Svetlana Kiritchenko, Joel Martin, and Xiaodan Zhu. 2011. Machine-Learned Solutions for Three Stages of Clinical Information Extraction: the State of the Art at i2b2 2010. J Am Med Inform Assoc.
Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: a Library for Support Vector Machines. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Wendy W. Chapman, Will Bridewell, Paul Hanbury, Gregory F. Cooper, and Bruce G. Buchanan. 2001. A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries. Journal of Biomedical Informatics, 34:301-310.
Wendy W. Chapman, David Chu, and John N. Dowling. 2007. ConText: An Algorithm for Identifying Contextual Features from Clinical Text. BioNLP 2007: Biological, translational, and clinical language processing, Prague, CZ.
David Ferrucci and Adam Lally. 2004. UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment. Journal of Natural Language Engineering, 10(3-4):327-348.
Tzu-Kuo Huang, Ruby C. Weng, and Chih-Jen Lin. 2006. Generalized Bradley-Terry Models and Multiclass Probability Estimates. Journal of Machine Learning Research, 7:85-115.
i2b2/VA 2010 Challenge Assertion Annotation Guidelines. https://www.i2b2.org/NLP/Relations/assets/Assertion%20Annotation%20Guideline.pdf.
LVG (Lexical Variants Generation). 2010. Available at http://lexsrv2.nlm.nih.gov/LexSysGroup/Projects/lvg.
Alexa T. McCray, Suresh Srinivasan, and Allen C. Browne. 1994. Lexical Methods for Managing Variation in Biomedical Terminologies. Proc Annu Symp Comput Appl Med Care, 235–239.
Stéphane M. Meystre and Peter J. Haug. 2005. Automation of a Problem List Using Natural Language Processing. BMC Med Inform Decis Mak, 5:30.
Guergana K. Savova, James J. Masanz, Philip V. Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C. Kipper-Schuler, and Christopher G. Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc, 17(5):507-513.
Özlem Uzuner and Scott DuVall. 2010. Fourth i2b2/VA Challenge. In http://www.i2b2.org/NLP/Relations/.
Özlem Uzuner, Xiaoran Zhang, and Sibanda Tawanda. 2009. Machine Learning and Rule-based Approaches to Assertion Classification. J Am Med Inform Assoc, 16:109-115.