
118 acl-2013-Development and Analysis of NLP Pipelines in Argo

Source: pdf

Author: Rafal Rak ; Andrew Rowley ; Jacob Carter ; Sophia Ananiadou

Abstract: Developing sophisticated NLP pipelines composed of multiple processing tools and components available through different providers may pose a challenge in terms of their interoperability. The Unstructured Information Management Architecture (UIMA) is an industry standard whose aim is to ensure such interoperability by defining common data structures and interfaces. The architecture has been gaining attention from industry and academia alike, resulting in a large volume of UIMA-compliant processing components. In this paper, we demonstrate Argo, a Web-based workbench for the development and processing of NLP pipelines/workflows. The workbench is based upon UIMA, and thus has the potential of using many of the existing UIMA resources. We present features, and show examples, of facilitating the distributed development of components and the analysis of processing results. The latter includes annotation visualisers and editors, as well as serialisation to RDF format, which enables flexible querying in addition to data manipulation thanks to the semantic query language SPARQL. The distributed development feature allows users to seamlessly connect their tools to workflows running in Argo, and thus take advantage of both the available library of components (without the need of installing them locally) and the analytical tools.

reference text

W A Baumgartner, K B Cohen, and L Hunter. 2008. An open-source framework for large-scale, flexible evaluation of biomedical text mining systems. Journal of biomedical discovery and collaboration, 3:1+.

H Cunningham, D Maynard, K Bontcheva, and V Tablan. 2002. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics.

D Ferrucci and A Lally. 2004. UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment. Natural Language Engineering, 10(3-4):327–348.

R T Fielding and R N Taylor. 2002. Principled design of the modern Web architecture. ACM Trans. Internet Technol., 2(2):115–150, May.

I Gurevych, M Mühlhäuser, C Müller, J Steimle, M Weimer, and T Zesch. 2007. Darmstadt knowledge processing repository based on uima. In Proceedings of the First Workshop on Unstructured Information Management Architecture, Tübingen, Germany.

U Hahn, E Buyko, R Landefeld, M Mühlhausen, M Poprat, K Tomanek, and J Wermter. 2008. An Overview of JCORE, the JULIE Lab UIMA Component Repository. In Language Resources and Evaluation Workshop, Towards Enhanc. Interoperability Large HLT Syst.: UIMA NLP, pages 1–8.

Y Kano, R Dorado, L McCrochon, S Ananiadou, and J Tsujii. 2010. U-Compare: An integrated language resource evaluation platform including a comprehensive UIMA resource library. In Proceedings of the Seventh International Conference on Language Resources and Evaluation, pages 428–434.

G K Savova, J J Masanz, P V Ogren, J Zheng, S Sohn, K C Kipper-Schuler, and C G Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17(5):507–513.

P Stenetorp, S Pyysalo, G Topić, T Ohta, S Ananiadou, and J Tsujii. 2012. brat: a web-based tool for nlp-assisted text annotation. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 102–107, Avignon, France.