
227 acl-2013-Learning to lemmatise Polish noun phrases


Source: pdf

Author: Adam Radziszewski

Abstract: We present a novel approach to noun phrase lemmatisation in which the main phase is cast as a tagging problem. The idea draws on the observation that the lemmatisation of almost all Polish noun phrases can be decomposed into transformations of the individual words (tokens) that make up each phrase. Our evaluation shows results similar to those obtained earlier by a rule-based system, while our approach allows chunking to be separated from lemmatisation.
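
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the tag names `nom` and `keep`, the toy lexicon `TOY_NOMINATIVE`, and the function `apply_transformations` are all hypothetical, and the tags are assigned by hand here, whereas in a tagging-based approach a trained tagger would predict them. The point is only that a phrase lemma can be assembled from one transformation per token.

```python
from typing import Dict, List, Tuple

# Hypothetical per-token transformation tags (NOT the paper's tagset):
#   "nom"  -> replace the token with its nominative form
#   "keep" -> leave the token exactly as it appears in the text
# Toy lexicon mapping inflected forms to nominative forms (assumption:
# a real system would consult a morphological dictionary instead).
TOY_NOMINATIVE: Dict[str, str] = {
    "głównej": "główna",
    "drogi": "droga",
    "ministra": "minister",
}


def apply_transformations(tagged_tokens: List[Tuple[str, str]]) -> str:
    """Build a phrase lemma by applying one transformation per token."""
    lemma_tokens = []
    for form, tag in tagged_tokens:
        if tag == "nom":
            # Fall back to the surface form if the toy lexicon lacks it.
            lemma_tokens.append(TOY_NOMINATIVE.get(form, form))
        elif tag == "keep":
            lemma_tokens.append(form)
        else:
            raise ValueError(f"unknown transformation tag: {tag!r}")
    return " ".join(lemma_tokens)


# Adjective and noun agree, so both are moved to the nominative.
print(apply_transformations([("głównej", "nom"), ("drogi", "nom")]))
# The genitive modifier stays as it is; only the head noun changes.
print(apply_transformations([("ministra", "nom"), ("finansów", "keep")]))
```

The second call shows why the transformation has to be decided per token: in "ministra finansów" only the head noun is brought to the nominative, while the genitive modifier "finansów" must be left untouched.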


reference text

Robert Bembenik, Łukasz Skonieczny, Henryk Rybiński, Marzena Kryszkiewicz, and Marek Niezgódka, editors. 2013. Intelligent Tools for Building a Scientific Information Platform, volume 467 of Studies in Computational Intelligence. Springer Berlin Heidelberg.

Bartosz Broda, Michał Marcińczuk, Marek Maziarz, Adam Radziszewski, and Adam Wardyński. 2012. KPWr: Towards a free corpus of Polish. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of LREC’12, Istanbul, Turkey. ELRA.

Aleksander Buczyński and Adam Przepiórkowski. 2009. Spejd: A Shallow Processing and Morphological Disambiguation Tool. In Human Language Technology. Challenges of the Information Society, pages 131–141. Springer-Verlag, Berlin, Heidelberg.

Grzegorz Chrupała, Georgiana Dinu, and Josef van Genabith. 2008. Learning morphology with Morfette. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, and Daniel Tapias, editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May. European Language Resources Association (ELRA).

Łukasz Degórski. 2011. Towards the lemmatisation of Polish nominal syntactic groups using a shallow grammar. In Pascal Bouvry, Mieczysław A. Kłopotek, Franck Leprevost, Małgorzata Marciniak, Agnieszka Mykowiecka, and Henryk Rybiński, editors, Security and Intelligent Information Systems: International Joint Conference, SIIS 2011, Warsaw, Poland, June 13-14, 2011, Revised Selected Papers, volume 7053 of Lecture Notes in Computer Science, pages 370–378. Springer-Verlag.

Tomaž Erjavec. 2012. MULTEXT-East: morphosyntactic resources for Central and Eastern European languages. Language Resources and Evaluation, 46(1):131–142.

Miloš Jakubíček, Vojtěch Kovář, and Pavel Šmerk. 2011. Czech morphological tagset revisited. In Proceedings of Recent Advances in Slavonic Natural Language Processing, pages 29–42, Brno.

Jan Kocoń and Maciej Piasecki. 2012. Heterogeneous named entity similarity function. In Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors, Text, Speech and Dialogue, volume 7499 of Lecture Notes in Computer Science, pages 223–231. Springer Berlin Heidelberg.

Małgorzata Marciniak and Agnieszka Mykowiecka. 2013. Terminology extraction from domain texts in Polish. In Bembenik et al. (2013), pages 171–185.

Jakub Piskorski, Karol Wieloch, and Marcin Sydow. 2009. On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages. Information Retrieval, 12(3):275–299.

Jakub Piskorski. 2005. Named-entity recognition for Polish with SProUT. In Leonard Bolc, Zbigniew Michalewicz, and Toyoaki Nishida, editors, Intelligent Media Technology for Communicative Intelligence, volume 3490 of Lecture Notes in Computer Science, pages 122–133. Springer Berlin Heidelberg.

Adam Przepiórkowski. 2007. Slavic information extraction and partial parsing. In Proceedings of the Workshop on Balto-Slavonic Natural Language Processing, pages 1–10, Prague, Czech Republic, June. Association for Computational Linguistics.

Adam Przepiórkowski. 2009. A comparison of two morphosyntactic tagsets of Polish. In Violetta Koseska-Toszewa, Ludmila Dimitrova, and Roman Roszko, editors, Representing Semantics in Digital Lexicography: Proceedings of MONDILEX Fourth Open Workshop, pages 138–144, Warszawa.

Adam Przepiórkowski and Łukasz Szałkiewicz. 2012. Anotacja morfoskładniowa. In Adam Przepiórkowski, Mirosław Bańko, Rafał L. Górski, and Barbara Lewandowska-Tomaszczyk, editors, Narodowy Korpus Języka Polskiego. Wydawnictwo Naukowe PWN, Warsaw.

Adam Radziszewski and Adam Pawlaczek. 2012. Large-scale experiments with NP chunking of Polish. In Proceedings of the 15th International Conference on Text, Speech and Dialogue, Brno, Czech Republic. Springer Verlag.

Adam Radziszewski. 2013. A tiered CRF tagger for Polish. In Bembenik et al. (2013), pages 215–230.

Peter Turney. 2000. Learning algorithms for keyphrase extraction. Information Retrieval, 2:303–336.

Marcin Woliński. 2006. Morfeusz a practical tool for the morphological analysis of Polish. In Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, and Krzysztof Trojanowski, editors, Proceedings of IIPWM’06, pages 511–520, Ustroń, Poland, June 19–22. Springer-Verlag, Berlin.