emnlp emnlp2010 emnlp2010-123 emnlp2010-123-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yves Scherrer ; Owen Rambow
Abstract: We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a character n-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequently encountered in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, which trained models struggle to achieve due to sparsity of training data.
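The word-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, region names, weights, and the functions `score_regions` and `identify_region` are all hypothetical, and the simple sum-of-weights scoring is an assumption made for illustration. The weights stand in for values that would be read off an atlas occurrence map for each generated word form.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each generated word form maps to per-region
# weights, standing in for values read off an atlas occurrence map.
# The forms, regions, and numbers are invented for illustration only.
TOY_LEXICON = {
    "form_a": {"region_1": 1.0, "region_2": 0.0},
    "form_b": {"region_1": 0.0, "region_2": 1.0},
    "form_c": {"region_1": 0.5, "region_2": 0.5},
}


def score_regions(tokens, lexicon):
    """Sum per-region weights over all tokens found in the lexicon."""
    scores = defaultdict(float)
    for token in tokens:
        # Unknown tokens contribute nothing to any region.
        for region, weight in lexicon.get(token.lower(), {}).items():
            scores[region] += weight
    return dict(scores)


def identify_region(text, lexicon=TOY_LEXICON):
    """Return the best-scoring region, or None if no token is known."""
    scores = score_regions(text.split(), lexicon)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Because whole words are looked up, a token that is missing from the lexicon is simply ignored rather than distorting the score, which is one way to picture the robustness the abstract attributes to word-level evidence.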
Fadi Biadsy, Julia Hirschberg, and Nizar Habash. 2009. Spoken Arabic dialect identification using phonotactic modeling. In EACL 2009 Workshop on Computational Approaches to Semitic Languages, Athens.
S. Brants, S. Dipper, S. Hansen, W. Lezius, and G. Smith. 2002. The TIGER Treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, Sozopol.
W. B. Cavnar and J. M. Trenkle. 1994. N-gram based text categorization. In Proceedings of SDAIR’94, Las Vegas.
Eugen Dieth. 1986. Schwyzertütschi Dialäktschrift. Sauerländer, Aarau, 2nd edition.
Rudolf Hotzenköcherle, Robert Schläpfer, Rudolf Trüb, and Paul Zinsli, editors. 1962-1997. Sprachatlas der deutschen Schweiz. Francke, Berne.
Baden Hughes, Timothy Baldwin, Steven Bird, Jeremy Nicholson, and Andrew MacKinlay. 2006. Reconsidering language identification for written language resources. In Proceedings of LREC’06, Genoa.
N. Ingle. 1980. A language identification table. Technical Translation International.
Gideon S. Mann and David Yarowsky. 2001. Multipath translation lexicon induction via bridge languages. In Proceedings of NAACL’01, Pittsburgh.
Radim Řehůřek and Milan Kolkus. 2009. Language identification on the web: Extending the dictionary method. In Computational Linguistics and Intelligent Text Processing – Proceedings of CICLing 2009, pages 357–368, Mexico. Springer.
Jonas Rumpf, Simon Pickl, Stephan Elspaß, Werner König, and Volker Schmidt. 2009. Structural analysis of dialect maps using methods from spatial statistics. Zeitschrift für Dialektologie und Linguistik, 76(3).
Charles Schafer and David Yarowsky. 2002. Inducing translation lexicons via diverse similarity measures and bridge languages. In Proceedings of CoNLL’02, pages 146–152, Taipei.
Yves Scherrer and Owen Rambow. 2010. Natural language processing for the Swiss German dialect area. In Proceedings of KONVENS’10, Saarbrücken.
Yves Scherrer. 2007. Adaptive string distance measures for bilingual dialect lexicon induction. In Proceedings of ACL’07, Student Research Workshop, pages 55–60, Prague.
Yves Scherrer. 2010. Des cartes dialectologiques numérisées pour le TALN. In Proceedings of TALN’10, Montréal.
Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of ICSLP’02, pages 901–904, Denver.