emnlp emnlp2013 emnlp2013-151 emnlp2013-151-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wang Ling ; Chris Dyer ; Alan W Black ; Isabel Trancoso
Abstract: Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization, replacing orthographically or lexically idiosyncratic forms with more standard variants, can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation services as well as a phrase-based translation system trained on parallel microblog data.