emnlp emnlp2013 emnlp2013-89 emnlp2013-89-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Morgane Ciot ; Morgan Sonderegger ; Derek Ruths
Abstract: While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using existing machinery. Further, accuracy gains can be made by taking language-specific features into account. We identify languages with complex orthography, such as Japanese, as difficult for existing methods, suggesting a valuable direction for future research.
S. Akioka, N. Kato, Y. Muraoka, and H. Yamana. 2010. Cross-media impact on Twitter in Japan. In Proceedings of the International Workshop on Search and Mining User-generated Contents.
Atilika. 2012. Kuromoji morphological analyzer. http://www.atilika.org.
D. Bamman, J. Eisenstein, and T. Schnoebelen. 2012. Gender in Twitter: Styles, stances, and social networks. arXiv preprint arXiv:1210.4567.
S. Bird. 2006. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL Interactive Presentation Sessions.
J.D. Burger, J. Henderson, and G. Zarrella. 2011. Discriminating gender on Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
M. Conover, B. Gonçalves, J. Ratkiewicz, A. Flammini, and F. Menczer. 2011a. Predicting the political alignment of Twitter users. In Proceedings of the International Conference on Social Computing.
M.D. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini. 2011b. Political polarization on Twitter. In Proceedings of the International Conference on Weblogs and Social Media.
P. Eckert and S. McConnell-Ginet. 2003. Language and gender. Cambridge University Press, Cambridge.
F. Giglietto. 2012. If likes were votes: An empirical study of the 2011 Italian administrative elections. In Proceedings of the International Conference on Weblogs and Social Media.
J. Holmes. 1995. Women, men and politeness. Longman, London.
M. Kim and H.W. Park. 2012. Measuring Twitter-based political participation and deliberation in the South Korean context by using social network and Triple Helix indicators. Scientometrics, 90(1):121–140.
W. Liu and D. Ruths. 2013. What's in a name? Using first names as features for gender inference in Twitter. In Analyzing Microtext: 2013 AAAI Spring Symposium.
W. Liu, F.A. Zamal, and D. Ruths. 2012. Using social media to infer gender composition from commuter populations. In Proceedings of the When the City Meets the Citizen Workshop.
A. Mislove, S. Lehmann, Y.Y. Ahn, J.P. Onnela, and J.N. Rosenquist. 2011. Understanding the demographics of Twitter users. In Proceedings of the International Conference on Weblogs and Social Media.
D. Mocanu, A. Baronchelli, B. Gonçalves, N. Perra, and A. Vespignani. 2012. The Twitter of Babel: Mapping world languages through microblogging platforms. ArXiv e-prints, December.
B. New and C. Landing. 2012. Lexique 3. http://www.lexique.org/telLexique.php.
F.C.C. Peng, editor. 1981. Male/female differences in Japanese. The East-West Sign Language Association, Tokyo.
M. Pennacchiotti and A.M. Popescu. 2011. A machine learning approach to Twitter user classification. In Proceedings of the International Conference on Weblogs and Social Media.
D. Rao and D. Yarowsky. 2010. Detecting latent user properties in social media. In Proceedings of the NIPS Workshop on Machine Learning for Social Networks.
D. Rao, D. Yarowsky, A. Shreevats, and M. Gupta. 2010. Classifying latent user attributes in Twitter. In Proceedings of the International Workshop on Search and Mining User-generated Contents.
T. Sakaki, M. Okazaki, and Y. Matsuo. 2010. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of the International World Wide Web Conference.
Semiocast. 2012. Brazil becomes the 2nd country on Twitter, Japan 3rd, Netherlands most active country. http://semiocast.com/publications/2012_01_31_Brazil_becomes_2nd_country_on_Twitter_superseds_Japan.
A. Tumasjan, T.O. Sprenger, P.G. Sandner, and I.M. Welpe. 2010. Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the International Conference on Weblogs and Social Media.
W. Weerkamp, S. Carter, and M. Tsagkias. 2011. How people use Twitter in different languages. In Proceedings of the Web Science Conference.
R. Schenk-Van Witsen. 1981. Les différences sexuelles dans le français parlé: une étude-pilote des différences lexicales entre hommes et femmes. Langage et société, 17(1):59–78.
F.A. Zamal, W. Liu, and D. Ruths. 2012. Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors. In Proceedings of the International Conference on Weblogs and Social Media.