
301 acl-2013-Resolving Entity Morphs in Censored Data


Source: pdf

Author: Hongzhao Huang ; Zhen Wen ; Dian Yu ; Heng Ji ; Yizhou Sun ; Jiawei Han ; He Li

Abstract: In some societies, internet users have to create information morphs (e.g. “Peace West King” to refer to “Bo Xilai”) to avoid active censorship or achieve other communication goals. In this paper we aim to solve a new problem of resolving entity morphs to their real targets. We exploit temporal constraints to collect cross-source comparable corpora relevant to any given morph query and identify target candidates. Then we propose various novel similarity measurements, including surface features, meta-path based semantic features and social correlation features, and combine them in a learning-to-rank framework. Experimental results on Chinese Sina Weibo data demonstrate that our approach is promising and significantly outperforms baseline methods.


reference text

Lada A. Adamic and Eytan Adar. 2001. Friends and neighbors on the web. Social Networks, 25:211–230.
Aris Anagnostopoulos, Ravi Kumar, and Mohammad Mahdian. 2008. Influence and correlation in social networks. In KDD, pages 7–15.
David Bamman, Brendan O’Connor, and Noah A. Smith. 2012. Censorship and deletion practices in Chinese social media. First Monday, 17(3).
Patrick Barwise and Seán Meehan. 2010. The one thing you must get right when building a brand. Harvard Business Review, 88(12):80–84.
D. Bollegala, Y. Matsuo, and M. Ishizuka. 2011. Automatic discovery of personal name aliases from the web. Knowledge and Data Engineering, IEEE Transactions on, 23(6):831–844.
Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on Statistical Machine Translation, StatMT ’08, pages 224–232.
Pascale Fung and Lo Yuen Yee. 1998. An IR approach for translating new words from nonparallel, comparable texts. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, ACL ’98, pages 414–420.
Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki. 2006. Link prediction using supervised learning. In Proc. of SDM 06 workshop on Link Analysis, Counterterrorism and Security.
Ahmed Hassan, Haytham Fahmy, and Hany Hassan. 2007. Improving named entity translation by exploiting comparable and parallel corpora. In RANLP.
Daniel S. Hirschberg. 1977. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664–675.
Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’99, pages 50–57.
Ralf Holzer, Bradley Malin, and Latanya Sweeney. 2005. Email alias detection using social network analysis. In Conference on Knowledge Discovery in Data: Proceedings of the 3rd international workshop on Link discovery, volume 21, pages 52–57.
Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider. 2005. Alias detection in link data sets. In Proceedings of the International Conference on Intelligence Analysis, May.
Hongzhao Huang, Arkaitz Zubiaga, Heng Ji, Hongbo Deng, Dong Wang, Hieu Khac Le, Tarek F. Abdelzaher, Jiawei Han, Alice Leung, John Hancock, and Clare R. Voss. 2012. Tweet ranking based on heterogeneous networks. In COLING, pages 1239–1256.
Heng Ji and Ralph Grishman. 2008. Refining event extraction through cross-document inference. In Proceedings of ACL, pages 254–262.
H. Ji, R. Grishman, H.T. Dang, K. Griffitt, and J. Ellis. 2010. Overview of the TAC 2010 knowledge base population track. In Text Analysis Conference (TAC) 2010.
H. Ji, R. Grishman, and H.T. Dang. 2011. Overview of the TAC 2011 knowledge base population track. In Text Analysis Conference (TAC) 2011.
Heng Ji. 2009. Mining name translations from comparable corpora by creating bilingual information networks. In Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora, BUCC ’09, pages 34–37.
David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management, CIKM ’03, pages 556–559.
Ching-Yung Lin, Lynn Wu, Zhen Wen, Hanghang Tong, Vicky Griffiths-Fisher, Lei Shi, and David Lubensky. 2012. Social network analysis in enterprise. Proceedings of the IEEE, 100(9):2759–2776.
Vincent Ng. 2010. Supervised noun phrase coreference research: the first fifteen years. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 1396–1411.
Patrick Pantel. 2006. Alias detection in malicious environments. In AAAI Fall Symposium on Capturing and Using Patterns for Evidence Detection, pages 14–20.
Reinhard Rapp. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, ACL ’99, pages 519–526.
Li Shao and Hwee Tou Ng. 2004. Mining new word translations from comparable corpora. In Proceedings of the 20th international conference on Computational Linguistics, COLING ’04.
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, and Jiawei Han. 2011a. Co-author relationship prediction in heterogeneous bibliographic networks. In Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’11, pages 121–128.
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011b. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. PVLDB, 4(11):992–1003.
Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL ’03, pages 173–180.
Raghavendra Udupa, K. Saravanan, A. Kumaran, and Jagadeesh Jagarlamudi. 2009. MINT: a method for effective and scalable mining of named entity transliterations from large comparable corpora. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, pages 799–807.
Robert A. Wagner and Michael J. Fischer. 1974. The string-to-string correction problem. J. ACM, 21(1):168–173.
Chao Wang, Venu Satuluri, and Srinivasan Parthasarathy. 2007. Local probabilistic models for link prediction. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, ICDM ’07, pages 322–331.
Zhen Wen and Ching-Yung Lin. 2010. On the quality of inferring interests from social neighbors. In KDD, pages 373–382.
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, and Juanzi Li. 2011. Social context summarization. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR ’11, pages 255–264.
Hua-Ping Zhang, Hong-Kui Yu, De-Yi Xiong, and Qun Liu. 2003. HHMM-based Chinese lexical analyzer ICTCLAS. In Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17, SIGHAN ’03, pages 184–187.