
Entity Set Expansion using Topic Information (ACL 2011)



Authors: Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura, Genichiro Kikui

Abstract: This paper proposes three modules based on latent topics of documents for alleviating “semantic drift” in bootstrapping entity set expansion. These new modules are added to a discriminative bootstrapping algorithm to realize topic feature generation, negative example selection and entity candidate pruning. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experiments show that the accuracy of the extracted entities is improved by 6.7 to 28.2% depending on the domain.
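The abstract names the three topic-based modules but not their exact formulation. Below is a minimal Python sketch of how such modules could be wired up. It assumes every entity already has an LDA topic distribution P(z|e), for example aggregated from the topic mixtures of the documents it occurs in. The helper names (`topic_features`, `dominant_topic`, `prune_candidates`, `select_negatives`) and the dominant-topic and centroid-similarity heuristics are hypothetical illustrations, not the authors' method.

```python
"""Illustrative topic-based helpers for bootstrapped entity set expansion.

Hypothetical sketch (not the paper's exact method): each entity is assumed
to come with an LDA topic distribution P(z|e), a list of floats summing to 1.
"""


def topic_features(dist):
    """Turn a topic distribution into sparse named features for a classifier."""
    return {f"topic_{k}": p for k, p in enumerate(dist) if p > 0}


def dominant_topic(dist):
    """Index of the highest-probability topic (the first index wins ties)."""
    return max(range(len(dist)), key=lambda k: dist[k])


def prune_candidates(candidates, seed_dists):
    """Keep candidates whose dominant topic is the dominant topic of some seed.

    candidates: dict mapping entity string -> topic distribution.
    """
    allowed = {dominant_topic(d) for d in seed_dists}
    return {name: d for name, d in candidates.items()
            if dominant_topic(d) in allowed}


def select_negatives(candidates, seed_dists, n):
    """Return the n candidates least similar to the seeds' mean topic distribution.

    Similarity is a plain dot product with the seed centroid (an assumption).
    """
    k = len(seed_dists[0])
    centroid = [sum(d[i] for d in seed_dists) / len(seed_dists) for i in range(k)]

    def similarity(dist):
        return sum(a * b for a, b in zip(dist, centroid))

    return sorted(candidates, key=lambda name: similarity(candidates[name]))[:n]


if __name__ == "__main__":
    seeds = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
    cands = {"a": [0.6, 0.3, 0.1], "b": [0.1, 0.8, 0.1],
             "c": [0.2, 0.1, 0.7], "d": [0.5, 0.4, 0.1]}
    print(sorted(prune_candidates(cands, seeds)))  # ['a', 'd']
    print(select_negatives(cands, seeds, 2))       # ['b', 'c']
```

With both seeds concentrated on topic 0, pruning keeps only the topic-0 candidates, and the candidates furthest from the seed centroid become pseudo-negatives. The dot product is just one convenient choice; cosine similarity or a KL divergence would fit the same slot.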


reference text

Kedar Bellare, Partha P. Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum, and Mark Dredze. 2006. Lightly-supervised attribute extraction. In Proceedings of the Advances in Neural Information Processing Systems Workshop on Machine Learning for Web Search.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022.
David Cohn and Huan Chang. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of the 17th International Conference on Machine Learning, pages 167–174.
Takeshi Fuchi and Shinichiro Takagi. 1998. Japanese Morphological Analyzer using Word Co-occurrence: JTAG. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 409–413.
Zoubin Ghahramani and Katherine A. Heller. 2005. Bayesian sets. In Proceedings of the Advances in Neural Information Processing Systems.
Thorsten Joachims. 1999. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods: Support Vector Learning. Software available at http://svmlight.joachims.org/.
Mamoru Komachi, Taku Kudo, Masashi Shimbo, and Yuji Matsumoto. 2008. Graph-based analysis of semantic drift in Espresso-like bootstrapping algorithms. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 1011–1020.
Xiao-Li Li, Bing Liu, and See-Kiong Ng. 2010. Negative Training Data can be Harmful to Text Classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 218–228.
Bing Liu, Wee S. Lee, Philip S. Yu, and Xiaoli Li. 2002. Partially supervised classification of text documents. In Proceedings of the 19th International Conference on Machine Learning, pages 387–394.
Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang, and Maosong Sun. 2011. PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Transactions on Intelligent Systems and Technology, special issue on Large Scale Machine Learning. Software available at http://code.google.com/p/plda.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 1003–1011.
Marius Paşca and Benjamin Van Durme. 2008. Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pages 19–27.
Patrick Pantel and Marco Pennacchiotti. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 113–120.
Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, and Vishnu Vyas. 2009. Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 938–947.
Alan Ritter and Oren Etzioni. 2010. A Latent Dirichlet Allocation method for Selectional Preferences. In Proceedings of the 48th ACL Conference, pages 424–434.
Luis Sarmento, Valentin Jijkoun, Maarten de Rijke, and Eugenio Oliveira. 2007. More like these: growing entity classes from seeds. In Proceedings of the 16th ACM Conference on Information and Knowledge Management, pages 959–962.
Jun Suzuki, Erik McDermott, and Hideki Isozaki. 2006. Training Conditional Random Fields with Multivariate Evaluation Measures. In Proceedings of the 21st COLING and 44th ACL Conference, pages 217–224.
Michael Thelen and Ellen Riloff. 2002. A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pages 214–221.