
27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification


Source: pdf

Author: Longhua Qian ; Guodong Zhou

Abstract: Seed sampling is critical in semi-supervised learning. This paper proposes a clustering-based stratified seed sampling approach to semi-supervised learning. First, various clustering algorithms are explored to partition the unlabeled instances into strata, each represented by a center. Then, diversity-motivated intra-stratum sampling chooses the center and additional instances from each stratum to form the unlabeled seed set, which an oracle annotates. Finally, the labeled seed set is fed into a bootstrapping procedure as the initial labeled data. We systematically evaluate this stratified bootstrapping approach on the semantic relation classification subtask of the ACE RDC (Relation Detection and Classification) task, comparing in particular how the choice of clustering algorithm affects bootstrapping performance. Experimental results on the ACE RDC 2004 corpus show that clustering-based stratified bootstrapping achieves a best F1-score of 75.9 on semantic relation classification, approaching the score obtained with golden clustering.
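The sampling pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the "various clustering algorithms", farthest-first selection stands in for the diversity-motivated intra-stratum sampling, and the per-stratum quota proportional to stratum size is an assumption. All function names (`kmeans`, `sample_stratum`, `stratified_seeds`) are hypothetical, and the bootstrapping step that consumes the labeled seeds is not shown.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (a stand-in clusterer); returns strata as lists of indices."""
    rng = random.Random(seed)
    centers = [points[i] for i in rng.sample(range(len(points)), k)]
    strata = []
    for _ in range(iters):
        strata = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            strata[nearest].append(i)
        # Recompute each center; keep the old one if a stratum is empty.
        centers = [mean([points[i] for i in s]) if s else centers[c]
                   for c, s in enumerate(strata)]
    return [s for s in strata if s]


def sample_stratum(points, stratum, n):
    """Pick the instance nearest the stratum mean, then farthest-first extras."""
    center = mean([points[i] for i in stratum])
    chosen = [min(stratum, key=lambda i: dist(points[i], center))]
    while len(chosen) < min(n, len(stratum)):
        rest = [i for i in stratum if i not in chosen]
        # Diversity heuristic (assumed): take the instance farthest from
        # everything already chosen in this stratum.
        chosen.append(max(
            rest,
            key=lambda i: min(dist(points[i], points[j]) for j in chosen)))
    return chosen


def stratified_seeds(points, k, budget, seed=0):
    """Return indices of unlabeled instances to hand to the oracle."""
    seeds = []
    for stratum in kmeans(points, k, seed=seed):
        # Assumption: quota proportional to stratum size, at least one each,
        # so the total can differ slightly from `budget` after rounding.
        quota = max(1, round(budget * len(stratum) / len(points)))
        seeds.extend(sample_stratum(points, stratum, quota))
    return seeds


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    print(stratified_seeds(data, k=3, budget=9))
```

The returned indices would be annotated and used as the initial labeled set for bootstrapping. In a real relation classification setting the feature vectors and distance function would come from the relation instance representation, which this sketch leaves open.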


reference text

S. Abney. 2002. Bootstrapping. ACL-2002.
E. Agichtein and L. Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings of the 5th ACM International Conference on Digital Libraries (ACMDL 2000).
S. Brin. 1998. Extracting patterns and relations from the world wide web. In WebDB Workshop at 6th International Conference on Extending Database Technology (EDBT 98).
E. Charniak. 2001. Intermediate-head Parsing for Language Models. ACL-2001: 116-123.
M. Collins and N. Duffy. 2001. Convolution Kernels for Natural Language. NIPS 2001: 625-632.
J.X. Chen, D.H. Ji, C.L. Tan, and Z.Y. Niu. 2005. Unsupervised Feature Selection for Relation Extraction. CIKM-2005: 411-418.
J.X. Chen, D.H. Ji, and C.L. Tan. 2006. Relation Extraction using Label Propagation Based Semi-supervised Learning. ACL/COLING-2006: 129-136.
A. Culotta and J. Sorensen. 2004. Dependency tree kernels for relation extraction. ACL-2004: 423-439.
B.J. Frey and D. Dueck. 2007. Clustering by Passing Messages between Data Points. Science, 315: 972-976.
T. Hasegawa, S. Sekine, and R. Grishman. 2004. Discovering Relations among Named Entities from Large Corpora. ACL-2004.
N. Kambhatla. 2004. Combining lexical, syntactic and semantic features with Maximum Entropy models for extracting relations. ACL-2004(posters): 178-181.
S. Miller, H. Fox, L. Ramshaw, and R. Weischedel. 2000. A novel use of statistical parsing to extract information from text. In Proceedings of the 6th Applied Natural Language Processing Conference.
J. Neyman. 1934. On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society, 97(4): 558-625.
H.T. Nguyen and A. Smeulders. 2004. Active Learning Using Pre-clustering. ICML-2004.
L.H. Qian, G.D. Zhou, Q.M. Zhu, and P.D. Qian. 2008. Exploiting constituent dependencies for tree kernel-based semantic relation extraction. COLING-2008: 697-704.
L.H. Qian, G.D. Zhou, F. Kong, and Q.M. Zhu. 2009. Semi-Supervised Learning for Semantic Relation Classification using Stratified Sampling Strategy. EMNLP-2009: 1437-1445.
D. Shen, J. Zhang, J. Su, G. Zhou and C. Tan. 2004. Multi-criteria-based active learning for named entity recognition. ACL-2004.
M. Tang, X. Luo and S. Roukos. 2002. Active Learning for Statistical Natural Language Parsing. ACL-2002.
U. von Luxburg. 2006. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics.
D. Zelenko, C. Aone, and A. Richardella. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, (2): 1083-1106.
M. Zhang, J. Su, D.M. Wang, G.D. Zhou, and C.L. Tan. 2005. Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering. IJCNLP-2005: 378-389.
M. Zhang, J. Zhang, J. Su, and G.D. Zhou. 2006. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. ACL/COLING-2006: 825-832.
Z. Zhang. 2004. Weakly-supervised relation classification for Information Extraction. CIKM-2004.
S.B. Zhao and R. Grishman. 2005. Extracting relations with integrated information using kernel methods. ACL-2005: 419-426.
G.D. Zhou, J. Su, J. Zhang, and M. Zhang. 2005. Exploring various knowledge in relation extraction. ACL-2005: 427-434.
G.D. Zhou, L.H. Qian, and J.X. Fan. 2010. Tree kernel-based semantic relation extraction with rich syntactic and semantic information. Information Sciences, (179): 1785-1791.
G.D. Zhou, L.H. Qian, and Q.M. Zhu. 2009. Label propagation via bootstrapped support vectors for semantic relation extraction between named entities. Computer Speech and Language, 23(4): 464-478.
G.D. Zhou and M. Zhang. 2007. Extraction relation information from text documents by exploring various types of knowledge. Information Processing and Management, (42): 969-982.
G.D. Zhou, M. Zhang, D.H. Ji, and Q.M. Zhu. 2007. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP/CoNLL-2007: 728-736.