acl acl2013 acl2013-365 acl2013-365-reference knowledge-graph by maker-knowledge-mining
Author: Vidhya Govindaraju ; Ce Zhang ; Christopher Ré
Abstract: Tabular information in text documents contains a wealth of information, and so tables are a natural candidate for information extraction. There are many cues buried in both a table and its surrounding text that allow us to understand the meaning of the data in a table. We study how natural-language tools, such as part-of-speech tagging, dependency paths, and named-entity recognition, can be used to improve the quality of relation extraction from tables. In three domains we show that (1) a model that performs joint probabilistic inference across tabular and natural language features achieves an F1 score that is twice as high as either a pure-table or pure-text system, and (2) using only shallower features or non-joint inference results in lower quality.
References:
Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008. WebTables: Exploring the power of tables on the web. Proceedings of VLDB Endowment, 1(1).
Hsin-Hsi Chen, Shih-Chung Tsai, and Jin-He Tsai. 2000. Mining tables from large scale HTML texts. In Proceedings of the 18th Conference on Computational Linguistics, COLING ’00.
Nancy Chinchor. 1992. The statistical significance of the MUC-4 results. In Proceedings of the 4th Conference on Message Understanding, MUC4 ’92.
Bhavana Bharat Dalvi, William Cohen, and Jamie Callan. 2012. WebSets: Extracting sets of entities from the web using unsupervised information extraction. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining, WSDM ’12.
Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the 5th International Conference on Language Resources and Evaluation.
Robert Duin. 2002. The combining classifier: to train or not to train? In 16th International Conference on Pattern Recognition.
Jason Eisner. 2009. Joint models with missing data for semi-supervised learning. In NAACL HLT Workshop on Semi-supervised Learning for Natural Language Processing.
Jing Fang, Prasenjit Mitra, Zhi Tang, and C. Lee Giles. 2012. Table header detection and classification. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI ’12.
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05.
Matthew Hurst and Tetsuya Nasukawa. 2000. Layout and language: Integrating spatial and linguistic knowledge for layout understanding tasks. In Proceedings of the 18th Conference on Computational Linguistics, COLING ’00.
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han, and Bing Liu. 2010. Entity relation discovery from web tables and links. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10.
Ying Liu, Kun Bai, Prasenjit Mitra, and C. Lee Giles. 2007. TableSeer: Automatic table metadata extraction and searching in digital libraries. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’07.
Andrew McCallum. 2009. Joint inference for natural language processing. In Proceedings of the 13th Conference on Computational Natural Language Learning, CoNLL ’09.
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL ’09.
David Pinto, Andrew McCallum, Xing Wei, and W. Bruce Croft. 2003. Table extraction using conditional random fields. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’03.
Aleksander Pivk. 2006. Automatic ontology generation from web tabular structures. AI Communication, 19(1).
Hoifung Poon and Pedro Domingos. 2007. Joint inference in information extraction. In Proceedings of the 22nd National Conference on Artificial Intelligence, AAAI ’07.
Sameer Singh, Karl Schultz, and Andrew McCallum. 2009. Bi-directional joint inference for entity resolution and segmentation using imperatively-defined factor graphs. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD ’09.
Ashwin Tengli, Yiming Yang, and Nian Li Ma. 2004. Learning table extraction from examples. In Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04.
Kristina Toutanova and Christopher D. Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing, EMNLP ’00.
Xing Wei, Bruce Croft, and Andrew McCallum. 2006. Table extraction for answer retrieval. Information Retrieval, 9(5).
Dekai Wu and Ken Wing Kuen Lee. 2006. A grammatical approach to understanding textual tables using two-dimensional SCFGs. In Proceedings of the COLING/ACL, COLING-ACL ’06.
Fei Wu and Daniel S. Weld. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10.
Burcu Yildiz. 2004. Information extraction – utilizing table patterns. Master’s thesis, Institut für Softwaretechnik und Interaktive Systeme.
Ce Zhang and Christopher Ré. 2013. Towards high-throughput Gibbs sampling at scale: A study across storage managers. SIGMOD ’13.
Ce Zhang, Vidhya Govindaraju, Jackson Borchardt, Tim Foltz, Christopher Ré, and Shanan Peters. 2013. GeoDeepDive: Statistical inference using familiar data-processing languages. SIGMOD ’13.