
Semi-Markov Conditional Random Fields for Information Extraction (NIPS 2004, paper 162)


Author: Sunita Sarawagi, William W. Cohen

Abstract: We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a "segmentation" of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x_i of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time, often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.
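To make the idea of labeling segments rather than individual elements concrete, the following is a minimal sketch of Viterbi-style decoding over segmentations. It is not the authors' implementation. The function name `semi_markov_viterbi`, the cap `max_len` on segment length, the half-open `(start, end, label)` segment convention, and the toy scoring function with its small name dictionary are all assumptions made for illustration. The segment score here sees the whole span `x[start:end]`, which is what lets a feature measure a property of a segment.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end) is half-open; label is the segment label
START = "<start>"               # hypothetical label preceding the first segment


def semi_markov_viterbi(
    x: Sequence[str],
    labels: Sequence[str],
    max_len: int,
    score: Callable[[Sequence[str], int, int, str, str], float],
) -> List[Segment]:
    """Return the highest-scoring segmentation of x (illustrative sketch).

    score(x, start, end, prev_label, label) scores the segment x[start:end]
    given the previous segment's label. max_len caps segment length (assumed).
    """
    n = len(x)
    if n == 0:
        return []
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of x[:i] whose last segment has label y
    best: List[Dict[str, float]] = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[i][y]: (start of last segment, label of the segment before it)
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]
    for i in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, i) + 1):
                s = i - d  # candidate last segment is x[s:i]
                if s == 0:
                    prevs = [(None, START, 0.0)]
                else:
                    prevs = [(yp, yp, best[s][yp]) for yp in labels]
                for prev_key, prev_label, prev_score in prevs:
                    cand = prev_score + score(x, s, i, prev_label, y)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (s, prev_key)
    # Trace back from the best final label.
    y: Optional[str] = max(labels, key=lambda lab: best[n][lab])
    i = n
    segments: List[Segment] = []
    while i > 0:
        s, prev = back[i][y]
        segments.append((s, i, y))
        i, y = s, prev
    segments.reverse()
    return segments


# Toy segment-level scorer (assumed for illustration only).
NAMES = {"william cohen", "sunita sarawagi"}


def toy_score(x, start, end, prev_label, label):
    length = end - start
    if label == "PER":
        text = " ".join(x[start:end]).lower()
        return 2.0 * length if text in NAMES else -1.0 * length
    return 0.0 if length == 1 else -0.5  # "O" segments prefer single tokens


tokens = ["talk", "by", "William", "Cohen", "today"]
result = semi_markov_viterbi(tokens, ["O", "PER"], max_len=3, score=toy_score)
print(result)
```

Under these assumptions the loop over segment lengths adds a factor of `max_len` to the per-position cost of ordinary Viterbi decoding, which is one way to read the abstract's remark about a small constant factor.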

reference text

[1] V. R. Borkar, K. Deshmukh, and S. Sarawagi. Automatic text segmentation for extracting structured records. In Proc. ACM SIGMOD International Conf. on Management of Data, Santa Barbara, USA, 2001.

[2] A. Borthwick, J. Sterling, E. Agichtein, and R. Grishman. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In Sixth Workshop on Very Large Corpora, New Brunswick, New Jersey. Association for Computational Linguistics, 1998.

[3] R. Bunescu and R. J. Mooney. Relational Markov networks for collective information extraction. In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning (SRL-2004), Banff, Canada, July 2004.

[4] M. E. Califf and R. J. Mooney. Bottom-up relational learning of pattern matching rules for information extraction. Journal of Machine Learning Research, 4:177–210, 2003.

[5] W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03), 2003.

[6] W. W. Cohen and S. Sarawagi. Exploiting dictionaries in named entity extraction: Combining semi-Markov extraction processes and data integration methods. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.

[7] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Empirical Methods in Natural Language Processing (EMNLP), 2002.

[8] X. Ge. Segmental Semi-Markov Models and Applications to Sequence Analysis. PhD thesis, University of California, Irvine, December 2002.

[9] J. Janssen and N. Limnios. Semi-Markov Models and Applications. Kluwer Academic, 1999.

[10] R. E. Kraut, S. R. Fussell, F. J. Lerch, and J. A. Espinosa. Coordination in teams: evidence from a simulated management game. To appear in the Journal of Organizational Behavior, 2005.

[11] A. Krogh. Gene finding: putting the parts together. In M. J. Bishop, editor, Guide to Human Genome Computing, pages 261–274. Academic Press, 2nd edition, 1998.

[12] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML-2001), Williamstown, MA, 2001.

[13] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large-scale optimization. Mathematical Programming, 45:503–528, 1989.

[14] R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of The Sixth Conference on Natural Language Learning (CoNLL-2002), pages 49–55, 2002.

[15] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of The Seventh Conference on Natural Language Learning (CoNLL-2003), Edmonton, Canada, 2003.

[16] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL, 2003.

[17] M. Skounakis, M. Craven, and S. Ray. Hierarchical hidden Markov models for information extraction. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico. Morgan Kaufmann, 2003.

[18] C. Sutton, K. Rohanimanesh, and A. McCallum. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. In ICML, 2004.

[19] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.

[20] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02), Edmonton, Canada, 2002.

[21] D. H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.