NIPS 2004 (nips2004-162): reference list
Source: pdf
Authors: Sunita Sarawagi, William W. Cohen
Abstract: We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a “segmentation” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x_i of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time, often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.
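To make the segment-level view concrete, here is a minimal Python sketch of semi-Markov Viterbi decoding, one standard way to do exact MAP inference over segmentations. It is not the authors' implementation. The scoring interface `score(x, start, end, prev_label, label)`, the maximum segment length `max_len`, the `<START>` label, and the toy feature in `toy_score` are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# A segment is (start, end, label): it covers x[start:end], end exclusive.
Segment = Tuple[int, int, str]
# Assumed scoring interface: score(x, start, end, prev_label, label) -> float.
# In a linear semi-CRF this plays the role of a weighted segment feature vector.
ScoreFn = Callable[[Sequence[str], int, int, str, str], float]


def semi_markov_viterbi(x: Sequence[str], labels: List[str], score: ScoreFn,
                        max_len: int, start_label: str = "<START>") -> List[Segment]:
    """Return the highest-scoring segmentation of x, with segments no longer than max_len."""
    n = len(x)
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of x[:i] whose last segment has label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[i][y]: (start of that last segment, label of the segment before it).
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, i) + 1):  # d = length of the last segment
                j = i - d
                # The first segment has only the start label as its predecessor.
                prevs = [start_label] if j == 0 else labels
                for y_prev in prevs:
                    prefix = 0.0 if j == 0 else best[j][y_prev]
                    if prefix == neg_inf:
                        continue
                    cand = prefix + score(x, j, i, y_prev, y)
                    if cand > best[i][y]:
                        best[i][y] = cand
                        back[i][y] = (j, y_prev)
    if n == 0:
        return []
    # Trace back from the best label at the end of the sequence.
    y = max(labels, key=lambda lab: best[n][lab])
    i, segments = n, []
    while i > 0:
        j, y_prev = back[i][y]
        segments.append((j, i, y))
        i, y = j, y_prev
    return segments[::-1]


def toy_score(x, start, end, prev_label, label):
    # Hypothetical segment-level feature: a run of capitalized tokens scores
    # higher as ONE "PER" segment than split into several "PER" segments.
    seg = x[start:end]
    if label == "PER":
        return 2.0 * len(seg) - 1.0 if all(t[:1].isupper() for t in seg) else -5.0
    return 0.5 if len(seg) == 1 else -1.0  # "O" segments prefer length 1


tokens = ["Sunita", "Sarawagi", "wrote", "it"]
print(semi_markov_viterbi(tokens, ["O", "PER"], toy_score, max_len=3))
# [(0, 2, 'PER'), (2, 3, 'O'), (3, 4, 'O')]
```

Each position tries at most `max_len` segment lengths, so this sketch makes O(n · max_len · |Y|^2) score evaluations, compared with O(n · |Y|^2) for token-level Viterbi. That is one way to see why the slowdown relative to a conventional CRF can be a small constant factor when segments are short.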
[1] V. R. Borkar, K. Deshmukh, and S. Sarawagi. Automatic text segmentation for extracting structured records. In Proc. ACM SIGMOD International Conf. on Management of Data, Santa Barbara, USA, 2001.
[2] A. Borthwick, J. Sterling, E. Agichtein, and R. Grishman. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In Sixth Workshop on Very Large Corpora, New Brunswick, New Jersey. Association for Computational Linguistics, 1998.
[3] R. Bunescu and R. J. Mooney. Relational Markov networks for collective information extraction. In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning (SRL2004), Banff, Canada, July 2004.
[4] M. E. Califf and R. J. Mooney. Bottom-up relational learning of pattern matching rules for information extraction. Journal of Machine Learning Research, 4:177–210, 2003.
[5] W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03), 2003.
[6] W. W. Cohen and S. Sarawagi. Exploiting dictionaries in named entity extraction: Combining semi-Markov extraction processes and data integration methods. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
[7] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Empirical Methods in Natural Language Processing (EMNLP), 2002.
[8] X. Ge. Segmental Semi-Markov Models and Applications to Sequence Analysis. PhD thesis, University of California, Irvine, December 2002.
[9] J. Janssen and N. Limnios. Semi-Markov Models and Applications. Kluwer Academic, 1999.
[10] R. E. Kraut, S. R. Fussell, F. J. Lerch, and J. A. Espinosa. Coordination in teams: evidence from a simulated management game. To appear in the Journal of Organizational Behavior, 2005.
[11] A. Krogh. Gene finding: putting the parts together. In M. J. Bishop, editor, Guide to Human Genome Computing, pages 261–274. Academic Press, 2nd edition, 1998.
[12] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML-2001), Williamstown, MA, 2001.
[13] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large-scale optimization. Mathematical Programming, 45:503–528, 1989.
[14] R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of The Sixth Conference on Natural Language Learning (CoNLL-2002), pages 49–55, 2002.
[15] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of The Seventh Conference on Natural Language Learning (CoNLL-2003), Edmonton, Canada, 2003.
[16] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL, 2003.
[17] M. Skounakis, M. Craven, and S. Ray. Hierarchical hidden Markov models for information extraction. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico. Morgan Kaufmann, 2003.
[18] C. Sutton, K. Rohanimanesh, and A. McCallum. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. In ICML, 2004.
[19] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
[20] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02), Edmonton, Canada, 2002.
[21] D. H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.