acl acl2011 acl2011-261 acl2011-261-reference knowledge-graph by maker-knowledge-mining

261 acl-2011-Recognizing Named Entities in Tweets

Source: pdf

Author: Xiaohua LIU ; Shaodian ZHANG ; Furu WEI ; Ming ZHOU

Abstract: The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN based classifier conducts pre-labeling to collect global coarse evidence across tweets while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semisupervised learning.

reference text

Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Classbased n-gram models of natural language. Comput. Linguist., 18:467–479. Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Frederick Reiss, and Shivakumar Vaithyanathan. 2010. Domain adaptation of rule-based annotators for named-entity recognition tasks. In EMNLP, pages 1002–1012. Michael Collins. 2002. Discriminative training methods for hidden markov models: theory and experiments with perceptron algorithms. In EMNLP, pages 1–8. Doug Downey, Matthew Broadhead, and Oren Etzioni. 2007. Locating Complex Named Entities in Web Text. In IJCAI. Oren Etzioni, Michael Cafarella, Doug Downey, AnaMaria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005. Unsupervised named-entity extraction from the web: an experimental study. Artif. Intell., 165(1):91–134. Tim Finin, Will Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating named entities in twitter data with crowdsourcing. In CSLDAMT, pages 80–88. Jenny Rose Finkel and Christopher D. Manning. 2009. Nested named entity recognition. In EMNLP, pages 141–150. Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by gibbs sampling. In ACL, pages 363–370. Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Xian Wu, and Zhong Su. 2009. Domain adaptation with latent semantic association for named entity recognition. In NAACL, pages 281–289. Martin Jansche and Steven P. Abney. 2002. Information extraction from voicemail transcripts. In EMNLP, pages 320–327. Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in nlp. In ACL, pages 264– 271. Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In ACL, pages 423–430. Vijay Krishnan and Christopher D. Manning. 2006. An effective two-stage model for exploiting non-local dependencies in named entity recognition. InACL, pages 1121–1 128. George R. Krupka and Kevin Hausman. 1998. Isoquest: Description of the netowlTM extractor system as used in muc-7. In MUC-7. John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, pages 282–289. Andrew Mccallum and Wei Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In HLT-NAACL, pages 188–191. 367 Scott Miller, Jethran Guinness, and Alex Zamanian. 2004. Name tagging with word clusters and discriminative training. In HLT-NAACL, pages 337–342. Einat Minkov, Richard C. Wang, and William W. Cohen. 2005. Extracting personal names from email: applying named entity recognition to informal text. In HLT, pages 443–450. David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Linguisticae Investigationes, 30:3–26. Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In CoNLL, pages 147–155. Sameer Singh, Dustin Hillard, and Chris Leggetter. 2010. Minimally-supervised extraction of entities from text advertisements. In HLT-NAACL, pages 73–81. Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In ACL, pages 665–673. Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: languageindependent named entity recognition. In HLTNAACL, pages 142–147. Yefeng Wang. 2009. Annotating and recognising named entities in clinical notes. In ACL-IJCNLP, pages 18– 26. Dan Wu, Wee Sun Lee, Nan Ye, and Hai Leong Chieu. 2009. Domain adaptive bootstrapping for named entity recognition. In EMNLP, pages 1523–1532. Kazuhiro Yoshida and Jun’ichi Tsujii. 2007. Reranking for biomedical named-entity recognition. In BioNLP, pages 209–216. Tong Zhang and David Johnson. 2003. A robust risk minimization based named entity recognition system. In HLT-NAACL, pages 204–207. GuoDong Zhou and Jian Su. 2002. Named entity recognition using an hmm-based chunk tagger. In ACL, pages 473–480.