Venue: EMNLP 2012 (paper emnlp2012-24)
Source: pdf
Authors: Fei Huang; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
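The abstract relies on the Posterior Regularization framework (Ganchev et al., 2010), which injects prior knowledge by projecting a model's posterior over latent variables onto a set of distributions that satisfy expectation constraints. The sketch below is a hedged illustration of that projection step only, not of the authors' bias architecture: it solves min KL(q || p) subject to E_q[phi] <= b for a single scalar constraint over a small, fully enumerated latent space. It uses the standard result that the solution has the form q(z) proportional to p(z) * exp(-lam * phi(z)) with lam >= 0. The function name `pr_project`, the bisection search over `lam`, and the toy numbers are illustrative assumptions.

```python
import math


def pr_project(p, phi, b, tol=1e-10, max_iter=200):
    """KL-project p onto {q : E_q[phi] <= b}, i.e. argmin_q KL(q || p).

    Illustrative sketch (hypothetical helper, not from the paper).
    p   : probabilities over a finite latent space (all > 0, summing to 1)
    phi : constraint feature value phi(z) for each latent value z
    b   : upper bound on the expected feature value under q
    Returns (q, lam) with q[z] proportional to p[z] * exp(-lam * phi[z]).
    """

    def tilt(lam):
        # Exponentially tilt p by -lam * phi; subtract the max log-weight
        # before exponentiating for numerical stability.
        logw = [math.log(pz) - lam * fz for pz, fz in zip(p, phi)]
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        s = sum(w)
        return [v / s for v in w]

    def expect(q):
        # Expected feature value E_q[phi].
        return sum(qz * fz for qz, fz in zip(q, phi))

    # Constraint already holds: the projection is p itself (lam = 0).
    if expect(p) <= b:
        return list(p), 0.0
    # As lam grows, E_q[phi] decreases toward min(phi); a finite lam
    # exists only if b is strictly above that limit.
    if b <= min(phi):
        raise ValueError("no finite lam satisfies E_q[phi] <= b")

    # Grow an upper bracket until the constraint is satisfied.
    lo, hi = 0.0, 1.0
    while expect(tilt(hi)) > b:
        lo, hi = hi, 2.0 * hi
    # E_q[phi] is non-increasing in lam, so bisect for E_q[phi] = b.
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if expect(tilt(mid)) > b:
            lo = mid
        else:
            hi = mid
    # Return the bracket end that satisfies the constraint.
    return tilt(hi), hi


if __name__ == "__main__":
    # Toy posterior over three latent states; phi marks state 0.
    q, lam = pr_project([0.5, 0.3, 0.2], [1.0, 0.0, 0.0], b=0.3)
    print([round(v, 4) for v in q], round(lam, 4))  # prints [0.3, 0.42, 0.28] 0.8473
```

In the toy run, the constraint caps the probability of state 0 at 0.3; the KL projection removes exactly the excess mass and rescales the other states proportionally (0.42 and 0.28), with lam = ln(7/3). In a full Posterior Regularization learner this projection replaces the ordinary posterior inside the E-step of EM, and it operates over structured posteriors rather than an enumerated list.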
References:
Arun Ahuja and Doug Downey. 2010. Improved extraction assessment through better language models. In Proceedings of the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (NAACL-HLT).
Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2007. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems 20, Cambridge, MA. MIT Press.
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79:151–175.
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.
John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jenn Wortman. 2007. Learning bounds for domain adaptation. In Advances in Neural Information Processing Systems.
M. Candito and B. Crabbé. 2009. Improving generative statistical parsing with semi-supervised word clustering. In IWPT, pages 138–141.
M. Chang, L. Ratinov, and D. Roth. 2007. Guiding semi-supervision with constraint-driven learning. In Proceedings of the ACL.
Hal Daumé III, Abhishek Kumar, and Avishek Saha. 2010. Frustratingly easy semi-supervised domain adaptation. In Proceedings of the ACL Workshop on Domain Adaptation (DANLP).
Hal Daumé III. 2007. Frustratingly easy domain adaptation. In ACL.
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407.
Paramveer S. Dhillon, Dean Foster, and Lyle Ungar. 2011. Multi-view learning of word embeddings via CCA. In Neural Information Processing Systems (NIPS).
A. Emami, P. Xu, and F. Jelinek. 2003. Using a connectionist model in a syntactical based language model. In Proceedings of the International Conference on Spoken Language Processing, pages 372–375.
Kuzman Ganchev, João Graça, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:10–49.
Zoubin Ghahramani and Michael I. Jordan. 1997. Factorial hidden Markov models. Machine Learning, 29(2–3):245–273.
Daniel Gildea. 2001. Corpus variation and parser performance. In Conference on Empirical Methods in Natural Language Processing.
T. Honkela. 1997. Self-organizing maps of words for natural language processing applications. In Proceedings of the International ICSC Symposium on Soft Computing.
Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence labeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Fei Huang and Alexander Yates. 2010. Exploring representation-learning approaches to domain adaptation. In Proceedings of the ACL 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP).
Fei Huang, Alexander Yates, Arun Ahuja, and Doug Downey. 2011. Language models as representations for weakly supervised NLP tasks. In Conference on Natural Language Learning (CoNLL).
Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In ACL.
P. Liang, M. I. Jordan, and D. Klein. 2009. Learning from measurements in exponential families. In International Conference on Machine Learning (ICML).
D. Lin and X. Wu. 2009. Phrase clustering for discriminative learning. In ACL-IJCNLP, pages 1030–1038.
G. S. Mann and A. McCallum. 2007. Simple, robust, scalable semi-supervised learning via expectation regularization. In Proc. ICML.
Y. Mansour, M. Mohri, and A. Rostamizadeh. 2009. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems.
F. Morin and Y. Bengio. 2005. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, pages 246–252.
Sameer Pradhan, Wayne Ward, and James H. Martin. 2007. Towards robust semantic role labeling. In Proceedings of NAACL-HLT, pages 556–563.
M. Sahlgren. 2005. An introduction to random indexing. In Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE).
G. Salton and M. J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
Satoshi Sekine. 1997. The domain dependence of parsing. In Proc. Applied Natural Language Processing (ANLP), pages 96–102.
Huihsin Tseng, Daniel Jurafsky, and Christopher Manning. 2005. Morphological features help POS tagging of unknown words across language varieties. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing.
Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 384–394.
P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
Lijie Wang, Wanxiang Che, and Ting Liu. 2009. An SVMTool-based Chinese POS tagger. In Journal of Chinese Information Processing.
X. Yang, H. Fu, H. Zha, and J. Barlow. 2006. Semi-supervised nonlinear dimensionality reduction. In Proceedings of the 23rd International Conference on Machine Learning.
T. Zhang and D. Johnson. 2003. A robust risk minimization based named entity recognition system. In CoNLL.
D. Zhang, Z.-H. Zhou, and S. Chen. 2007. Semi-supervised dimensionality reduction. In Proceedings of the 7th SIAM International Conference on Data Mining.