Exploiting Domain Knowledge in Aspect Extraction (EMNLP 2013)

Authors: Zhiyuan Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, without any domain knowledge, such models often produce aspects that are not interpretable in applications. To tackle this issue, several knowledge-based topic models have been proposed that let the user input prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings; for example, little work has been done to incorporate cannot-link knowledge or to automatically adjust the number of topics based on domain knowledge. To address these problems, this paper proposes a more advanced topic model, called MC-LDA (LDA with m-sets and c-sets, i.e., must-sets and cannot-sets), which is based on an Extended Generalized Pólya Urn (E-GPU) model that is also proposed in this paper. Experiments on real-life product reviews from a variety of domains show that MC-LDA markedly outperforms existing state-of-the-art models.
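The abstract builds MC-LDA on an extension of the generalized Pólya urn (GPU) but does not spell out how a GPU works. The Python sketch below illustrates only the plain GPU idea, not the paper's E-GPU, whose extra operations for c-sets and for adjusting the number of topics are not described in this abstract. It assumes a topic is treated as an urn of word "balls": drawing a word returns it with one extra copy (the simple Pólya urn), and, when a promotion table is supplied, also adds fractional copies of related words, such as members of the same m-set (the generalized urn). The function name `polya_draw`, the `promotion` table, and the 0.3 amount are illustrative assumptions, not values from the paper.

```python
import random
from collections import Counter


def polya_draw(urn, rng, promotion=None):
    """Draw one word from a topic 'urn' and reinforce the urn in place.

    urn:       Counter mapping word -> (possibly fractional) ball count.
    rng:       random.Random instance, so runs are reproducible.
    promotion: optional dict word -> {related_word: extra_balls}. With None this
               is the simple Polya urn; with a dict it is a generalized Polya urn.
    """
    words = list(urn)
    # Sample a word with probability proportional to its current ball count.
    word = rng.choices(words, weights=[urn[w] for w in words], k=1)[0]
    # Simple Polya urn: put the drawn ball back plus one more of the same color.
    urn[word] += 1
    # Generalized Polya urn: also add balls of related colors (here, words that a
    # hypothetical m-set says belong to the same aspect).
    if promotion:
        for related, extra in promotion.get(word, {}).items():
            urn[related] += extra
    return word


# Hypothetical m-set {"battery", "power"}: drawing either word promotes the other.
promotion = {"battery": {"power": 0.3}, "power": {"battery": 0.3}}
rng = random.Random(0)
urn = Counter({"battery": 1, "power": 1, "screen": 1})
draws = [polya_draw(urn, rng, promotion) for _ in range(100)]
```

Because `random.Random(0)` seeds the draws, the run is reproducible. In GPU-based topic models this kind of promotion is typically applied to topic-word counts during Gibbs sampling, which the sketch leaves out.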
