153 NIPS 2009: Modeling Social Annotation Data with Content Relevance using a Topic Model


Source: pdf

Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda

Abstract: We propose a probabilistic topic model for analyzing and extracting content-related annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. Because users of these services can attach annotations freely, some annotations do not describe the semantics of the content and are therefore noisy, i.e., not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.
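The generative assumption stated in the abstract can be illustrated with a minimal sketch. The abstract only says that each annotation comes either from a topic that generated the content or from a general, content-unrelated distribution. Everything else here is an assumption made for illustration: the LDA-style Dirichlet prior on topic proportions, the Bernoulli relevance switch with probability `relevance_prob`, and all function and parameter names (`sample_dirichlet`, `generate_document`, `topic_word`, `topic_tag`, `general_tag`). It is not the authors' implementation and performs no inference.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, n_words, n_tags, topic_word, topic_tag,
                      general_tag, alpha=0.5, relevance_prob=0.7):
    """Generate the words and annotations of one document.

    Returns (words, tags, relevant). relevant[j] is True when tag j was
    drawn from a content topic and False when it was drawn from the
    general, content-unrelated distribution.
    All hyperparameter values are illustrative assumptions.
    """
    n_topics = len(topic_word)
    # Assumed LDA-style prior over the document's topic proportions.
    theta = sample_dirichlet(rng, alpha, n_topics)

    # Content: each word picks a topic, then a word from that topic.
    word_topics = rng.choices(range(n_topics), weights=theta, k=n_words)
    words = [
        rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        for z in word_topics
    ]

    tags, relevant = [], []
    for _ in range(n_tags):
        # Assumed Bernoulli switch: content-related or not.
        is_relevant = rng.random() < relevance_prob
        if is_relevant:
            # Reuse a topic that actually generated one of the content words.
            topic = rng.choice(word_topics)
            dist = topic_tag[topic]
        else:
            dist = general_tag
        tags.append(rng.choices(range(len(dist)), weights=dist)[0])
        relevant.append(is_relevant)
    return words, tags, relevant
```

In this sketch, extracting content-related annotations would amount to inferring the hidden `relevant` flags from observed words and tags; the sampler only shows the forward direction of the model.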


reference text

[1] K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. Journal of Machine Learning Research, 3:1107–1135, 2003.

[2] D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 127–134, 2003.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[4] C. Chemudugunta, P. Smyth, and M. Steyvers. Modeling general and specific aspects of documents with a probabilistic topic model. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 241–248. MIT Press, 2007.

[5] CiteULike. http://www.citeulike.org.

[6] G. Csurka, C. Dance, J. Willamowski, L. Fan, and C. Bray. Visual categorization with bags of keypoints. In ECCV International Workshop on Statistical Learning in Computer Vision, 2004.

[7] Delicious. http://delicious.com.

[8] S. Feng, R. Manmatha, and V. Lavrenko. Multiple Bernoulli relevance models for image and video annotation. In CVPR ’04: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 1002–1009, 2004.

[9] Flickr. http://flickr.com.

[10] S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, 2006.

[11] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101 Suppl 1:5228–5235, 2004.

[12] Hatena::Bookmark. http://b.hatena.ne.jp.

[13] T. Hofmann. Probabilistic latent semantic analysis. In UAI ’99: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 289–296, 1999.

[14] T. Hofmann. Collaborative filtering via Gaussian probabilistic latent semantic analysis. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 259–266. ACM Press, 2003.

[15] T. Iwata, T. Yamada, and N. Ueda. Probabilistic latent semantic visualization: topic model for visualizing documents. In KDD ’08: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 363–371. ACM, 2008.

[16] J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 119–126. ACM, 2003.

[17] J. Jeon and R. Manmatha. Using maximum entropy for automatic image annotation. In CIVR ’04: Proceedings of the 3rd International Conference on Image and Video Retrieval, pages 24–32, 2004.

[18] K. Lang. NewsWeeder: learning to filter netnews. In ICML ’95: Proceedings of the 12th International Conference on Machine Learning, pages 331–339, 1995.

[19] Last.fm. http://www.last.fm.

[20] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[21] T. Minka. Estimating a Dirichlet distribution. Technical report, M.I.T., 2000.

[22] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In Proceedings of IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67, 1999.

[23] Technorati. http://technorati.com.

[24] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

[25] X. Wu, L. Zhang, and Y. Yu. Exploring social annotations for the semantic web. In WWW ’06: Proceedings of the 15th International Conference on World Wide Web, pages 417–426. ACM, 2006.

[26] YouTube. http://www.youtube.com.

[27] D. Zhou, J. Bian, S. Zheng, H. Zha, and C. L. Giles. Exploring social annotations for information retrieval. In WWW ’08: Proceeding of the 17th International Conference on World Wide Web, pages 715–724. ACM, 2008.