58 nips-2012-Bayesian models for Large-scale Hierarchical Classification


Source: pdf

Author: Siddharth Gopal, Yiming Yang, Bing Bai, Alexandru Niculescu-Mizil

Abstract: A challenging problem in hierarchical classification is to leverage the hierarchical relations among classes to improve classification performance. An even greater challenge is to do so in a manner that is computationally feasible for large-scale problems. This paper proposes a set of Bayesian methods that model hierarchical dependencies among class labels using multivariate logistic regression. Specifically, the parent-child relationships are modeled by placing a hierarchical prior over the child nodes, centered on the parameters of their parents, thereby encouraging classes that are nearby in the hierarchy to share similar model parameters. We present variational algorithms for tractable posterior inference in these models, and provide a parallel implementation that can comfortably handle large-scale problems with hundreds of thousands of dimensions and tens of thousands of classes. We run a comparative evaluation on multiple large-scale benchmark datasets that highlights the scalability of our approach and shows improved performance over other state-of-the-art hierarchical methods.
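The prior described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it evaluates a MAP-style objective (negative log joint) rather than running the variational inference the paper presents. The function names, the shared isotropic variance `sigma2`, the zero-centered root, and the restriction of the softmax to leaf classes are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(scores):
    """Row-wise log-softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def neg_log_joint(W, parent, leaves, X, y, sigma2=1.0):
    """Negative log joint of labels and parameters, up to an additive constant.

    W      : (num_nodes, dim) array, one weight vector per hierarchy node
    parent : parent[n] is the index of node n's parent, -1 for the root
    leaves : node indices of the leaf classes
    X, y   : (num_examples, dim) features; y[i] indexes into `leaves`
    sigma2 : shared isotropic prior variance (an assumption of this sketch)
    """
    # Likelihood: multinomial logistic regression over the leaf classes.
    log_probs = log_softmax(X @ W[leaves].T)
    nll = -log_probs[np.arange(len(y)), y].sum()

    # Prior: each node's weights are Gaussian, centered on its parent's weights.
    # The root is centered on zero (also an assumption of this sketch).
    penalty = 0.0
    for n, p in enumerate(parent):
        center = W[p] if p >= 0 else np.zeros(W.shape[1])
        penalty += np.sum((W[n] - center) ** 2) / (2.0 * sigma2)
    return nll + penalty


# Toy hierarchy: node 0 is the root, nodes 1 and 2 are its leaf children.
parent = [-1, 0, 0]
leaves = [1, 2]
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])
W = np.zeros((3, 2))
print(neg_log_joint(W, parent, leaves, X, y))
```

The penalty term grows as a child's weights move away from its parent's, which is the sense in which classes nearby in the hierarchy are encouraged to share similar parameters.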


reference text

[1] P.N. Bennett and N. Nguyen. Refined experts: improving classification in large taxonomies. In SIGIR, 2009.

[2] C.M. Bishop. Pattern recognition and machine learning.

[3] C.M. Bishop and M.E. Tipping. Bayesian regression and classification. 2003.

[4] D. Borthakur. The Hadoop distributed file system: Architecture and design. Hadoop Project Website, 11:21, 2007.

[5] G. Bouchard. Efficient bounds for the softmax function. 2007.

[6] L. Cai and T. Hofmann. Hierarchical document categorization with support vector machines. In CIKM, pages 78–87. ACM, 2004.

[7] G. Casella. Empirical Bayes method - a tutorial. Technical report.

[8] I. Dimitrovski, D. Kocev, L. Suzana, and S. Džeroski. Hierarchical annotation of medical images. In IMIS, 2008.

[9] C.B. Do, C.S. Foo, and A.Y. Ng. Efficient multiple hyperparameter learning for log-linear models. In Neural Information Processing Systems, volume 21, 2007.

[10] S. Dumais and H. Chen. Hierarchical classification of web content. In SIGIR, 2000.

[11] A. Gelman. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis.

[12] R.E. Kass and R. Natarajan. A default conjugate prior for variance components in generalized linear mixed models. Bayesian Analysis, 2006.

[13] D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical programming, 45(1):503–528, 1989.

[14] T.Y. Liu, Y. Yang, H. Wan, H.J. Zeng, Z. Chen, and W.Y. Ma. Support vector machines classification with a very large-scale taxonomy. ACM SIGKDD, pages 36–43, 2005.

[15] Z.Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7–35, 1992.

[16] D.J.C. MacKay. The evidence framework applied to classification networks. Neural computation, 1992.

[17] A. McCallum, R. Rosenfeld, T. Mitchell, and A.Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In ICML, pages 359–367, 1998.

[18] B. Shahbaba and R.M. Neal. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Analysis, 2(1):221–238, 2007.

[19] M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. JMLR, 1:211–244, 2001.

[20] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6(2):1453, 2006.

[21] J. Weston and C. Watkins. Multi-class support vector machines. Technical report, 1998.

[22] G.R. Xue, D. Xing, Q. Yang, and Y. Yu. Deep classification in large-scale text hierarchies. In SIGIR, pages 619–626. ACM, 2008.

[23] D. Zhou, L. Xiao, and M. Wu. Hierarchical classification via orthogonal transfer. Technical report, MSR-TR-2011-54, 2011.