nips nips2006 nips2006-178 nips2006-178-reference knowledge-graph by maker-knowledge-mining

178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation


Source: pdf

Author: Gavin C. Cawley, Nicola L. Talbot, Mark Girolami

Abstract: Multinomial logistic regression provides the standard penalised maximumlikelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found application in text processing and microarray classification, where explicit identification of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is integrated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense. 1


reference text

[1] P. McCullagh and J. A. Nelder. Generalized linear models, volume 37 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, second edition, 1989.

[2] D. W. Hosmer and S. Lemeshow. Applied logistic regression. Wiley, second edition, 2000.

[3] C. K. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, January 1970.

[4] M. Saerens, P. Latinne, and C. Decaestecker. Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14(1):21–41, 2001.

[5] J. O. Berger. Statistical decision theory and Bayesian analysis. Springer Series in Statistics. Springer, second edition, 1985.

[6] J. Zhu and T. Hastie. Classification of gene microarrays by penalized logistic regression. Biostatistics, 5(3):427–443, 2004.

[7] X. Zhou, X. Wang, and E. R. Dougherty. Multi-class cancer classification using multinomial probit regression with Bayesian gene selection. IEE Proceedings - Systems Biology, 153(2):70–76, March 2006.

[8] T. Zhang and F. J. Oles. Text categorization based on regularised linear classification methods. Information Retrieval, 4(1):5–31, April 2001.

[9] L. Narlikar and A. J. Hartemink. Sequence features of DNA binding sites reveal structural class of associated transcription factor. Bioinformatics, 22(2):157–163, 2006.

[10] M. J. L. Orr. Regularisation in the selection of radial basis function centres. Neural Computation, 7(3):606–623, 1995.

[11] Y. Grandvalet. Least absolute shrinkage is equivalent to quadratic penalisation. In L. Niklasson, M. Bod´ n, and T. Ziemske, editors, Proceedings of the International Conference on Artificial Neural e Networks, Perspectives in Neural Computing, pages 201–206, Sk¨ vde, Sweeden, September 2–4 1998. o Springer.

[12] Y. Grandvalet and S. Canu. Outcomes of the quivalence of adaptive ridge with least absolute shrinkage. In Advances in Neural Information Processing Systems, volume 11. MIT Press, 1999.

[13] M. E. Tipping. Sparse Bayesian learning and the Relevance Vector Machine. Journal of Machine Learning Research, 1:211–244, 2001.

[14] A. C. Faul and M. E. Tipping. Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA, 3–6 January 2003.

[15] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society - Series B, 58:267–288, 1996.

[16] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.

[17] P. M. Williams. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 7(1):117–143, 1995.

[18] C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.

[19] J. S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogelman Souli´ and J. H´ rault, editors, Neurocomputing: Algoe e rithms, architectures and applications, pages 227–236. Springer-Verlag, New York, 1990.

[20] A. N. Tikhonov and V. Y. Arsenin. Solutions of ill-posed problems. John Wiley, New York, 1977.

[21] S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilema. Neural Computation, 4(1):1–58, 1992.

[22] D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720–736, 1992.

[23] W. L. Buntine and A. S. Weigend. Bayesian back-propagation. Complex Systems, 5:603–643, 1991.

[24] I. S. Gradshteyn and I. M. Ryzhic. Table of Integrals, Series and Products. Academic Press, fifth edition, 1994.

[25] S. K. Shevade and S. S. Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.

[26] D. Madigan, A. Genkin, D. D. Lewis, and D. Fradkin. Bayesian multinomial logistic regression for author identification. In AIP Conference Proceedings, volume 803, pages 509–516, 2005.

[27] P. M. Williams. A Marquardt algorithm for choosing the step size in backpropagation learning with conjugate gradients. Technical Report CSRP-229, University of Sussex, February 1991.

[28] G. J. McLachlan. Discriminant analysis and statistical pattern recognition. Wiley, 1992.

[29] M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1150–1159, September 2003.

[30] B. Krishnapuram, L. Carin, M. A. T. Figueiredo, and A. J. Hartemink. Sprse multinomial logistic regression: Fast algorithms and generalisation bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):957–968, June 2005.

[31] J. M. Bioucas-Dias, M. A. T. Figueiredo, and J. P. Oliveira. Adaptive total variation image deconvolution: A majorization-minimization approach. In Proceedings of the European Signal Processing Conference (EUSIPCO’2006), Florence, Italy, September 2006.

[32] G. C. Cawley and N. L. C. Talbot. Gene selection in cancer classification using sparse logistic regression with Bayesian regularisation. Bioinformatics, 22(19):2348–2355, October 2006.