jmlr jmlr2008 jmlr2008-91 jmlr2008-91-reference knowledge-graph by maker-knowledge-mining

91 jmlr-2008-Trust Region Newton Method for Logistic Regression


Source: pdf

Author: Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi

Abstract: Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM). Keywords: logistic regression, newton method, trust region, conjugate gradient, support vector machines


reference text

Arthur Asuncion and David J. Newman. UCI machine learning repository, 2007. URL http: //www.ics.uci.edu/$\sim$mlearn/{MLR}epository.html. Suhrid Balakrishnan and David Madigan. Algorithms for sparse linear classifiers in the massive data setting. 2005. URL http://www.stat.rutgers.edu/˜madigan/PAPERS/sm.pdf. Steven Benson and Jorge J. Mor´ . A limited memory variable metric method for bound constrained e minimization. Preprint MCS-P909-0901, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 2001. Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992. Ali Bouaricha, Jorge J. Mor´ , and Zhijun Wu. Newton’s method for large-scale optimization. e Preprint MCS-P635-0197, Argonne National Laboratory, Argonne, Illinois, 1997. John N. Darroch and Douglas Ratcliff. Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5):1470–1480, 1972. Iain S. Duff, Roger G. Grimes, and John G. Lewis. Sparse matrix test problems. ACM Transactions on Mathematical Software, 15:1–14, 1989. Joshua Goodman. Sequential conditional generalized iterative scaling. In ACL, pages 9–16, 2002. Rong Jin, Rong Yan, Jian Zhang, and Alex G. Hauptmann. A faster iterative scaling algorithm for conditional exponential model. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003. Thorsten Joachims. Training linear SVMs in linear time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD). ACM, 2006. Wei-Chun Kao, Kai-Min Chung, Chia-Liang Sun, and Chih-Jen Lin. Decomposition methods for linear support vector machines. Neural Computation, 16(8):1689–1704, 2004. URL http:// www.csie.ntu.edu.tw/˜cjlin/papers/linear.pdf. S. Sathiya Keerthi and Dennis DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005. Kwangmoo Koh, Seung-Jean Kim, and Stephen Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007. URL http://www.stanford.edu/˜boyd/l1_logistic_reg.html. Paul Komarek and Andrew W. Moore. Making logistic regression a core data mining tool: A practical investigation of accuracy, speed, and simplicity. Technical Report TR-05-27, Robotics Institute, Carnegie Mellon University, 2005. David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004. 648 T RUST R EGION N EWTON M ETHOD FOR L OGISTIC R EGRESSION Chih-Jen Lin and Jorge J. Mor´ . Newton’s method for large-scale bound constrained problems. e SIAM Journal on Optimization, 9:1100–1127, 1999. Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton method for large-scale logistic regression. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007. Software available at http://www.csie.ntu.edu.tw/˜cjlin/liblinear. Dong C. Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1):503–528, 1989. Robert Malouf. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th conference on Natural language learning, pages 1–7. Association for Computational Linguistics, 2002. Olvi L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913–929, 2002. Thomas P. Minka. A comparison of numerical optimizers for logistic regression, 2003. URL http://research.microsoft.com/˜minka/papers/logreg/. Stephen G. Nash. A survey of truncated-Newton methods. Journal of Computational and Applied Mathematics, 124(1-2):45–59, 2000. Jorge Nocedal and Stephen G. Nash. A numerical study of the limited memory BFGS method and the truncated-newton method for large scale optimization. SIAM Journal on Optimization, 1(3): 358–372, 1991. Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, 1997. John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Bernhard Sch¨ lkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in o Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press. Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007. Alex J. Smola, S V N Vishwanathan, and Quoc Le. Bundle methods for machine learning. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008. Trond Steihaug. The conjugate gradient method and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 20:626–637, 1983. Charles Sutton and Andrew McCallum. An introduction to conditional random fields for relational learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, 2006. 649 L IN , W ENG AND K EERTHI Xiaolei Zou, I. Michael Navon, M. Berger, P. K. H. Phua, Tamar Schlick, and F.X. Le Dimet. Numerical experience with limited-memory quasi-Newton and truncated Newton methods. SIAM Journal on Optimization, 3(3):582–608, 1993. 650