
131 nips-2012: Feature Clustering for Accelerating Parallel Coordinate Descent


Source: pdf

Author: Chad Scherrer, Ambuj Tewari, Mahantesh Halappanavar, David Haglin

Abstract: Large-scale ℓ1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for ℓ1-regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results using data from diverse real-world applications. We hope that the algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale ℓ1-regularization problems.
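For intuition about the scheme the abstract describes, the following is a minimal Python sketch of block-greedy coordinate descent on a lasso objective (squared-error loss with an ℓ1 penalty). It assumes unit-norm feature columns and a fixed partition of the features into blocks; the function names, the sequential round structure, and the random partition in the usage code are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def soft_threshold(z, t):
        # closed-form proximal step for the l1 penalty
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def block_greedy_cd(X, y, blocks, lam, iters=100):
        # Sketch of block-greedy coordinate descent for the lasso objective
        #   (1/2) * ||y - X w||^2 + lam * ||w||_1
        # assuming unit-norm columns of X. `blocks` is a list of index arrays
        # partitioning the features; each round picks one greedy coordinate
        # per block and applies all picks together (conceptually in parallel).
        n, p = X.shape
        w = np.zeros(p)
        r = y.copy()                              # residual r = y - X w
        for _ in range(iters):
            picks = []
            for idx in blocks:                    # one thread per block in the parallel setting
                g = X[:, idx].T @ r               # negative gradient on this block
                step = soft_threshold(w[idx] + g, lam) - w[idx]
                j = np.argmax(np.abs(step))       # greedy choice within the block
                picks.append((idx[j], step[j]))
            for j, d in picks:                    # simultaneous update; safe only when
                w[j] += d                         # cross-block correlations are small
                r -= d * X[:, j]
        return w

    # toy usage: random data, columns normalized, features split into 4 blocks
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    X /= np.linalg.norm(X, axis=0)
    y = rng.standard_normal(100)
    blocks = np.array_split(rng.permutation(20), 4)
    w_hat = block_greedy_cd(X, y, blocks, lam=0.1)

In the paper's setting, the blocks would come from clustering the features so that the maximum inner product between features in different blocks is small, which is what keeps the simultaneous per-block updates from interfering with one another; the random partition above is only a placeholder.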


Reference text

J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302–332, 2007.

T. Wu and K. Lange. Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2:224–244, 2008.

S. Shalev-Shwartz and A. Tewari. Stochastic methods for ℓ1-regularized loss minimization. Journal of Machine Learning Research, 12:1865–1892, 2011.

J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin. Parallel coordinate descent for L1-regularized loss minimization. In Proceedings of the 28th International Conference on Machine Learning, pages 321–328, 2011.

C. Scherrer, A. Tewari, M. Halappanavar, and D. Haglin. Scaling up parallel coordinate descent algorithms. In International Conference on Machine Learning, 2012.

Y. Li and S. Osher. Coordinate descent optimization for ℓ1 minimization with application to compressed sensing; a greedy algorithm solving the unconstrained problem. Inverse Problems and Imaging, 3:487–503, 2009.

I. S. Dhillon, P. Ravikumar, and A. Tewari. Nearest neighbor based greedy coordinate descent. In Advances in Neural Information Processing Systems 24, pages 2160–2168, 2011.

D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

S. S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.

RealSim. Document classification data gathered by Andrew McCallum, circa 1997. URL: http://people.cs.umass.edu/~mccallum/data.html.

Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, and Shou-De Lin. Feature engineering and classifier ensemble for KDD Cup 2010, 2011. To appear in JMLR Workshop and Conference Proceedings.