nips nips2011 nips2011-226 nips2011-226-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection serves as a building block in optimization problems with a non-negative max-heap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such a Euclidean projection problem admits an analytical solution, and we develop a top-down algorithm whose key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merging for efficiently finding the maximal root-tree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p^2) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1.
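To make the projection concrete, the following is a minimal sketch (not the paper's root-tree algorithm) for the sequential-list special case, where the max-heap constraint reduces to a non-increasing, non-negative chain x[0] >= x[1] >= ... >= x[p-1] >= 0. It uses the standard pool-adjacent-violators (PAVA) idea followed by clipping at zero; the function name and the clipping step are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def project_chain_max_heap(v):
    """Sketch: Euclidean projection of v onto
    {x : x[0] >= x[1] >= ... >= x[p-1] >= 0},
    i.e., a max-heap whose tree is a sequential list.
    Illustrative only; the paper's general-tree algorithm differs."""
    v = np.asarray(v, dtype=float)
    # PAVA for a non-increasing fit: keep blocks as [total, count] and
    # merge adjacent blocks whenever their means violate the ordering.
    blocks = []
    for value in v:
        blocks.append([value, 1])
        # merge while the previous block's mean is smaller than this one's
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    x = np.concatenate([np.full(c, s / c) for s, c in blocks])
    # non-negativity: clipping the isotonic fit at zero preserves the ordering
    return np.maximum(x, 0.0)

# example: enforces parent >= child >= 0 along the chain
print(project_chain_max_heap([3.0, 5.0, -1.0, 2.0]))  # -> [4. 4. 0.5 0.5]
```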
[1] E. van den Berg, M. Schmidt, M. P. Friedlander, and K. Murphy. Group sparsity via linear-time projection. Tech. Rep. TR-2008-09, Department of Computer Science, University of British Columbia, Vancouver, July 2008.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] N. Choi, W. Li, and J. Zhu. Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105:354–364, 2010.
[4] Z. Dostál. Box constrained quadratic programming with proportioning and projections. SIAM Journal on Optimization, 7(3):871–887, 1997.
[5] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1 -ball for learning in high dimensions. In International Conference on Machine Learning, 2008.
[6] M. Hamada and C. Wu. Analysis of designed experiments with complex aliasing. Journal of Quality Technology, 24:130–137, 1992.
[7] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In International Conference on Machine Learning. 2009.
[8] L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In International Conference on Machine Learning, 2009.
[9] R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv:0904.3523v2, 2009.
[10] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning, 2010.
[11] S. Kim and E. P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In International Conference on Machine Learning, 2010.
[12] X. Li, N. Sudarsanam, and D. Frey. Regularities in data from factorial experiments. Complexity, 11:32–45, 2006.
[13] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[14] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In International Conference on Machine Learning, 2009.
[15] J. Liu and J. Ye. Moreau-Yosida regularization for grouped tree structure learning. In Advances in Neural Information Processing Systems, 2010.
[16] R. Luss, S. Rosset, and M. Shahar. Decomposing isotonic regression for efficiently solving large problems. In Advances in Neural Information Processing Systems, 2010.
[17] C. Micchelli, J. Morales, and M. Pontil. A family of penalty functions for structured sparsity. In Advances in Neural Information Processing Systems 23, pages 1612–1623, 2010.
[18] J. Nelder. The selection of terms in response-surface models—how strong is the weak-heredity principle? The American Statistician, 52:315–318, 1998.
[19] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.
[20] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.
[21] P. M. Pardalos and G. Xue. Algorithms for a class of isotonic regression problems. Algorithmica, 23:211–222, 1999.
[22] S. Shalev-Shwartz and Y. Singer. Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7:1567–1599, 2006.
[23] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1):267–288, 1996.
[24] M. Yuan, V. R. Joseph, and H. Zou. Structured variable selection and estimation. Annals of Applied Statistics, 3:1738–1757, 2009.
[25] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.
[26] L. W. Zhong and J. T. Kwok. Efficient sparse modeling with automatic feature grouping. In International Conference on Machine Learning, 2011.