
226 nips-2011-Projection onto A Nonnegative Max-Heap


Source: pdf

Author: Jun Liu, Liang Sun, Jieping Ye

Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap, i.e., an ordered tree in which the values of the nodes are all non-negative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection serves as a building block in optimization problems with a non-negative max-heap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that this Euclidean projection problem admits an analytical solution, and we develop a top-down algorithm whose key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach to finding the maximal root-tree is to enumerate all possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal root-tree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list and O(p^2) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases, including a sequential list, a full binary tree, and a tree with depth 1.
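To make the constraint set concrete, the sketch below is an illustration only, not the top-down algorithm proposed in the paper: it hands the constrained least-squares program to a generic solver for a toy tree. The parent-index encoding, the toy data, and the use of scipy's SLSQP routine are assumptions made purely for illustration.

```python
# Minimal illustration of projection onto a non-negative max-heap
# (NOT the paper's algorithm): solve min_x 0.5*||x - v||^2 subject to
# x >= 0 and x[parent(i)] >= x[i], using a generic constrained solver.
import numpy as np
from scipy.optimize import minimize

# Toy tree over 5 nodes, encoded as a parent-index array (-1 marks the root):
#         0
#        / \
#       1   2
#      / \
#     3   4
parent = [-1, 0, 0, 1, 1]
v = np.array([1.0, 2.5, -0.5, 3.0, 0.2])   # vector to be projected

# Objective: half squared Euclidean distance to v.
obj = lambda x: 0.5 * np.sum((x - v) ** 2)

# Heap constraints x[p] - x[i] >= 0 for every non-root node i with parent p;
# non-negativity is handled through the bounds argument.
cons = [{"type": "ineq", "fun": lambda x, i=i, p=p: x[p] - x[i]}
        for i, p in enumerate(parent) if p >= 0]

res = minimize(obj, x0=np.maximum(v, 0.0), method="SLSQP",
               bounds=[(0.0, None)] * len(v), constraints=cons)
print("projection:", np.round(res.x, 4))
```

The generic solver only makes the feasible set (non-negative entries, each parent at least as large as its children) tangible on a small example; the paper's contribution is a specialized analytical/top-down procedure with far better complexity.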


reference text

[1] E. Berg, M. Schmidt, M. P. Friedlander, and K. Murphy. Group sparsity via linear-time projection. Tech. Rep. TR-2008-09, Department of Computer Science, University of British Columbia, Vancouver, July 2008.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[3] N. Choi, W. Li, and J. Zhu. Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105:354–364, 2010.

[4] Z. Dostál. Box constrained quadratic programming with proportioning and projections. SIAM Journal on Optimization, 7(3):871–887, 1997.

[5] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projection onto the ℓ1-ball for learning in high dimensions. In International Conference on Machine Learning, 2008.

[6] M. Hamada and C. Wu. Analysis of designed experiments with complex aliasing. Journal of Quality Technology, 24:130–137, 1992.

[7] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In International Conference on Machine Learning, 2009.

[8] L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In International Conference on Machine Learning, 2009.

[9] R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv:0904.3523v2, 2009.

[10] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning, 2010.

[11] S. Kim and E. P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In International Conference on Machine Learning, 2010.

[12] X. Li, N. Sundarsanam, and D. Frey. Regularities in data from factorial experiments. Complexity, 11:32–45, 2006.

[13] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.

[14] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In International Conference on Machine Learning, 2009.

[15] J. Liu and J. Ye. Moreau-Yosida regularization for grouped tree structure learning. In Advances in Neural Information Processing Systems, 2010.

[16] R. Luss, S. Rosset, and M. Shahar. Decomposing isotonic regression for efficiently solving large problems. In Advances in Neural Information Processing Systems, 2010.

[17] C. Micchelli, J. Morales, and M. Pontil. A family of penalty functions for structured sparsity. In Advances in Neural Information Processing Systems 23, pages 1612–1623, 2010.

[18] J. Nelder. The selection of terms in response-surface models—how strong is the weak-heredity principle? The American Statistician, 52:315–318, 1998.

[19] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.

[20] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.

[21] P. M. Pardalos and G. Xue. Algorithms for a class of isotonic regression problems. Algorithmica, 23:211–222, 1999.

[22] S. Shalev-Shwartz and Y. Singer. Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7:1567–1599, 2006.

[23] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58(1):267–288, 1996.

[24] M. Yuan, V. R. Joseph, and H. Zou. Structured variable selection and estimation. Annals of Applied Statistics, 3:1738–1757, 2009.

[25] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.

[26] L. W. Zhong and J. T. Kwok. Efficient sparse modeling with automatic feature grouping. In International Conference on Machine Learning, 2011.