
170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning


Source: pdf

Author: Jun Liu, Jieping Ye

Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. [sent-5, score-1.126]

2 The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. [sent-6, score-0.774]

3 Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. [sent-7, score-0.509]

4 However, the tree structured group Lasso is challenging to solve due to the complex regularization. [sent-8, score-0.511]

5 In this paper, we develop an efficient algorithm for the tree structured group Lasso. [sent-9, score-0.559]

6 One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. [sent-10, score-0.573]

7 The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. [sent-11, score-0.589]

8 Here l(·) is a smooth convex loss function (e.g., the least squares loss and the logistic loss), λ > 0 is the regularization parameter, and φ(x) is the penalty term. [sent-15, score-0.155]

9 Recently, sparse learning via ℓ1 regularization [20] and its various extensions have received increasing attention in many areas including machine learning, signal processing, and statistics. [sent-16, score-0.13]

10 In particular, the group Lasso [1, 16, 22] utilizes the group information of the features, and yields a solution with grouped sparsity. [sent-17, score-0.426]

11 The traditional group Lasso assumes that the groups are non-overlapping. [sent-18, score-0.132]

12 [23] extended the group Lasso to the case of overlapping groups, imposing hierarchical relationships for the features. [sent-21, score-0.185]

13 [6] considered group Lasso with overlaps, and studied theoretical properties of the estimator. [sent-23, score-0.132]

14 [7] considered the consistency property of the structured overlapping group Lasso, and designed an active set algorithm. [sent-25, score-0.265]

15 In many applications, the features can naturally be represented using certain tree structures. [sent-26, score-0.273]

16 For example, the image pixels of the face image shown in Figure 1 can be represented as a tree, where each parent node contains a series of child nodes that enjoy spatial locality; genes/proteins may form certain hierarchical tree structures. [sent-27, score-0.721]

17 Kim and Xing [9] studied the tree structured group Lasso for multi-task learning, where multiple related tasks follow a tree structure. [sent-28, score-0.784]

18 One challenge in the practical application of the tree structured group Lasso is that the resulting optimization problem is much more difficult to solve than Lasso and group Lasso, due to the complex regularization. [sent-29, score-0.666]

19 Figure 1: Illustration of the tree structure of a two-dimensional face image. [sent-30, score-0.37]

20 The 64 × 64 image (a) can be divided into 16 sub-images in (b) according to the spatial locality, where the sub-images can be viewed as the child nodes of (a). [sent-31, score-0.18]
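
As a concrete illustration of such a pixel hierarchy, the sketch below builds a spatial-locality index tree for a square image; it is not code from the paper, and the function name, the 16-way split per level, the depth, and the unit node weights are illustrative assumptions.

```python
import numpy as np

def build_spatial_tree(side=64, splits_per_side=4, depth=3, weight=1.0):
    """Sketch: a spatial-locality index tree over the pixels of a side x side image.

    Level 0 holds the root group with every pixel; at each deeper level every
    block is cut into splits_per_side x splits_per_side contiguous sub-blocks
    (16 children per node by default, echoing the 16 sub-images of Figure 1(b)).
    Each level is a list of (flat pixel-index array, weight) pairs playing the
    role of the nodes G_j^i with weights w_j^i; a constant placeholder weight
    is used here.
    """
    idx = np.arange(side * side).reshape(side, side)
    levels, blocks = [[(idx.ravel(), weight)]], [idx]
    for _ in range(depth - 1):
        nxt = []
        for b in blocks:
            for rows in np.array_split(np.arange(b.shape[0]), splits_per_side):
                for cols in np.array_split(np.arange(b.shape[1]), splits_per_side):
                    nxt.append(b[np.ix_(rows, cols)])
        levels.append([(b.ravel(), weight) for b in nxt])
        blocks = nxt
    return levels

levels = build_spatial_tree()                 # a 64 x 64 image
print([len(level) for level in levels])       # [1, 16, 256]
```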

21 Figure 2: A sample index tree for illustration, with root G_1^0, depth-1 nodes G_1^1, G_2^1, G_3^1, and depth-2 nodes G_1^2, G_2^2, G_3^2, G_4^2. [sent-33, score-0.357]

22 In this paper, we develop an efficient algorithm for the tree structured group Lasso, i. [sent-37, score-0.559]

23 , the optimization problem (1) with φ(·) being the grouped tree structure regularization (see Equation 2). [sent-39, score-0.559]

24 One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization [17, 21] associated with the grouped tree structure. [sent-40, score-0.573]

25 We have performed experimental studies using the AR and JAFFE face data sets, where the features form a hierarchical tree structure based on the spatial locality as shown in Figure 1. [sent-42, score-0.472]

26 For an index tree T of depth d, we let T_i = {G_1^i, G_2^i, . [sent-47, score-0.439]

27 The nodes satisfy the following conditions: 1) the nodes from the same depth level have non-overlapping indices, i. [sent-57, score-0.318]

28 e., G_j^i ∩ G_k^i = ∅, ∀i = 1, . . . , d, j ≠ k, 1 ≤ j, k ≤ n_i; and 2) let G_{j0}^{i−1} be the parent node of a non-root node G_j^i, then G_j^i ⊆ G_{j0}^{i−1}. [sent-62, score-0.453]

29 We can observe that 1) the index sets from different nodes may overlap, e. [sent-64, score-0.202]

30 , any parent node overlaps with its child nodes; 2) the nodes from the same depth level do not overlap; and 3) the index set of a child node is a subset of that of its parent node. [sent-66, score-0.691]

31 The grouped tree structure regularization is defined as: φ(x) = Σ_{i=0}^{d} Σ_{j=1}^{n_i} w_j^i ||x_{G_j^i}||, (2) where x ∈ R^p, w_j^i ≥ 0 (i = 0, 1, . [sent-67, score-0.928]

32 . . . , d; j = 1, . . . , n_i) is the pre-defined weight for the node G_j^i, || · || is the Euclidean norm, and x_{G_j^i} is a vector composed of the entries of x with the indices in G_j^i. [sent-73, score-0.361]
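
To make this notation concrete, the following minimal sketch (not from the paper) evaluates the penalty in (2) given the tree nodes as (index set, weight) pairs; the function name and the toy groups are assumptions for illustration.

```python
import numpy as np

def tree_penalty(x, groups):
    """Evaluate the grouped tree structure regularization of Equation (2).

    `groups` iterates over (indices, weight) pairs, one per node G_j^i of the
    index tree with weight w_j^i >= 0; the penalty is the sum of the weighted
    Euclidean norms of the sub-vectors x_{G_j^i}.
    """
    return sum(w * np.linalg.norm(x[np.asarray(idx)]) for idx, w in groups)

# Toy tree: a root group over four features plus two disjoint child groups.
x = np.array([1.0, -2.0, 0.0, 3.0])
nodes = [([0, 1, 2, 3], 1.0),   # G_1^0 (root)
         ([0, 1], 0.5),         # G_1^1
         ([2, 3], 0.5)]         # G_2^1
print(tree_penalty(x, nodes))   # 1.0*||x|| + 0.5*||x[0:2]|| + 0.5*||x[2:4]||
```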

33 In the next section, we study the Moreau-Yosida regularization [17, 21] associated with (2), develop an analytical solution for such a regularization, propose an efficient algorithm for solving (1), and specify the meaningful interval for the regularization parameter λ. [sent-74, score-0.545]

34 3 Moreau-Yosida Regularization of φ(·) The Moreau-Yosida regularization associated with the grouped tree structure regularization φ(·) for a given v ∈ R^p is given by: φ_λ(v) = min_x { f(x) = (1/2) ||x − v||^2 + λ Σ_{i=0}^{d} Σ_{j=1}^{n_i} w_j^i ||x_{G_j^i}|| }, (3) for some λ > 0. [sent-75, score-0.984]

35 The Moreau-Yosida regularization has many useful properties: 1) φλ (·) is continuously differentiable despite the fact that φ(·) is non-smooth; 2) πλ (·) is a non-expansive operator. [sent-77, score-0.13]

36 More properties on the general Moreau-Yosida regularization can be found in [5, 10]. [sent-78, score-0.13]

37 Our recent study has shown that the efficient optimization of the Moreau-Yosida regularization is key to many optimization algorithms [13, Section 2]. [sent-80, score-0.196]

38 1 An Analytical Solution We show that the minimization of (3) admits an analytical solution. [sent-84, score-0.139]

39 Algorithm 1 Moreau-Yosida Regularization of the tree structured group Lasso (MYtgLasso). Input: v ∈ R^p, the index tree T with nodes G_j^i (i = 0, 1, . [sent-86, score-0.986]

40 . . . , d; j = 1, . . . , n_i) that satisfy Definition 1, the weights w_j^i ≥ 0 (i = 0, 1, . [sent-92, score-0.295]

41 We then traverse the index tree T in the reverse breadth-first order to update u. [sent-99, score-0.357]

42 By using Definition 1, we have Σ_{j=1}^{n_i} |G_j^i| ≤ p. [sent-102, score-0.198]
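
The sketch below is one illustrative reading of Algorithm 1 (MYtgLasso), not the authors' implementation: it assumes the tree is supplied level by level in the (index set, weight) format used in the earlier sketches, starts from u = v, and traverses the levels from the deepest to the root, group soft-thresholding each node's block.

```python
import numpy as np

def prox_tree(v, levels, lam):
    """Sketch of MYtgLasso (Algorithm 1): the analytical minimizer of (3).

    `levels` lists the tree nodes depth by depth, shallowest first, each level
    being a list of (indices, weight) pairs for its nodes G_j^i.  Starting from
    u = v, the tree is traversed in reverse breadth-first order (deepest level
    first, the root last) and every node's block of u is group soft-thresholded.
    """
    u = np.array(v, dtype=float)
    for level in reversed(levels):               # depth d, d-1, ..., 0
        for idx, w in level:
            g = np.asarray(idx)
            norm = np.linalg.norm(u[g])
            thresh = lam * w
            if norm <= thresh:
                u[g] = 0.0                        # the whole block is zeroed ...
            else:
                u[g] *= (norm - thresh) / norm    # ... or uniformly shrunk
    return u
```

Because the groups within one depth level are non-overlapping (Definition 1), each entry of u is touched at most once per level, in line with the bound Σ_{j=1}^{n_i} |G_j^i| ≤ p above.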

43 MYtgLasso can help explain why the structured group sparsity can be induced. [sent-107, score-0.238]

44 Let us analyze the tree given in Figure 2, with the solution denoted as x∗. [sent-108, score-0.328]

45 After traversing the nodes of depth 2, we can get that the elements of x∗ with indices in G2 and G2 are zero; and when the traversal continues to the nodes of depth 1, the elements 1 3 of x∗ with indices in G1 and G1 are set to zero, but those with G2 are still nonzero. [sent-110, score-0.502]
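
A toy run of the prox_tree sketch above (with a hypothetical three-level tree and unit weights, not the tree of Figure 2) makes this cascading behavior visible: once a node's block falls below its threshold at some depth, that node's entries, and hence the entries of all of its descendant nodes, are zero in the final solution.

```python
import numpy as np

v = np.array([0.05, -0.04, 2.0, -1.5, 0.03, 0.02])
levels = [
    [([0, 1, 2, 3, 4, 5], 1.0)],                    # depth 0: root
    [([0, 1], 1.0), ([2, 3], 1.0), ([4, 5], 1.0)],  # depth 1
    [([0], 1.0), ([1], 1.0), ([2], 1.0),
     ([3], 1.0), ([4], 1.0), ([5], 1.0)],           # depth 2: leaves
]
x_star = prox_tree(v, levels, lam=0.1)
print(np.nonzero(x_star)[0])   # [2 3]: only the subtree over {2, 3} survives
```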

46 We can then express φ(x) defined in (2) as: φ(x) = Σ_{i=0}^{d} Σ_{j=1}^{n_i} λ_j^i φ_j^i(x). [sent-117, score-0.198]

47 For any 1 ≤ i ≤ d, 1 ≤ j ≤ n_i, we can find a unique path from the node G_j^i to the root node G_1^0. [sent-120, score-0.534]

48 Let the nodes on this path be G_{r_l}^l, for l = 0, 1, . [sent-121, score-0.177]

49 Proof: According to Definition 1, we can find a unique path from the node G_j^i to the root node G_1^0. [sent-136, score-0.143]

50 . . . , n_i, we have u^i_{G_j^i} ∈ u^{i+1}_{G_j^i} − λ_j^i [∂φ_j^i(u^i)]_{G_j^i} (11), and ∂φ_j^i(u^i) ⊆ ∂φ_j^i(u^0) (12). [sent-145, score-0.448]

51 For (12), it follows from (6) and (8) that it is sufficient to verify that u^0_{G_j^i} = α_j^i u^i_{G_j^i}, for some α_j^i ≥ 0. [sent-147, score-0.272]

52 Denote the nodes on the path as G_{r_l}^l, where l = 0, 1, . [sent-149, score-0.168]

53 We first analyze the relationship between u^i_{G_{r_i}^i} and u^{i−1}_{G_{r_i}^i}. [sent-153, score-0.25]

54 If ||u^i_{G_{r_{i−1}}^{i−1}}|| ≤ λ_{r_{i−1}}^{i−1}, we have u^{i−1}_{G_{r_{i−1}}^{i−1}} = 0, which leads to u^{i−1}_{G_{r_i}^i} = 0 by using (9). [sent-154, score-0.25]

55 Otherwise, if ||u^i_{G_{r_{i−1}}^{i−1}}|| > λ_{r_{i−1}}^{i−1}, we have u^{i−1}_{G_{r_{i−1}}^{i−1}} = ((||u^i_{G_{r_{i−1}}^{i−1}}|| − λ_{r_{i−1}}^{i−1}) / ||u^i_{G_{r_{i−1}}^{i−1}}||) u^i_{G_{r_{i−1}}^{i−1}}, which leads to u^{i−1}_{G_{r_i}^i} = ((||u^i_{G_{r_{i−1}}^{i−1}}|| − λ_{r_{i−1}}^{i−1}) / ||u^i_{G_{r_{i−1}}^{i−1}}||) u^i_{G_{r_i}^i} by using (9). [sent-155, score-1.75]

56 Therefore, we have u^{i−1}_{G_j^i} = β_i u^i_{G_j^i}, for some β_i ≥ 0. [sent-156, score-0.25]

57 By a similar argument, we have u^{l−1}_{G_{r_l}^l} = β_l u^l_{G_{r_l}^l}, β_l ≥ 0, ∀l = 1, 2, . [sent-157, score-0.177]

58 u^{l−1}_{G_j^i} = β_l u^l_{G_j^i}, β_l ≥ 0, ∀l = 1, 2, . [sent-161, score-0.177]

59 According to Definition 1, the leaf nodes are non-overlapping. [sent-169, score-0.198]

60 We assume that the union of the leaf nodes equals {1, 2, . [sent-170, score-0.198]

61 , p}; otherwise, we can add to the index tree the additional leaf nodes with weight 0 to satisfy the aforementioned assumption. [sent-173, score-0.577]

62 Clearly, the original index tree and the new index tree with the additional leaf nodes of weight 0 yield the same penalty φ(·) in (2), the same Moreau-Yosida regularization in (3), and the same solution from Algorithm 1. [sent-174, score-1.122]

63 Therefore, to prove (17), it suffices to show 0 ∈ [∂f(u^0)]_{G_j^i} (18), for all the leaf nodes G_j^i. [sent-175, score-0.198]

64 It follows from Lemma 1 that we can find a unique path from the node G_1^d to the root G_1^0. [sent-177, score-0.229]

65 Let the nodes on this path be G_{r_l}^l, for l = 0, 1, . [sent-178, score-0.168]

66 By using (10) of Lemma 1, we can see that the nodes that contain the index set G_1^d are exactly on the aforementioned path. [sent-182, score-0.224]

67 Applying (11) and (12) of Lemma 2 to each node on the aforementioned path, we have u^{l+1}_{G_{r_l}^l} − u^l_{G_{r_l}^l} ∈ λ_{r_l}^l [∂φ_{r_l}^l(u^l)]_{G_{r_l}^l} ⊆ λ_{r_l}^l [∂φ_{r_l}^l(u^0)]_{G_{r_l}^l} (20), ∀l = 0, 1, . [sent-190, score-0.34]

68 Making use of (9), we obtain from (20) the following relationship: u^{l+1}_{G_1^d} − u^l_{G_1^d} ∈ λ_{r_l}^l [∂φ_{r_l}^l(u^0)]_{G_1^d}, ∀l = 0, 1, . [sent-194, score-0.177]

69 Similarly, we have 0 ∈ [∂f(u^0)]_{G_j^i} for the other leaf nodes G_j^i. [sent-202, score-0.198]

70 2 The Proposed Optimization Algorithm With the analytical solution for πλ (·), the minimizer of (3), we can apply many existing methods for solving (1). [sent-205, score-0.197]

71 In addition, the scheme in (23) can be accelerated to obtain the accelerated gradient descent [2, 19], where the Moreau-Yosida regularization also needs to be evaluated in each of its iterations. [sent-218, score-0.196]
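
As an illustration of that scheme, the following is a hedged sketch of an accelerated proximal gradient loop built on the prox_tree sketch above; it assumes a generic smooth loss supplied through its gradient and a Lipschitz constant, and uses a fixed step size where a practical solver would typically add a line search.

```python
import numpy as np

def tglasso_fista(grad, L, lam, levels, p, n_iter=200):
    """Sketch of an accelerated proximal gradient loop for problem (1).

    grad(x) returns the gradient of the smooth loss l at x and L is a Lipschitz
    constant of that gradient; prox_tree (above) supplies the closed-form
    proximal step for (lam / L) * phi at every iteration.  A fixed step 1/L is
    used; line-search variants are omitted in this sketch.
    """
    x = np.zeros(p)
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox_tree(y - grad(y) / L, levels, lam / L)   # gradient step + prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0      # momentum weights
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

For the least squares loss used in the experiments, grad(x) would be A^T(Ax − b) and L could be taken as the largest eigenvalue of A^T A.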

72 The algorithm is called “tgLasso”, which stands for the tree structured group Lasso. [sent-220, score-0.531]

73 Note that tgLasso includes our previous algorithm [11] as a special case, when the index tree is of depth 1 and w_1^0 = 0. [sent-221, score-0.459]

74 3 The Effective Interval for the Values of λ When estimating the model parameters via (1), a key issue is to choose the appropriate values for the regularization parameter λ. [sent-223, score-0.15]

75 A commonly used approach is to select the regularization parameter from a set of candidate values, which, however, need to be pre-specified in advance. [sent-224, score-0.13]

76 . . . , p}, there exists at least one node G_j^i that contains l and meanwhile w_j^i > 0. [sent-236, score-0.15]

77 Then, for any 0 < ||l'(0)|| < +∞, there exists a unique λmax < +∞ satisfying: 1) if λ ≥ λmax, the zero point is a solution to (1), and 2) if 0 < λ < λmax, the zero point is not a solution to (1). [sent-237, score-0.143]

78 . . . , p}, there exists at least one node G_j^i that contains l and meanwhile w_j^i > 0. [sent-258, score-0.235]

79 Our empirical study shows that λ0 = ||l'(0)||_2 / (Σ_{i=0}^{d} Σ_{j=1}^{n_i} (w_j^i)^2)^{1/2} works quite well. [sent-267, score-0.198]

80 6 Algorithm 2 Finding λmax via Bisection i Input: l (0), the index tree T with nodes Gi (i = 0, 1, . [sent-270, score-0.475]

81 , ni ), λ0 , and δ = 10 Output: λmax 1: Set λ = λ0 2: if φλ (−l (0)) = 0 then 3: Set λ2 = λ, and find the largest λ1 = 2−i λ, i = 1, 2, . [sent-282, score-0.198]
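
Reading Algorithm 2 together with the theorem above: the zero point solves (1) exactly when the proximal operator maps −l'(0) to zero, and that test is monotone in λ, so λmax can be located by bracketing and bisection. The sketch below is an illustrative reconstruction along those lines, reusing prox_tree; the starting guess follows the empirical λ0 suggestion above, and the tolerance and iteration cap are assumed values rather than the paper's.

```python
import numpy as np

def find_lambda_max(grad0, levels, weights_sq_sum, tol=1e-8, max_iter=100):
    """Sketch of Algorithm 2: locate lambda_max by bracketing and bisection.

    grad0 = l'(0) is the loss gradient at the zero point.  Zero solves (1)
    exactly when the prox of lambda * phi maps -l'(0) to zero, and that test is
    monotone in lambda, so lambda_max can be bracketed by doubling / halving
    and then bisected.  weights_sq_sum is the sum of (w_j^i)^2 over all nodes,
    giving the starting guess lambda_0; tol and max_iter are illustrative.
    """
    def zero_is_solution(lam):
        return not np.any(prox_tree(-grad0, levels, lam))

    lam0 = np.linalg.norm(grad0) / np.sqrt(weights_sq_sum)
    lo = hi = lam0
    while not zero_is_solution(hi):    # grow hi until zero is a solution
        hi *= 2.0
    while zero_is_solution(lo):        # shrink lo until zero is not a solution
        lo /= 2.0
    for _ in range(max_iter):          # bisect the bracket [lo, hi]
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if zero_is_solution(mid) else (mid, hi)
    return hi
```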

82 For both data sets, we resize the images to 64 × 64, and make use of the tree structure depicted in Figure 1. [sent-291, score-0.299]

83 We employ the least squares loss for l(·), and set the regularization parameter λ = r × λmax, where λmax is computed using Algorithm 2, and r ∈ {5 × 10^−1, 2 × 10^−1, 1 × 10^−1, 5 × 10^−2, 2 × 10^−2, 1 × 10^−2, 5 × 10^−3, 2 × 10^−3}. [sent-294, score-0.13]

84 The total time for all eight regularization parameters is reported. [sent-297, score-0.15]

85 Table 1. Total computation time in seconds: JAFFE: tgLasso 30, alternating algorithm [9] 4054; AR: tgLasso 73, alternating algorithm [9] 5155. Efficiency of the Proposed tgLasso: We compare our proposed tgLasso with the recently proposed alternating algorithm [9] designed for the tree-guided group Lasso. [sent-298, score-0.244]

86 We report the total computational time (seconds) for running one binary classification task (averaged over 7 and 4 tasks for JAFFE and AR, respectively) corresponding to the eight regularization parameters in Table 1. [sent-299, score-0.15]

87 On AR, we use 50 subjects for training, and the remaining 50 subjects for testing; and on JAFFE, we use 8 subjects for training, and the remaining 2 subjects for testing. [sent-303, score-0.248]

88 Figure 3: Classification performance comparison between Lasso and the tree structured group Lasso. [sent-314, score-0.771]

89 The horizontal axis corresponds to different regularization parameters λ = r × λmax . [sent-315, score-0.13]

90 Figure 4 (columns: Neutral, Smile, Anger, Scream): Markers obtained by Lasso, and tree structured group Lasso (white pixels correspond to the markers). [sent-316, score-0.534]

91 First row: face images of four expressions from the AR data set; Second row: the markers identified by tree structured group Lasso; Third row: the markers identified by Lasso. [sent-317, score-0.716]

92 This verifies the effectiveness of tgLasso in incorporating the tree structure in the formulation, i. [sent-320, score-0.334]

93 Figure 4 shows the markers identified by tgLasso and Lasso under the best regularization parameter. [sent-323, score-0.197]

94 5 Conclusion In this paper, we consider the efficient optimization for the tree structured group Lasso. [sent-325, score-0.534]

95 Our main technical result shows that the Moreau-Yosida regularization associated with the tree structured group Lasso admits an analytical solution. [sent-326, score-0.824]

96 Based on the Moreau-Yosida regularization, we design an efficient algorithm for solving the grouped tree structure regularized optimization problem for smooth convex loss functions, and develop an efficient algorithm for determining the effective interval for the parameter λ. [sent-327, score-0.617]

97 We plan to apply the proposed algorithm to other applications in computer vision and bioinformatics involving the tree structure. [sent-329, score-0.293]

98 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-387, score-0.466]

99 An efficient algorithm for a class of fused lasso problems. [sent-410, score-0.248]

100 The composite absolute penalties family for grouped and hierarchical variable selection. [sent-461, score-0.165]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gi', 0.411), ('tglasso', 0.337), ('tree', 0.273), ('ui', 0.25), ('jaffe', 0.247), ('lasso', 0.228), ('ni', 0.198), ('mytglasso', 0.18), ('xgi', 0.18), ('ul', 0.177), ('ar', 0.149), ('group', 0.132), ('regularization', 0.13), ('gd', 0.129), ('nodes', 0.118), ('gl', 0.114), ('grouped', 0.107), ('node', 0.107), ('structured', 0.106), ('wj', 0.097), ('analytical', 0.085), ('index', 0.084), ('gr', 0.083), ('depth', 0.082), ('leaf', 0.08), ('ri', 0.072), ('face', 0.071), ('ygi', 0.067), ('markers', 0.067), ('subjects', 0.062), ('anger', 0.059), ('facial', 0.059), ('minimizer', 0.057), ('rp', 0.056), ('rl', 0.056), ('locality', 0.055), ('jenatton', 0.055), ('solution', 0.055), ('arizona', 0.054), ('admits', 0.054), ('interval', 0.053), ('path', 0.05), ('lemma', 0.045), ('slep', 0.045), ('smile', 0.045), ('ugi', 0.045), ('max', 0.044), ('neutral', 0.044), ('balanced', 0.044), ('child', 0.041), ('parent', 0.041), ('root', 0.039), ('alternating', 0.036), ('lemar', 0.036), ('traversing', 0.036), ('bisection', 0.036), ('effectiveness', 0.035), ('ud', 0.034), ('accelerated', 0.033), ('ciency', 0.033), ('unique', 0.033), ('indices', 0.033), ('composite', 0.032), ('penalized', 0.032), ('meanwhile', 0.031), ('overlaps', 0.029), ('liu', 0.029), ('develop', 0.028), ('overlap', 0.027), ('continuation', 0.027), ('overlapping', 0.027), ('structure', 0.026), ('hierarchical', 0.026), ('verlag', 0.025), ('closed', 0.025), ('penalty', 0.025), ('ef', 0.025), ('jacob', 0.023), ('optimization', 0.023), ('associated', 0.023), ('pixels', 0.023), ('entries', 0.023), ('effective', 0.023), ('zhao', 0.022), ('determining', 0.022), ('convex', 0.022), ('averaged', 0.022), ('aforementioned', 0.022), ('seven', 0.022), ('verify', 0.022), ('technical', 0.021), ('spatial', 0.021), ('proof', 0.021), ('berlin', 0.021), ('specify', 0.021), ('obozinski', 0.02), ('algorithm', 0.02), ('eight', 0.02), ('yuan', 0.02), ('key', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

Author: Jun Liu, Jieping Ye

Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.

2 0.20416433 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

Author: Samy Bengio, Jason Weston, David Grangier

Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a treestructure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster. 1

3 0.17929628 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

Author: Hongbo Zhou, Qiang Cheng

Abstract: Regularization technique has become a principled tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially on how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms are essentially corresponding to different distortions to the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases. Within this minimax framework, we further give mathematically exact definition for a novel representation called sparse grouping representation (SGR), and prove a set of sufficient conditions for generating such group level sparsity. Under these sufficient conditions, a large set of consistent regularization terms can be designed. This SGR is essentially different from group lasso in the way of using class or group information, and it outperforms group lasso when there appears group label noise. We also provide some generalization bounds in a classification setting. 1

4 0.17365783 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

Author: Sofia Mosci, Silvia Villa, Alessandro Verri, Lorenzo Rosasco

Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in [12], where the group lasso penalty is generalized to overlapping groups of variables. While in [12] the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations. 1

5 0.15942694 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

6 0.14563917 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

7 0.12917629 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

8 0.12280247 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

9 0.11904529 181 nips-2010-Network Flow Algorithms for Structured Sparsity

10 0.11869314 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

11 0.1168851 144 nips-2010-Learning Efficient Markov Networks

12 0.10411944 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

13 0.10308354 20 nips-2010-A unified model of short-range and long-range motion perception

14 0.099720292 265 nips-2010-The LASSO risk: asymptotic results and real world examples

15 0.096250899 5 nips-2010-A Dirty Model for Multi-task Learning

16 0.089708276 223 nips-2010-Rates of convergence for the cluster tree

17 0.087846503 91 nips-2010-Fast detection of multiple change-points shared by many signals using group LARS

18 0.08400742 148 nips-2010-Learning Networks of Stochastic Differential Equations

19 0.081541114 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

20 0.076554999 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.204), (1, 0.068), (2, 0.084), (3, 0.065), (4, -0.018), (5, -0.181), (6, -0.023), (7, 0.111), (8, -0.012), (9, -0.004), (10, -0.003), (11, 0.147), (12, -0.23), (13, 0.157), (14, 0.061), (15, -0.132), (16, -0.072), (17, 0.017), (18, -0.106), (19, -0.178), (20, -0.062), (21, -0.067), (22, 0.105), (23, -0.09), (24, -0.075), (25, 0.078), (26, 0.024), (27, 0.009), (28, -0.033), (29, -0.109), (30, -0.001), (31, 0.005), (32, 0.138), (33, -0.032), (34, 0.069), (35, -0.012), (36, 0.045), (37, -0.027), (38, -0.047), (39, -0.033), (40, 0.054), (41, -0.007), (42, 0.031), (43, -0.001), (44, -0.073), (45, 0.021), (46, -0.04), (47, -0.081), (48, 0.021), (49, -0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96983296 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

Author: Jun Liu, Jieping Ye

Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.

2 0.65961033 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

Author: Hongbo Zhou, Qiang Cheng

Abstract: Regularization technique has become a principled tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially on how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms are essentially corresponding to different distortions to the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases. Within this minimax framework, we further give mathematically exact definition for a novel representation called sparse grouping representation (SGR), and prove a set of sufficient conditions for generating such group level sparsity. Under these sufficient conditions, a large set of consistent regularization terms can be designed. This SGR is essentially different from group lasso in the way of using class or group information, and it outperforms group lasso when there appears group label noise. We also provide some generalization bounds in a classification setting. 1

3 0.65891731 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

Author: Sofia Mosci, Silvia Villa, Alessandro Verri, Lorenzo Rosasco

Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in [12], where the group lasso penalty is generalized to overlapping groups of variables. While in [12] the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations. 1

4 0.60654813 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

Author: Seunghak Lee, Jun Zhu, Eric P. Xing

Abstract: To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.

5 0.57391864 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

Author: Jean Morales, Charles A. Micchelli, Massimiliano Pontil

Abstract: We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. We present a family of convex penalty functions, which encode this prior knowledge by means of a set of constraints on the absolute values of the regression coefficients. This family subsumes the ℓ1 norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish some important properties of these functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso and other related methods.

6 0.55894631 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems

7 0.55317086 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

8 0.54798681 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

9 0.53555447 144 nips-2010-Learning Efficient Markov Networks

10 0.50899744 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

11 0.49410483 181 nips-2010-Network Flow Algorithms for Structured Sparsity

12 0.48477593 5 nips-2010-A Dirty Model for Multi-task Learning

13 0.4536576 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs

14 0.44391358 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation

15 0.44061238 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

16 0.42147303 63 nips-2010-Distributed Dual Averaging In Networks

17 0.41175258 45 nips-2010-CUR from a Sparse Optimization Viewpoint

18 0.40450028 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

19 0.40136528 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

20 0.39821151 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.033), (27, 0.041), (30, 0.059), (35, 0.448), (45, 0.172), (50, 0.039), (52, 0.021), (60, 0.032), (77, 0.035), (90, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84094357 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

Author: Jun Liu, Jieping Ye

Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.

2 0.75260109 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas

Author: Georg Langs, Yanmei Tie, Laura Rigolo, Alexandra Golby, Polina Golland

Abstract: Matching functional brain regions across individuals is a challenging task, largely due to the variability in their location and extent. It is particularly difficult, but highly relevant, for patients with pathologies such as brain tumors, which can cause substantial reorganization of functional systems. In such cases spatial registration based on anatomical data is only of limited value if the goal is to establish correspondences of functional areas among different individuals, or to localize potentially displaced active regions. Rather than rely on spatial alignment, we propose to perform registration in an alternative space whose geometry is governed by the functional interaction patterns in the brain. We first embed each brain into a functional map that reflects connectivity patterns during a fMRI experiment. The resulting functional maps are then registered, and the obtained correspondences are propagated back to the two brains. In application to a language fMRI experiment, our preliminary results suggest that the proposed method yields improved functional correspondences across subjects. This advantage is pronounced for subjects with tumors that affect the language areas and thus cause spatial reorganization of the functional regions. 1

3 0.74571878 59 nips-2010-Deep Coding Network

Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu

Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.

4 0.73046356 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding

Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method with emphasizing joint 2,1 -norm minimization on both loss function and regularization. The 2,1 -norm based loss function is robust to outliers in data points and the 2,1 norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression based objective makes the feature selection process more efficient. Our method has been applied into both genomic and proteomic biomarkers discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1

5 0.63157123 149 nips-2010-Learning To Count Objects in Images

Author: Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically-attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amount of visual data. 1

6 0.59499043 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

7 0.56158614 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

8 0.54938465 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

9 0.52882469 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding

10 0.51648736 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis

11 0.51605827 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

12 0.51466173 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

13 0.51317304 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

14 0.5130254 181 nips-2010-Network Flow Algorithms for Structured Sparsity

15 0.51221794 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles

16 0.50785905 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

17 0.50583696 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

18 0.50115687 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

19 0.49892002 5 nips-2010-A Dirty Model for Multi-task Learning

20 0.49818653 258 nips-2010-Structured sparsity-inducing norms through submodular functions