nips nips2011 nips2011-226 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). [sent-7, score-0.79]
2 Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. [sent-9, score-0.571]
3 In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. [sent-10, score-0.536]
4 A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. [sent-11, score-0.234]
5 We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. [sent-12, score-0.497]
6 We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. [sent-14, score-0.4]
7 Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. [sent-15, score-0.5]
8 In this paper, we consider an ordered tree structure: a given feature is selected for the given regression/classification task only if its parent node is selected. [sent-18, score-0.571]
9 To incorporate such ordered tree structure, we assume that the model parameter x ∈ Rp follows the non-negative max-heap structure1 : P = {x ≥ 0, xi ≥ xj ∀(xi , xj ) ∈ E t }, (1) where T t = (V t , E t ) is a target tree with V t = {x1 , x2 , . [sent-19, score-0.802]
10 The constraint set P implies that if xi is the parent node of a child node xj then the value of xi is no less than the value of xj . [sent-23, score-0.544]
11 In other words, if a parent node xi is 0, then any of its child nodes xj is also 0. [sent-24, score-0.41]
12 Figure 1 illustrates three special tree structures: 1) a full binary tree, 2) a sequential list, and 3) a tree with depth 1. [sent-25, score-0.76]
13 Plots (a), (b), and (c) correspond to a full binary tree, a sequential list, and a tree with depth 1, respectively. [sent-28, score-0.473]
14 The ordered tree structure is a special case of the non-negative garrote discussed in [24] when the hierarchical relationship is depicted by a tree. [sent-32, score-0.434]
15 Therefore, the asymptotic properties established in [24] are applicable to the ordered tree structrue. [sent-33, score-0.374]
16 Several related approaches that can incorporate the ordered tree structure include the Wedge approach [17] and the hierarchical group Lasso [25]. [sent-34, score-0.458]
17 The Wedge approach incorporates such ordering information x2 p 1 i by designing a penalty for the model parameter x as Ω(x|P ) = inf t∈P 2 i=1 ( ti + ti ), with tree being a sequential list. [sent-35, score-0.701]
18 By imposing the mixed ℓ1 -ℓ2 norm on each group formed by the nodes in the subtree of a parent node, the hierarchical group Lasso is able to incorporate such ordering information. [sent-36, score-0.456]
19 The hierarchical group Lasso has been applied for multi-task learning in [11] with a tree structure, and the efficient computation was discussed in [10, 15]. [sent-37, score-0.371]
20 Compared to Wedge and hierarchical group Lasso, the max-heap in (1) incorporates such ordering information in a direct way, and our simulation results show that the max-heap can achieve lower reconstruction error than both approaches. [sent-38, score-0.178]
21 In estimating the model parameter satisfying the ordered tree structure, one needs to solve the following constrained optimization problem: min f (x) (2) x∈P for some convex function f (·). [sent-39, score-0.374]
22 , hyperplane, halfspace, and rectangle), the Euclidean projection admits a simple analytical solution, see [2]. [sent-44, score-0.124]
23 In Section 2, we show that the Euclidean projection admits an analytical solution, and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. [sent-49, score-0.536]
24 In Section 3, we design a bottom-up algorithm with merge for efficiently finding the maximal root-tree by using its properties. [sent-50, score-0.285]
25 With the target tree T t = (V t , E t ), we construct the input tree T = (V, E) with the input vector v, where V = {v1 , v2 , . [sent-53, score-0.62]
26 For a non-empty tree T = (V, E), we define its root-tree as any non-empty ˜ ˜ ˜ ˜ ˜ ˜ tree T = (V , E) that satisfies: 1) V ⊆ V , 2) E ⊆ E, and 3) T shares the same root as T . [sent-61, score-0.638]
27 For a non-empty tree T = (V, E), we define R(T ) as the root-tree set containing all its root-trees. [sent-63, score-0.287]
28 For a non-empty tree T = (V, E), we define vi ∈V m(T ) = max vi |V | ,0 , (4) which equals the mean of all the nodes in T if such mean is non-negative, and 0 otherwise. [sent-65, score-1.012]
29 For a non-empty tree T = (V, E), we define its maximal root-tree as: Mmax (T ) = arg where max ˜ ˜ ˜ ˜ ˜ T =(V ,E):T ∈R(T ),m(T )=mmax (T ) ˜ mmax (T ) = max m(T ) ˜ T ∈R(T ) ˜ |V |, (5) (6) is the maximal value of all the root-trees of the tree T . [sent-67, score-1.127]
30 Note that if two root-trees share the same maximal value, (5) selects the one with the largest tree size. [sent-68, score-0.499]
31 ˜ ˜ ˜ ˜ ˜ When T = (V , E) is a part of a “larger” tree T = (V, E), i. [sent-69, score-0.287]
32 , V ⊆ V and E ⊆ E, we ˜ as a “super-node” of the tree T with value m(T ). [sent-71, score-0.287]
33 For a non-empty tree T = (V, E), we define its super-tree as S = (VS , ES ), which satisfies: 1) each node in VS = {T1 , T2 , . [sent-73, score-0.39]
34 , Tn } is a non-empty tree with Ti = (Vi , Ei ), n 2) Vi ⊆ V and Ei ⊆ E, 3) Vi Vj = ∅, i = j and V = i=1 Vi , and 4) (Ti , Tj ) ∈ ES if and only if there exists a node in Tj whose parent node is in Ti . [sent-76, score-0.587]
35 The key idea of the proposed algorithm is that, in the i-th call, we find Ti = Mmax (T ), the maximal root-tree of T , set x corresponding to the nodes of Ti to mi = mmax (T ) = m(Ti ), remove Ti from the tree T , ˜ and apply Atda to the resulting trees one by one recursively. [sent-79, score-0.809]
36 , ri apply Atda(T 6: else 7: Set xj = mi , ∀vj ∈ Vi ˜ 8: end if 2. [sent-86, score-0.164]
37 2 Illustration & Justification For a better illustration and justification of the proposed algorithm, we provide the analysis of Atda for a special case—the sequential list—in the supplementary file A. [sent-87, score-0.127]
38 Figure 2 illustrates solving (3) via Algorithm 1 for a tree with depth 3. [sent-90, score-0.351]
39 Plot (a) shows a target tree T t , and plot (b) denotes the input tree T . [sent-91, score-0.663]
40 Plot (a) shows the target tree T t , and plots (b-e) illustrate Atda. [sent-93, score-0.318]
41 Plot (c) depicts ˜ the resulting trees after removing the maximal root-tree in plot (b), and plot (d) shows the generated maximal root-trees (enclosed by dashed frame) by the algorithm. [sent-97, score-0.652]
42 When treating each generated maximal root-tree as a super-node with the value defined in Definition 3, plot (d) is a super-tree of the input tree T . [sent-98, score-0.588]
43 , the value of the parent node is no less than the values of its child nodes. [sent-101, score-0.261]
44 The edges of plot (f) correspond to the values of the dual variables, from ˜ which we can also obtain the optimal solution x ∈ R15 . [sent-103, score-0.148]
45 ˜ We verify the correctness of Algorithm 1 for the general tree in the following theorem. [sent-105, score-0.321]
46 , k i=1 constitute a disjoint partition of the input tree T . [sent-114, score-0.31]
47 With the sequences {Ti }k and {mi }k , i=1 i=1 we can construct a super-tree of the input tree T as follows: 1) we treat Ti as a super-node with value mi , and 2) we put an edge between Ti and Tj if there is an edge between the nodes of Ti and Tj in the input tree T . [sent-115, score-0.779]
48 With Algorithm 1, we can verify that the resulting super-tree has the property that the value of the parent node is no less than its child nodes. [sent-116, score-0.295]
49 ˜ ˜ Let xl and vl denote a subset of x and v corresponding to the indices appearing in the subtree Tl , respectively. [sent-118, score-0.595]
50 Denote P l = {xl : xl ≥ 0, xi ≥ xj , (vi , vj ) ∈ El }, I1 = {l : ml > 0}, I2 = {l : ml = 0}. [sent-119, score-0.614]
51 Our proof is based on the following inequality: min x∈P 1 x−v 2 2 2 ≥ min l∈I1 xl ∈P l 1 l x − vl 2 2 2 + min l∈I2 xl ∈P l 1 l x − vl 2 2 2, (7) which holds as the left hand side has the additional inequality constraints compared to the right hand side. [sent-120, score-1.005]
52 , xl = arg min ˜ 1 l x − vl 2 2 2 , ∀l ∈ I1 , (8) xl = arg min ˜ 1 l x − vl 2 2 2 , ∀l ∈ I2 , (9) xl ∈P l xl ∈P l 4 1 which, together with the fact 1 x − v 2 ≥ minx∈P 2 x − v 2 , x ∈ P lead to our main 2 2 ˜ 2 ˜ argument. [sent-123, score-1.576]
53 Firstly, ∀l ∈ I1 , we introduce the dual variable yij for the edge (vi , vj ) ∈ El , and yii if vi ∈ Ll , where Ll contains all the leaf nodes of the tree Tl . [sent-125, score-1.225]
54 For all vi ∈ Vl , vi = vrl , we denote its parent node by vji , and for the root vrl , we denote jrl = rl . [sent-127, score-1.044]
55 We let l Ci = {j|vj is a child node of vi in the tree Tl }. [sent-128, score-0.787]
56 l Ri = {j|vj is in the subtree of Tl rooted at vi }. [sent-129, score-0.498]
57 ˜ y j i i = v i − ml + ˜ (17) l j∈Ci According to Algorithm 1, xi = ml > 0, ∀vi ∈ Vl , l ∈ I1 . [sent-131, score-0.143]
58 According to (16) and (17), we have l vj − |Ri |ml , ∀vi ∈ Vl , y ji i = ˜ (18) l j∈Ri l l where |Ri | denotes the number of elements in Ri , the subtree of Tl rooted at vi . [sent-134, score-0.621]
59 From l the nature of the maximal root-tree Tl , l ∈ I1 , we have j∈Rl vj ≥ |Ri |ml . [sent-135, score-0.335]
60 Otherwise, if i l ¯ l j∈Ri vj < |Ri |ml , we can construct from Tl a new root-tree Tl by removing the subtree ¯ of Tl rooted at vi , so that Tl achieves a larger value than Tl . [sent-136, score-0.654]
61 This contradicts with the argument that Tl , l ∈ I1 is the maximal root-tree of the working tree T . [sent-137, score-0.499]
62 Secondly, we prove (9) by verifying the following optimality condition: xl − xl , xl − vl ≥ 0, ∀xl ∈ P l , l ∈ I2 , ˜ ˜ (19) l which is the so-called variational inequality condition for x being the optimal solution to (9). [sent-139, score-1.108]
63 Thus, (19) is equivalent to ˜ xl , vl ≤ 0, ∀xl ∈ P l , l ∈ I2 . [sent-141, score-0.491]
64 (20) For a given xl ∈ P l , if xi = 0, ∀vi ∈ V l , (20) naturally holds. [sent-142, score-0.336]
65 Denote by xl the minimal nonzero element in xl , and Tl1 = (Vl1 , El1 ) a tree constructed by ¯1 removing the nodes corresponding to the indices in the set {i : xl = 0, vi ∈ Vl } from Tl . [sent-144, score-1.603]
66 It follows from Algorithm 1 that i:vi ∈V 1 vi ≤ 0. [sent-146, score-0.333]
67 l Thus, we have x l , v l = xl ¯1 (xi − xl )vi ≤ ¯1 vi + i:vi ∈Vl1 i:vi ∈Vl1 (xi − xl )vi . [sent-147, score-1.224]
68 ¯1 i:vi ∈Vl1 5 ¯r If xl = xl , ∀vi ∈ Vl1 , we arrive at (20). [sent-148, score-0.594]
69 Otherwise, we set r = 2; denote by xl the minimal ¯1 i r−1 l r−1 r r nonzero element in the set {xi − j=1 xj : vi ∈ Vl }, and Tl = (Vl , Elr ) a subtree of ¯ r−1 Tlr−1 by removing those nodes with the indices in the set {i : xl − j=1 xl = 0, vi ∈ Vlr−1 }. [sent-149, score-1.804]
70 ¯j i r−1 r and Tl as well, so that it follows from It is clear that Tl shares the same root as Tl Algorithm 1 that i:vi ∈V r vi ≤ 0. [sent-150, score-0.397]
71 Therefore, we have l r−1 i:vi ∈Vlr−1 r xl )vi = xl ¯j ¯r (xi − j=1 i:vi ∈Vlr Repeating the above process until Vlr i:vi ∈Vlr r xl )vi ≤ ¯j (xi − vi + j=1 xl )vi . [sent-151, score-1.521]
72 For a better understanding of the proof, we make use of the edges of Figure 2 (f) to show the dual variables, where the edge connecting vi and vj corresponds to the dual variable yij , ˜ and the edge starting from the leaf node vi corresponds to the dual variable yii . [sent-153, score-1.525]
73 We note that, for the maximal root-tree with a ˜ positive value, the associated dual variables are unique, but for the maximal root-tree with zero value, the associated dual variables may not be unique. [sent-155, score-0.588]
74 For example, in Figure 2 (f), we set yii = 1 for i = 12, yii = 0 for i = 13, yij = 2 for i = 6, j = 12, and yij = 2 for ˜ ˜ ˜ ˜ i = 6, j = 13. [sent-156, score-0.542]
75 It is easy to check that the dual variables can also be set as follows: yii = 0 ˜ for i = 12, yii = 1 for i = 13, yij = 1 for i = 6, j = 12, and yij = 3 for i = 6, j = 13. [sent-157, score-0.624]
76 ˜ ˜ ˜ 3 Finding the Maximal Root-Tree A key operation of Algorithm 1 is to find the maximal root-tree used in Step 2. [sent-158, score-0.247]
77 A naive approach for finding the maximal root-tree of a tree T is to enumerate all possible roottrees in the root-tree set R(T ), and identify the maximal root-tree via (5). [sent-159, score-0.733]
78 The underlying idea is to make use of the special structure of the maximal root-tree defined in (5) for avoiding the enumeration of all possible root-trees. [sent-164, score-0.212]
79 We begin the discussion with some key properties of the maximal root-tree, and the proof is given in the supplementary file A. [sent-165, score-0.212]
80 For a non-empty tree T = (V, E), denote its maximal root-tree as Tmax = ˜ ˜ ˜ (Vmax , Emax ). [sent-168, score-0.499]
81 , vin , which satisfy: 1) vij ∈ V , 2) vij ∈ V , and 3) the parent node of vij is in ˜ . [sent-173, score-0.591]
82 If n ≥ 1, we denote the subtree of T rooted at vi as T j = (V j , E j ), j = 1, 2, . [sent-174, score-0.498]
83 , n, V j j j j j Tmax = (Vmax , Emax ) as the maximal root-trees of T j , and m = maxj=1,2,. [sent-177, score-0.212]
84 Let T = (V, E) be a non-empty tree, and T1 = (V 1 , E 1 ) and T2 = (V 2 , E 2 ) be two trees that satisfy: 1) they are composed of a subset of the nodes and edges of T , i. [sent-183, score-0.098]
85 , V 1 V 2 = ∅, and E 1 E 2 = ∅; and 3) in the tree T , vi2 , the root node of T2 is a child of vi1 , a leaf node ˜ ˜ ˜ ˜ of T1 . [sent-187, score-0.618]
86 Next, we make use of Lemma 1 to efficiently compute the maximal root-tree, and present the pseudo code for Abuam in Algorithm 2. [sent-189, score-0.236]
87 , vin the n nodes that satisfy: 1) vij ∈ V0 , 2) vij ∈ V , ˜ and 3) the parent node of vij is in V0 , and denote by T j = (V j , E j ), j = 1, 2, . [sent-196, score-0.65]
88 Tmax returned by Algorithm 2 is the maximal root-tree of the input tree T . [sent-202, score-0.522]
89 Lasso makes no use of such ordering, while Wedge incorporates the structure by using an auxiliary ordered variable. [sent-213, score-0.113]
90 For Group Lasso and Max-Heap, we try binary-tree grouping and list-tree grouping, where the associated trees are a full binary tree and a sequential list, respectively. [sent-214, score-0.502]
91 The regression vector is put on the tree so that, the closer the node to the root, the larger the element is placed. [sent-215, score-0.416]
92 In Group Lasso, the nodes appearing in the same subtree form a group. [sent-216, score-0.163]
93 MH-binary performs better than GL-binary, and MH-list performs better than Wedge and GL-list, due to the direct usage of such ordering information. [sent-220, score-0.097]
94 In addition, the list-tree grouping performs better than the binary-tree grouping, as it makes better usage of such ordering information. [sent-221, score-0.151]
95 In plots (a) and (b), we show the average model error x − x∗ 2 over 50 independet runs by different approaches with the full binary-tree ordering and the list-tree ordering. [sent-229, score-0.155]
96 Efficiency of the Proposed Projection We test the efficiency of the proposed Atda approach for solving the Euclidean projection onto the non-negative max-heap, equipped with our proposed Abuam approach for finding the maximal root-trees. [sent-232, score-0.362]
97 In Figure 3 (e) & (f), we report the computational time of Atda over 100 runs when the ordered tree structure is a full binary tree. [sent-239, score-0.459]
98 Empirical results show that: 1) the proposed approach deals with the ordering information better than existing approaches, and 2) the proposed algorithm has an expected linear time complexity for the sequential list, the full binary tree, and the tree of depth 1. [sent-244, score-0.595]
99 We plan to apply the proposed algorithms to real-world applications with an ordered tree structure. [sent-246, score-0.401]
100 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-318, score-0.182]
wordName wordTfidf (topN-words)
[('vi', 0.333), ('tmax', 0.323), ('xl', 0.297), ('tree', 0.287), ('tl', 0.265), ('atda', 0.257), ('maximal', 0.212), ('vl', 0.194), ('yii', 0.166), ('wedge', 0.147), ('emax', 0.134), ('mmax', 0.129), ('ti', 0.127), ('vj', 0.123), ('vij', 0.119), ('abuam', 0.11), ('vlr', 0.11), ('yij', 0.105), ('subtree', 0.104), ('node', 0.103), ('parent', 0.094), ('vmax', 0.088), ('list', 0.088), ('ordered', 0.087), ('lasso', 0.084), ('dual', 0.082), ('merge', 0.073), ('ordering', 0.068), ('sequential', 0.066), ('plot', 0.066), ('depth', 0.064), ('child', 0.064), ('rooted', 0.061), ('projection', 0.059), ('nodes', 0.059), ('ri', 0.057), ('mi', 0.056), ('ectiveness', 0.055), ('vrl', 0.055), ('erent', 0.055), ('euclidean', 0.054), ('grouping', 0.054), ('ml', 0.052), ('xj', 0.051), ('heredity', 0.048), ('arizona', 0.047), ('group', 0.047), ('el', 0.045), ('isotonic', 0.045), ('tj', 0.044), ('tempe', 0.042), ('kkt', 0.042), ('admits', 0.042), ('trees', 0.039), ('xi', 0.039), ('hierarchical', 0.037), ('ll', 0.037), ('anae', 0.037), ('vin', 0.037), ('yjrl', 0.037), ('root', 0.037), ('onto', 0.037), ('operation', 0.035), ('az', 0.035), ('illustration', 0.034), ('rl', 0.034), ('verify', 0.034), ('di', 0.034), ('frame', 0.033), ('removing', 0.033), ('slep', 0.032), ('plots', 0.031), ('le', 0.029), ('binary', 0.029), ('usage', 0.029), ('runs', 0.029), ('presenting', 0.028), ('enclosed', 0.028), ('shares', 0.027), ('full', 0.027), ('proposed', 0.027), ('regression', 0.026), ('incorporates', 0.026), ('structured', 0.025), ('liu', 0.025), ('dashed', 0.024), ('pseudo', 0.024), ('factorial', 0.024), ('mh', 0.024), ('variable', 0.024), ('leaf', 0.024), ('analytical', 0.023), ('ei', 0.023), ('depicted', 0.023), ('gl', 0.023), ('input', 0.023), ('inequality', 0.023), ('edge', 0.022), ('enumerate', 0.022), ('jenatton', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
2 0.17221884 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
3 0.14614394 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg
Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1
4 0.13386175 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
5 0.11891804 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
Author: Iasonas Kokkinos
Abstract: In this work we use Branch-and-Bound (BB) to efficiently detect objects with deformable part models. Instead of evaluating the classifier score exhaustively over image locations and scales, we use BB to focus on promising image locations. The core problem is to compute bounds that accommodate part deformations; for this we adapt the Dual Trees data structure [7] to our problem. We evaluate our approach using Mixture-of-Deformable Part Models [4]. We obtain exactly the same results but are 10-20 times faster on average. We also develop a multiple-object detection variation of the system, where hypotheses for 20 categories are inserted in a common priority queue. For the problem of finding the strongest category in an image this results in a 100-fold speedup.
6 0.11821204 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
7 0.11626693 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
8 0.10407554 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
9 0.1000048 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
10 0.095524788 177 nips-2011-Multi-armed bandits on implicit metric spaces
11 0.090148069 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
12 0.089290231 78 nips-2011-Efficient Methods for Overlapping Group Lasso
13 0.087810077 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
14 0.08412724 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
15 0.078869291 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
16 0.074602485 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
17 0.072792791 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time
18 0.072404958 157 nips-2011-Learning to Search Efficiently in High Dimensions
19 0.069711141 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
20 0.068135008 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
topicId topicWeight
[(0, 0.177), (1, 0.012), (2, -0.095), (3, -0.032), (4, -0.023), (5, -0.066), (6, -0.104), (7, -0.061), (8, 0.003), (9, -0.191), (10, -0.019), (11, 0.014), (12, -0.207), (13, 0.14), (14, -0.007), (15, 0.059), (16, -0.013), (17, -0.024), (18, 0.015), (19, -0.074), (20, 0.079), (21, -0.116), (22, -0.016), (23, -0.021), (24, -0.201), (25, -0.003), (26, 0.019), (27, 0.032), (28, -0.062), (29, -0.148), (30, -0.009), (31, 0.083), (32, -0.045), (33, -0.13), (34, -0.049), (35, 0.082), (36, -0.032), (37, -0.067), (38, 0.04), (39, 0.065), (40, -0.081), (41, -0.084), (42, 0.008), (43, 0.004), (44, 0.054), (45, -0.043), (46, 0.048), (47, -0.068), (48, -0.009), (49, -0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.96976924 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
2 0.76960713 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg
Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1
3 0.64473307 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang
Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics. 1
4 0.64357197 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
Author: Fabio Vitale, Nicolò Cesa-bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, S HAZOO, which is nearly optimal on any weighted tree. Moreover, we show that S HAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that S HAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1
5 0.63914603 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
Author: Rémi Munos
Abstract: We consider a global optimization problem of a deterministic function f in a semimetric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semi-metric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a finite-sample performance bound in terms of a measure of the quantity of near-optimal states. We then define a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimally-fitted.
6 0.61059368 177 nips-2011-Multi-armed bandits on implicit metric spaces
7 0.57777023 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
8 0.52824062 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
9 0.52539951 274 nips-2011-Structure Learning for Optimization
10 0.50768602 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
11 0.50273001 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.46596411 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
13 0.46078879 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression
14 0.44275942 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
15 0.42533106 157 nips-2011-Learning to Search Efficiently in High Dimensions
16 0.38326168 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
17 0.38118869 256 nips-2011-Solving Decision Problems with Limited Information
18 0.37322554 78 nips-2011-Efficient Methods for Overlapping Group Lasso
19 0.3639282 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
20 0.36147538 229 nips-2011-Query-Aware MCMC
topicId topicWeight
[(0, 0.038), (4, 0.054), (20, 0.023), (26, 0.061), (28, 0.258), (31, 0.035), (33, 0.012), (43, 0.061), (45, 0.127), (57, 0.104), (74, 0.05), (83, 0.035), (99, 0.048)]
simIndex simValue paperId paperTitle
1 0.79270375 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
same-paper 2 0.78340429 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
3 0.76355243 95 nips-2011-Fast and Accurate k-means For Large Datasets
Author: Michael Shindler, Alex Wong, Adam W. Meyerson
Abstract: Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute k-means in o( nk) (where n is the number of data points; note that computing the cost, given a solution, takes 8(nk) time). We show that our algorithm compares favorably to existing algorithms - both theoretically and experimentally, thus providing state-of-the-art performance in both theory and practice.
4 0.70154727 156 nips-2011-Learning to Learn with Compound HD Models
Author: Antonio Torralba, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. 1
5 0.65392846 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
Author: Shai Shalev-shwartz, Yonatan Wexler, Amnon Shashua
Abstract: Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only few features, and in particular, the number of used features should increase sublinearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efficiently finds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the benefits of ShareBoost and evaluate its success relatively to other state-of-the-art approaches. 1
6 0.62419146 171 nips-2011-Metric Learning with Multiple Kernels
7 0.59639478 157 nips-2011-Learning to Search Efficiently in High Dimensions
8 0.58878106 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
9 0.5872764 111 nips-2011-Hashing Algorithms for Large-Scale Learning
10 0.58712643 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
11 0.58330369 177 nips-2011-Multi-armed bandits on implicit metric spaces
12 0.58194911 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
13 0.58017939 29 nips-2011-Algorithms and hardness results for parallel large margin learning
14 0.58006287 22 nips-2011-Active Ranking using Pairwise Comparisons
15 0.58004206 265 nips-2011-Sparse recovery by thresholded non-negative least squares
16 0.5770728 150 nips-2011-Learning a Distance Metric from a Network
17 0.57408559 231 nips-2011-Randomized Algorithms for Comparison-based Search
18 0.57377785 274 nips-2011-Structure Learning for Optimization
19 0.57335162 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
20 0.57331061 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness