nips nips2011 nips2011-226 knowledge-graph by maker-knowledge-mining

226 nips-2011-Projection onto A Nonnegative Max-Heap

Source: pdf

Author: Jun Liu, Liang Sun, Jieping Ye

Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classiﬁcation task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to ﬁnd the so-called maximal root-tree of the subtree rooted at each node. A naive approach for ﬁnding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for eﬃciently ﬁnding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the eﬀectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). [sent-7, score-0.79]

2 Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classiﬁcation task only if its parent node is selected. [sent-9, score-0.571]

3 In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to ﬁnd the so-called maximal root-tree of the subtree rooted at each node. [sent-10, score-0.536]

4 A naive approach for ﬁnding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. [sent-11, score-0.234]

5 We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for eﬃciently ﬁnding the maximal roottree. [sent-12, score-0.497]

6 We report simulation results showing the eﬀectiveness of the max-heap for regression with an ordered tree structure. [sent-14, score-0.4]

7 Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. [sent-15, score-0.5]

8 In this paper, we consider an ordered tree structure: a given feature is selected for the given regression/classiﬁcation task only if its parent node is selected. [sent-18, score-0.571]

9 To incorporate such ordered tree structure, we assume that the model parameter x ∈ Rp follows the non-negative max-heap structure1 : P = {x ≥ 0, xi ≥ xj ∀(xi , xj ) ∈ E t }, (1) where T t = (V t , E t ) is a target tree with V t = {x1 , x2 , . [sent-19, score-0.802]

10 The constraint set P implies that if xi is the parent node of a child node xj then the value of xi is no less than the value of xj . [sent-23, score-0.544]

11 In other words, if a parent node xi is 0, then any of its child nodes xj is also 0. [sent-24, score-0.41]

12 Figure 1 illustrates three special tree structures: 1) a full binary tree, 2) a sequential list, and 3) a tree with depth 1. [sent-25, score-0.76]

13 Plots (a), (b), and (c) correspond to a full binary tree, a sequential list, and a tree with depth 1, respectively. [sent-28, score-0.473]

14 The ordered tree structure is a special case of the non-negative garrote discussed in [24] when the hierarchical relationship is depicted by a tree. [sent-32, score-0.434]

15 Therefore, the asymptotic properties established in [24] are applicable to the ordered tree structrue. [sent-33, score-0.374]

16 Several related approaches that can incorporate the ordered tree structure include the Wedge approach [17] and the hierarchical group Lasso [25]. [sent-34, score-0.458]

17 The Wedge approach incorporates such ordering information x2 p 1 i by designing a penalty for the model parameter x as Ω(x|P ) = inf t∈P 2 i=1 ( ti + ti ), with tree being a sequential list. [sent-35, score-0.701]

18 By imposing the mixed ℓ1 -ℓ2 norm on each group formed by the nodes in the subtree of a parent node, the hierarchical group Lasso is able to incorporate such ordering information. [sent-36, score-0.456]

19 The hierarchical group Lasso has been applied for multi-task learning in [11] with a tree structure, and the eﬃcient computation was discussed in [10, 15]. [sent-37, score-0.371]

20 Compared to Wedge and hierarchical group Lasso, the max-heap in (1) incorporates such ordering information in a direct way, and our simulation results show that the max-heap can achieve lower reconstruction error than both approaches. [sent-38, score-0.178]

21 In estimating the model parameter satisfying the ordered tree structure, one needs to solve the following constrained optimization problem: min f (x) (2) x∈P for some convex function f (·). [sent-39, score-0.374]

22 , hyperplane, halfspace, and rectangle), the Euclidean projection admits a simple analytical solution, see [2]. [sent-44, score-0.124]

23 In Section 2, we show that the Euclidean projection admits an analytical solution, and we develop a top-down algorithm where the key operation is to ﬁnd the so-called maximal root-tree of the subtree rooted at each node. [sent-49, score-0.536]

24 In Section 3, we design a bottom-up algorithm with merge for eﬃciently ﬁnding the maximal root-tree by using its properties. [sent-50, score-0.285]

25 With the target tree T t = (V t , E t ), we construct the input tree T = (V, E) with the input vector v, where V = {v1 , v2 , . [sent-53, score-0.62]

26 For a non-empty tree T = (V, E), we deﬁne its root-tree as any non-empty ˜ ˜ ˜ ˜ ˜ ˜ tree T = (V , E) that satisﬁes: 1) V ⊆ V , 2) E ⊆ E, and 3) T shares the same root as T . [sent-61, score-0.638]

27 For a non-empty tree T = (V, E), we deﬁne R(T ) as the root-tree set containing all its root-trees. [sent-63, score-0.287]

28 For a non-empty tree T = (V, E), we deﬁne vi ∈V m(T ) = max vi |V | ,0 , (4) which equals the mean of all the nodes in T if such mean is non-negative, and 0 otherwise. [sent-65, score-1.012]

29 For a non-empty tree T = (V, E), we deﬁne its maximal root-tree as: Mmax (T ) = arg where max ˜ ˜ ˜ ˜ ˜ T =(V ,E):T ∈R(T ),m(T )=mmax (T ) ˜ mmax (T ) = max m(T ) ˜ T ∈R(T ) ˜ |V |, (5) (6) is the maximal value of all the root-trees of the tree T . [sent-67, score-1.127]

30 Note that if two root-trees share the same maximal value, (5) selects the one with the largest tree size. [sent-68, score-0.499]

31 ˜ ˜ ˜ ˜ ˜ When T = (V , E) is a part of a “larger” tree T = (V, E), i. [sent-69, score-0.287]

32 , V ⊆ V and E ⊆ E, we ˜ as a “super-node” of the tree T with value m(T ). [sent-71, score-0.287]

33 For a non-empty tree T = (V, E), we deﬁne its super-tree as S = (VS , ES ), which satisﬁes: 1) each node in VS = {T1 , T2 , . [sent-73, score-0.39]

34 , Tn } is a non-empty tree with Ti = (Vi , Ei ), n 2) Vi ⊆ V and Ei ⊆ E, 3) Vi Vj = ∅, i = j and V = i=1 Vi , and 4) (Ti , Tj ) ∈ ES if and only if there exists a node in Tj whose parent node is in Ti . [sent-76, score-0.587]

35 The key idea of the proposed algorithm is that, in the i-th call, we ﬁnd Ti = Mmax (T ), the maximal root-tree of T , set x corresponding to the nodes of Ti to mi = mmax (T ) = m(Ti ), remove Ti from the tree T , ˜ and apply Atda to the resulting trees one by one recursively. [sent-79, score-0.809]

36 , ri apply Atda(T 6: else 7: Set xj = mi , ∀vj ∈ Vi ˜ 8: end if 2. [sent-86, score-0.164]

37 2 Illustration & Justiﬁcation For a better illustration and justiﬁcation of the proposed algorithm, we provide the analysis of Atda for a special case—the sequential list—in the supplementary ﬁle A. [sent-87, score-0.127]

38 Figure 2 illustrates solving (3) via Algorithm 1 for a tree with depth 3. [sent-90, score-0.351]

39 Plot (a) shows a target tree T t , and plot (b) denotes the input tree T . [sent-91, score-0.663]

40 Plot (a) shows the target tree T t , and plots (b-e) illustrate Atda. [sent-93, score-0.318]

41 Plot (c) depicts ˜ the resulting trees after removing the maximal root-tree in plot (b), and plot (d) shows the generated maximal root-trees (enclosed by dashed frame) by the algorithm. [sent-97, score-0.652]

42 When treating each generated maximal root-tree as a super-node with the value deﬁned in Deﬁnition 3, plot (d) is a super-tree of the input tree T . [sent-98, score-0.588]

43 , the value of the parent node is no less than the values of its child nodes. [sent-101, score-0.261]

44 The edges of plot (f) correspond to the values of the dual variables, from ˜ which we can also obtain the optimal solution x ∈ R15 . [sent-103, score-0.148]

45 ˜ We verify the correctness of Algorithm 1 for the general tree in the following theorem. [sent-105, score-0.321]

46 , k i=1 constitute a disjoint partition of the input tree T . [sent-114, score-0.31]

47 With the sequences {Ti }k and {mi }k , i=1 i=1 we can construct a super-tree of the input tree T as follows: 1) we treat Ti as a super-node with value mi , and 2) we put an edge between Ti and Tj if there is an edge between the nodes of Ti and Tj in the input tree T . [sent-115, score-0.779]

48 With Algorithm 1, we can verify that the resulting super-tree has the property that the value of the parent node is no less than its child nodes. [sent-116, score-0.295]

49 ˜ ˜ Let xl and vl denote a subset of x and v corresponding to the indices appearing in the subtree Tl , respectively. [sent-118, score-0.595]

50 Denote P l = {xl : xl ≥ 0, xi ≥ xj , (vi , vj ) ∈ El }, I1 = {l : ml > 0}, I2 = {l : ml = 0}. [sent-119, score-0.614]

51 Our proof is based on the following inequality: min x∈P 1 x−v 2 2 2 ≥ min l∈I1 xl ∈P l 1 l x − vl 2 2 2 + min l∈I2 xl ∈P l 1 l x − vl 2 2 2, (7) which holds as the left hand side has the additional inequality constraints compared to the right hand side. [sent-120, score-1.005]

52 , xl = arg min ˜ 1 l x − vl 2 2 2 , ∀l ∈ I1 , (8) xl = arg min ˜ 1 l x − vl 2 2 2 , ∀l ∈ I2 , (9) xl ∈P l xl ∈P l 4 1 which, together with the fact 1 x − v 2 ≥ minx∈P 2 x − v 2 , x ∈ P lead to our main 2 2 ˜ 2 ˜ argument. [sent-123, score-1.576]

53 Firstly, ∀l ∈ I1 , we introduce the dual variable yij for the edge (vi , vj ) ∈ El , and yii if vi ∈ Ll , where Ll contains all the leaf nodes of the tree Tl . [sent-125, score-1.225]

54 For all vi ∈ Vl , vi = vrl , we denote its parent node by vji , and for the root vrl , we denote jrl = rl . [sent-127, score-1.044]

55 We let l Ci = {j|vj is a child node of vi in the tree Tl }. [sent-128, score-0.787]

56 l Ri = {j|vj is in the subtree of Tl rooted at vi }. [sent-129, score-0.498]

57 ˜ y j i i = v i − ml + ˜ (17) l j∈Ci According to Algorithm 1, xi = ml > 0, ∀vi ∈ Vl , l ∈ I1 . [sent-131, score-0.143]

58 According to (16) and (17), we have l vj − |Ri |ml , ∀vi ∈ Vl , y ji i = ˜ (18) l j∈Ri l l where |Ri | denotes the number of elements in Ri , the subtree of Tl rooted at vi . [sent-134, score-0.621]

59 From l the nature of the maximal root-tree Tl , l ∈ I1 , we have j∈Rl vj ≥ |Ri |ml . [sent-135, score-0.335]

60 Otherwise, if i l ¯ l j∈Ri vj < |Ri |ml , we can construct from Tl a new root-tree Tl by removing the subtree ¯ of Tl rooted at vi , so that Tl achieves a larger value than Tl . [sent-136, score-0.654]

61 This contradicts with the argument that Tl , l ∈ I1 is the maximal root-tree of the working tree T . [sent-137, score-0.499]

62 Secondly, we prove (9) by verifying the following optimality condition: xl − xl , xl − vl ≥ 0, ∀xl ∈ P l , l ∈ I2 , ˜ ˜ (19) l which is the so-called variational inequality condition for x being the optimal solution to (9). [sent-139, score-1.108]

63 Thus, (19) is equivalent to ˜ xl , vl ≤ 0, ∀xl ∈ P l , l ∈ I2 . [sent-141, score-0.491]

64 (20) For a given xl ∈ P l , if xi = 0, ∀vi ∈ V l , (20) naturally holds. [sent-142, score-0.336]

65 Denote by xl the minimal nonzero element in xl , and Tl1 = (Vl1 , El1 ) a tree constructed by ¯1 removing the nodes corresponding to the indices in the set {i : xl = 0, vi ∈ Vl } from Tl . [sent-144, score-1.603]

66 It follows from Algorithm 1 that i:vi ∈V 1 vi ≤ 0. [sent-146, score-0.333]

67 l Thus, we have x l , v l = xl ¯1 (xi − xl )vi ≤ ¯1 vi + i:vi ∈Vl1 i:vi ∈Vl1 (xi − xl )vi . [sent-147, score-1.224]

68 ¯1 i:vi ∈Vl1 5 ¯r If xl = xl , ∀vi ∈ Vl1 , we arrive at (20). [sent-148, score-0.594]

69 Otherwise, we set r = 2; denote by xl the minimal ¯1 i r−1 l r−1 r r nonzero element in the set {xi − j=1 xj : vi ∈ Vl }, and Tl = (Vl , Elr ) a subtree of ¯ r−1 Tlr−1 by removing those nodes with the indices in the set {i : xl − j=1 xl = 0, vi ∈ Vlr−1 }. [sent-149, score-1.804]

70 ¯j i r−1 r and Tl as well, so that it follows from It is clear that Tl shares the same root as Tl Algorithm 1 that i:vi ∈V r vi ≤ 0. [sent-150, score-0.397]

71 Therefore, we have l r−1 i:vi ∈Vlr−1 r xl )vi = xl ¯j ¯r (xi − j=1 i:vi ∈Vlr Repeating the above process until Vlr i:vi ∈Vlr r xl )vi ≤ ¯j (xi − vi + j=1 xl )vi . [sent-151, score-1.521]

72 For a better understanding of the proof, we make use of the edges of Figure 2 (f) to show the dual variables, where the edge connecting vi and vj corresponds to the dual variable yij , ˜ and the edge starting from the leaf node vi corresponds to the dual variable yii . [sent-153, score-1.525]

73 We note that, for the maximal root-tree with a ˜ positive value, the associated dual variables are unique, but for the maximal root-tree with zero value, the associated dual variables may not be unique. [sent-155, score-0.588]

74 For example, in Figure 2 (f), we set yii = 1 for i = 12, yii = 0 for i = 13, yij = 2 for i = 6, j = 12, and yij = 2 for ˜ ˜ ˜ ˜ i = 6, j = 13. [sent-156, score-0.542]

75 It is easy to check that the dual variables can also be set as follows: yii = 0 ˜ for i = 12, yii = 1 for i = 13, yij = 1 for i = 6, j = 12, and yij = 3 for i = 6, j = 13. [sent-157, score-0.624]

76 ˜ ˜ ˜ 3 Finding the Maximal Root-Tree A key operation of Algorithm 1 is to ﬁnd the maximal root-tree used in Step 2. [sent-158, score-0.247]

77 A naive approach for ﬁnding the maximal root-tree of a tree T is to enumerate all possible roottrees in the root-tree set R(T ), and identify the maximal root-tree via (5). [sent-159, score-0.733]

78 The underlying idea is to make use of the special structure of the maximal root-tree deﬁned in (5) for avoiding the enumeration of all possible root-trees. [sent-164, score-0.212]

79 We begin the discussion with some key properties of the maximal root-tree, and the proof is given in the supplementary ﬁle A. [sent-165, score-0.212]

80 For a non-empty tree T = (V, E), denote its maximal root-tree as Tmax = ˜ ˜ ˜ (Vmax , Emax ). [sent-168, score-0.499]

81 , vin , which satisfy: 1) vij ∈ V , 2) vij ∈ V , and 3) the parent node of vij is in ˜ . [sent-173, score-0.591]

82 If n ≥ 1, we denote the subtree of T rooted at vi as T j = (V j , E j ), j = 1, 2, . [sent-174, score-0.498]

83 , n, V j j j j j Tmax = (Vmax , Emax ) as the maximal root-trees of T j , and m = maxj=1,2,. [sent-177, score-0.212]

84 Let T = (V, E) be a non-empty tree, and T1 = (V 1 , E 1 ) and T2 = (V 2 , E 2 ) be two trees that satisfy: 1) they are composed of a subset of the nodes and edges of T , i. [sent-183, score-0.098]

85 , V 1 V 2 = ∅, and E 1 E 2 = ∅; and 3) in the tree T , vi2 , the root node of T2 is a child of vi1 , a leaf node ˜ ˜ ˜ ˜ of T1 . [sent-187, score-0.618]

86 Next, we make use of Lemma 1 to eﬃciently compute the maximal root-tree, and present the pseudo code for Abuam in Algorithm 2. [sent-189, score-0.236]

87 , vin the n nodes that satisfy: 1) vij ∈ V0 , 2) vij ∈ V , ˜ and 3) the parent node of vij is in V0 , and denote by T j = (V j , E j ), j = 1, 2, . [sent-196, score-0.65]

88 Tmax returned by Algorithm 2 is the maximal root-tree of the input tree T . [sent-202, score-0.522]

89 Lasso makes no use of such ordering, while Wedge incorporates the structure by using an auxiliary ordered variable. [sent-213, score-0.113]

90 For Group Lasso and Max-Heap, we try binary-tree grouping and list-tree grouping, where the associated trees are a full binary tree and a sequential list, respectively. [sent-214, score-0.502]

91 The regression vector is put on the tree so that, the closer the node to the root, the larger the element is placed. [sent-215, score-0.416]

92 In Group Lasso, the nodes appearing in the same subtree form a group. [sent-216, score-0.163]

93 MH-binary performs better than GL-binary, and MH-list performs better than Wedge and GL-list, due to the direct usage of such ordering information. [sent-220, score-0.097]

94 In addition, the list-tree grouping performs better than the binary-tree grouping, as it makes better usage of such ordering information. [sent-221, score-0.151]

95 In plots (a) and (b), we show the average model error x − x∗ 2 over 50 independet runs by diﬀerent approaches with the full binary-tree ordering and the list-tree ordering. [sent-229, score-0.155]

96 Eﬃciency of the Proposed Projection We test the eﬃciency of the proposed Atda approach for solving the Euclidean projection onto the non-negative max-heap, equipped with our proposed Abuam approach for ﬁnding the maximal root-trees. [sent-232, score-0.362]

97 In Figure 3 (e) & (f), we report the computational time of Atda over 100 runs when the ordered tree structure is a full binary tree. [sent-239, score-0.459]

98 Empirical results show that: 1) the proposed approach deals with the ordering information better than existing approaches, and 2) the proposed algorithm has an expected linear time complexity for the sequential list, the full binary tree, and the tree of depth 1. [sent-244, score-0.595]

99 We plan to apply the proposed algorithms to real-world applications with an ordered tree structure. [sent-246, score-0.401]

100 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-318, score-0.182]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vi', 0.333), ('tmax', 0.323), ('xl', 0.297), ('tree', 0.287), ('tl', 0.265), ('atda', 0.257), ('maximal', 0.212), ('vl', 0.194), ('yii', 0.166), ('wedge', 0.147), ('emax', 0.134), ('mmax', 0.129), ('ti', 0.127), ('vj', 0.123), ('vij', 0.119), ('abuam', 0.11), ('vlr', 0.11), ('yij', 0.105), ('subtree', 0.104), ('node', 0.103), ('parent', 0.094), ('vmax', 0.088), ('list', 0.088), ('ordered', 0.087), ('lasso', 0.084), ('dual', 0.082), ('merge', 0.073), ('ordering', 0.068), ('sequential', 0.066), ('plot', 0.066), ('depth', 0.064), ('child', 0.064), ('rooted', 0.061), ('projection', 0.059), ('nodes', 0.059), ('ri', 0.057), ('mi', 0.056), ('ectiveness', 0.055), ('vrl', 0.055), ('erent', 0.055), ('euclidean', 0.054), ('grouping', 0.054), ('ml', 0.052), ('xj', 0.051), ('heredity', 0.048), ('arizona', 0.047), ('group', 0.047), ('el', 0.045), ('isotonic', 0.045), ('tj', 0.044), ('tempe', 0.042), ('kkt', 0.042), ('admits', 0.042), ('trees', 0.039), ('xi', 0.039), ('hierarchical', 0.037), ('ll', 0.037), ('anae', 0.037), ('vin', 0.037), ('yjrl', 0.037), ('root', 0.037), ('onto', 0.037), ('operation', 0.035), ('az', 0.035), ('illustration', 0.034), ('rl', 0.034), ('verify', 0.034), ('di', 0.034), ('frame', 0.033), ('removing', 0.033), ('slep', 0.032), ('plots', 0.031), ('le', 0.029), ('binary', 0.029), ('usage', 0.029), ('runs', 0.029), ('presenting', 0.028), ('enclosed', 0.028), ('shares', 0.027), ('full', 0.027), ('proposed', 0.027), ('regression', 0.026), ('incorporates', 0.026), ('structured', 0.025), ('liu', 0.025), ('dashed', 0.024), ('pseudo', 0.024), ('factorial', 0.024), ('mh', 0.024), ('variable', 0.024), ('leaf', 0.024), ('analytical', 0.023), ('ei', 0.023), ('depicted', 0.023), ('gl', 0.023), ('input', 0.023), ('inequality', 0.023), ('edge', 0.022), ('enumerate', 0.022), ('jenatton', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 226 nips-2011-Projection onto A Nonnegative Max-Heap

Author: Jun Liu, Liang Sun, Jieping Ye

2 0.17221884 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li

Abstract: We present a novel approach to efﬁciently learn a label tree for large scale classiﬁcation with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classiﬁers for each node in the tree. This approach also allows ﬁne grained control over the efﬁciency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classiﬁcation with 10184 classes and 9 million images. We demonstrate signiﬁcant improvements in test accuracy and efﬁciency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1

3 0.14614394 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg

Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the ﬁrst algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the ﬁrst rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1

4 0.13386175 229 nips-2011-Query-Aware MCMC

Author: Michael L. Wick, Andrew McCallum

Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, speciﬁed by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-speciﬁc approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efﬁciency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1

5 0.11891804 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

Author: Iasonas Kokkinos

Abstract: In this work we use Branch-and-Bound (BB) to efﬁciently detect objects with deformable part models. Instead of evaluating the classiﬁer score exhaustively over image locations and scales, we use BB to focus on promising image locations. The core problem is to compute bounds that accommodate part deformations; for this we adapt the Dual Trees data structure [7] to our problem. We evaluate our approach using Mixture-of-Deformable Part Models [4]. We obtain exactly the same results but are 10-20 times faster on average. We also develop a multiple-object detection variation of the system, where hypotheses for 20 categories are inserted in a common priority queue. For the problem of ﬁnding the strongest category in an image this results in a 100-fold speedup.

6 0.11821204 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

7 0.11626693 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

8 0.10407554 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

9 0.1000048 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization

10 0.095524788 177 nips-2011-Multi-armed bandits on implicit metric spaces

11 0.090148069 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

12 0.089290231 78 nips-2011-Efficient Methods for Overlapping Group Lasso

13 0.087810077 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

14 0.08412724 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

15 0.078869291 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

16 0.074602485 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

17 0.072792791 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time

18 0.072404958 157 nips-2011-Learning to Search Efficiently in High Dimensions

19 0.069711141 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

20 0.068135008 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.012), (2, -0.095), (3, -0.032), (4, -0.023), (5, -0.066), (6, -0.104), (7, -0.061), (8, 0.003), (9, -0.191), (10, -0.019), (11, 0.014), (12, -0.207), (13, 0.14), (14, -0.007), (15, 0.059), (16, -0.013), (17, -0.024), (18, 0.015), (19, -0.074), (20, 0.079), (21, -0.116), (22, -0.016), (23, -0.021), (24, -0.201), (25, -0.003), (26, 0.019), (27, 0.032), (28, -0.062), (29, -0.148), (30, -0.009), (31, 0.083), (32, -0.045), (33, -0.13), (34, -0.049), (35, 0.082), (36, -0.032), (37, -0.067), (38, 0.04), (39, 0.065), (40, -0.081), (41, -0.084), (42, 0.008), (43, 0.004), (44, 0.054), (45, -0.043), (46, 0.048), (47, -0.068), (48, -0.009), (49, -0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96976924 226 nips-2011-Projection onto A Nonnegative Max-Heap

Author: Jun Liu, Liang Sun, Jieping Ye

2 0.76960713 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg

3 0.64473307 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang

Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efﬁcient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our ﬁnite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics. 1

4 0.64357197 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

Author: Fabio Vitale, Nicolò Cesa-bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsiﬁcation via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We ﬁll this hole and introduce an efﬁcient node predictor, S HAZOO, which is nearly optimal on any weighted tree. Moreover, we show that S HAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets conﬁrm that S HAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1

5 0.63914603 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

Author: Rémi Munos

Abstract: We consider a global optimization problem of a deterministic function f in a semimetric space, given a ﬁnite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semi-metric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A ﬁrst contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a ﬁnite-sample performance bound in terms of a measure of the quantity of near-optimal states. We then deﬁne a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimally-ﬁtted.

6 0.61059368 177 nips-2011-Multi-armed bandits on implicit metric spaces

7 0.57777023 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

8 0.52824062 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

9 0.52539951 274 nips-2011-Structure Learning for Optimization

10 0.50768602 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

11 0.50273001 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

12 0.46596411 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

13 0.46078879 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

14 0.44275942 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

15 0.42533106 157 nips-2011-Learning to Search Efficiently in High Dimensions

16 0.38326168 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

17 0.38118869 256 nips-2011-Solving Decision Problems with Limited Information

18 0.37322554 78 nips-2011-Efficient Methods for Overlapping Group Lasso

19 0.3639282 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

20 0.36147538 229 nips-2011-Query-Aware MCMC

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.038), (4, 0.054), (20, 0.023), (26, 0.061), (28, 0.258), (31, 0.035), (33, 0.012), (43, 0.061), (45, 0.127), (57, 0.104), (74, 0.05), (83, 0.035), (99, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79270375 110 nips-2011-Group Anomaly Detection using Flexible Genre Models

Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider

Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1

same-paper 2 0.78340429 226 nips-2011-Projection onto A Nonnegative Max-Heap

Author: Jun Liu, Liang Sun, Jieping Ye

3 0.76355243 95 nips-2011-Fast and Accurate k-means For Large Datasets

Author: Michael Shindler, Alex Wong, Adam W. Meyerson

Abstract: Clustering is a popular problem with many applications. We consider the k-means problem in the situation where the data is too large to be stored in main memory and must be accessed sequentially, such as from a disk, and where we must use as little memory as possible. Our algorithm is based on recent theoretical results, with significant improvements to make it practical. Our approach greatly simplifies a recently developed algorithm, both in design and in analysis, and eliminates large constant factors in the approximation guarantee, the memory requirements, and the running time. We then incorporate approximate nearest neighbor search to compute k-means in o( nk) (where n is the number of data points; note that computing the cost, given a solution, takes 8(nk) time). We show that our algorithm compares favorably to existing algorithms - both theoretically and experimentally, thus providing state-of-the-art performance in both theory and practice.

4 0.70154727 156 nips-2011-Learning to Learn with Compound HD Models

Author: Antonio Torralba, Joshua B. Tenenbaum, Ruslan Salakhutdinov

Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Speciﬁcally we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efﬁcient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. 1

5 0.65392846 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing

Author: Shai Shalev-shwartz, Yonatan Wexler, Amnon Shashua

Abstract: Multiclass prediction is the problem of classifying an object into a relevant target class. We consider the problem of learning a multiclass predictor that uses only few features, and in particular, the number of used features should increase sublinearly with the number of possible classes. This implies that features should be shared by several classes. We describe and analyze the ShareBoost algorithm for learning a multiclass predictor that uses few shared features. We prove that ShareBoost efﬁciently ﬁnds a predictor that uses few shared features (if such a predictor exists) and that it has a small generalization error. We also describe how to use ShareBoost for learning a non-linear predictor that has a fast evaluation time. In a series of experiments with natural data sets we demonstrate the beneﬁts of ShareBoost and evaluate its success relatively to other state-of-the-art approaches. 1

6 0.62419146 171 nips-2011-Metric Learning with Multiple Kernels

7 0.59639478 157 nips-2011-Learning to Search Efficiently in High Dimensions

8 0.58878106 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

9 0.5872764 111 nips-2011-Hashing Algorithms for Large-Scale Learning

10 0.58712643 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

11 0.58330369 177 nips-2011-Multi-armed bandits on implicit metric spaces

12 0.58194911 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

13 0.58017939 29 nips-2011-Algorithms and hardness results for parallel large margin learning

14 0.58006287 22 nips-2011-Active Ranking using Pairwise Comparisons

15 0.58004206 265 nips-2011-Sparse recovery by thresholded non-negative least squares

16 0.5770728 150 nips-2011-Learning a Distance Metric from a Network

17 0.57408559 231 nips-2011-Randomized Algorithms for Comparison-based Search

18 0.57377785 274 nips-2011-Structure Learning for Optimization

19 0.57335162 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

20 0.57331061 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness