nips nips2010 nips2010-88 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. [sent-6, score-0.594]
2 Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. [sent-7, score-0.471]
3 First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. [sent-9, score-0.73]
4 Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. [sent-10, score-0.371]
5 In each case, we present an exact formula for the objective function involving Shannon or R´ nyi entropy, and e develop a greedy algorithm for minimizing it. [sent-11, score-0.253]
6 An unknown object θ is generated from this set Θ with a certain prior probability distribution Π = (π1 , · · · , πM ). [sent-14, score-0.319]
7 That is, πi = Pr(θ = θi ), and the goal is to uniquely identify this unknown object through as few queries from Q as possible, where a query q ∈ Q returns a value 1 if θ ∈ q, and 0 otherwise. [sent-16, score-0.808]
8 For example, in active learning, the objects are classifiers and the queries are the labels for fixed test points. [sent-17, score-0.446]
9 In active diagnosis, objects may correspond to faults, and queries to alarms. [sent-18, score-0.426]
10 We will refer to this problem as object identification. [sent-20, score-0.238]
11 The goal in object identification is to construct an optimal binary decision tree, where each internal node in the tree is associated with a query from Q, and each leaf node corresponds to an object from Θ. [sent-22, score-1.443]
12 Optimality is often with respect to the expected depth of the leaf node corresponding to the unknown object θ. [sent-23, score-0.627]
13 Hence, various greedy algorithms [5, 14] have been proposed to obtain a suboptimal binary decision tree. [sent-25, score-0.244]
14 This is the greedy algorithm which selects a query that most evenly divides the probability mass of the remaining objects [1, 2, 5, 15]. [sent-27, score-0.471]
15 However, in applications such as disease diagnosis, where Θ is a collection of possible diseases, it may only be necessary to identify the intervention or response to an object, rather than the object itself. [sent-29, score-0.347]
16 In these problems, the object set Θ is partitioned into groups and it is only necessary to identify the group to which the unknown object belongs. [sent-30, score-0.951]
17 To address this problem, we first present a new interpretation of GBS from a coding-theoretic perspective by viewing the problem of object identification as constrained source coding. [sent-32, score-0.238]
18 Specifically, we present an exact formula for the expected number of queries required to identify an unknown object in terms of Shannon entropy of the prior distribution Π, and show that GBS is a top-down algorithm that greedily minimizes this cost function. [sent-33, score-0.984]
19 We also extend the coding-theoretic framework to the problem of object (or group) identification where the cost of identifying an object grows exponentially in the number of queries. [sent-35, score-0.657]
20 That is, the cost of identifying an object using d queries is λ^d for some fixed λ > 1. [sent-37, score-0.583]
21 However, to the best of our knowledge, there does not exist an algorithm that constructs a good suboptimal decision tree for the problem of object/group identification with exponential costs. [sent-40, score-0.296]
22 Once again, we show below that GBS is not necessarily efficient for minimizing the exponential cost function, and propose an improved greedy algorithm that generalizes GBS. [sent-41, score-0.263]
23 1 Notation We denote an object identification problem by a pair (B, Π) where B is a known M × N binary matrix with bij equal to 1 if θi ∈ qj , and 0 otherwise. [sent-43, score-0.292]
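As a rough illustration of this representation (not code from the paper; all names here are our own), the following sketch encodes a small object identification problem as a binary matrix B and a prior pi; the later sketches below reuse these names.

```python
import numpy as np

# Toy object-identification problem (B, Pi): M = 4 objects, N = 3 queries.
# B[i, j] = 1 means object theta_i belongs to query q_j (the answer to q_j is "yes").
B = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 0]])
pi = np.array([0.4, 0.3, 0.2, 0.1])   # prior Pi over the M objects (sums to 1)
groups = np.array([0, 0, 1, 1])       # group labels y_i, used in the group setting below
active = np.arange(B.shape[0])        # indices of objects reaching the current node (the root)
```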
24 A decision tree T constructed on (B, Π) has a query from the set Q at each of its internal nodes, with the leaf nodes terminating in the objects from Θ. [sent-44, score-0.799]
25 For a decision tree with L leaves, the leaf nodes are indexed by the set L = {1, · · · , L} and the internal nodes are indexed by the set I = {L + 1, · · · , 2L − 1}. [sent-45, score-0.517]
26 At any node ‘a’, let Qa ⊆ Q denote the set of queries that have been performed along the path from the root node up to that node. [sent-46, score-0.592]
27 An object θi reaches node ‘a’ if it agrees with the true θ on all queries in Qa . [sent-47, score-0.632]
28 That is, the binary values in B for the rows corresponding to θi and θ are the same over the columns corresponding to queries in Qa . [sent-49, score-0.286]
29 At any internal node a ∈ I, let l(a), r(a) denote the “left” and “right” child nodes, and let Θa ⊆ Θ denote the set of objects that reach node ‘a’. [sent-50, score-0.678]
30 Thus, the sets Θl(a) ⊆ Θa , Θr(a) ⊆ Θa correspond to the objects in Θa that respond 0 and 1 to the query at node ‘a’, respectively. [sent-51, score-0.504]
31 We denote by πΘa := Σ_{i: θi ∈ Θa} πi the probability mass of the objects reaching node ‘a’ in the tree. [sent-52, score-0.343]
32 2 GBS Greedily Minimizes the Expected Number of Queries. We begin by noting that object identification reduces to the standard source coding problem [19] in the special case when Q is complete, meaning, for any S ⊆ Θ there exists a query q ∈ Q such that either q = S or Θ \ q = S. [sent-54, score-0.505]
33 Here, the problem of constructing an optimal binary decision tree is equivalent to constructing optimal variable length binary prefix codes, for which there exists an efficient optimal algorithm known as the Huffman algorithm [20]. [sent-55, score-0.355]
34 The optimal cost (i.e., the expected depth of any binary decision tree) is bounded below by the Shannon entropy of the prior distribution Π [19]. [sent-58, score-0.351]
35 For the problem of object identification, where Q is not complete, the entropy lower bound is still valid, but Huffman coding cannot be implemented. [sent-59, score-0.385]
36 We now show that GBS is actually greedily minimizing the expected number of queries required to identify an object. [sent-61, score-0.48]
37 First, we define a parameter called the reduction factor on the binary matrix/tree combination that provides a useful quantification on the expected number of queries required to identify an object. [sent-62, score-0.51]
38 The reduction factor at any internal node ‘a’ in the tree is defined by ρa = max{πΘl(a) , πΘr(a) }/πΘa . [sent-65, score-0.535]
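A direct rendering of this definition (a sketch reusing the hypothetical B, pi and active from the setup above) might be:

```python
import numpy as np

def reduction_factor(B, pi, active, q):
    """rho_a = max(pi_{Theta_l(a)}, pi_{Theta_r(a)}) / pi_{Theta_a} for query q at node 'a'."""
    mass_a = pi[active].sum()                        # pi_{Theta_a}
    mass_r = pi[active][B[active, q] == 1].sum()     # mass of the objects answering 1
    return max(mass_r, mass_a - mass_r) / mass_a     # always lies in [0.5, 1]
```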
39 Given an object identification problem (B, Π), let T (B, Π) denote the set of decision trees that can uniquely identify all the objects in the set Θ. [sent-68, score-0.589]
40 For any decision tree T ∈ T (B, Π), let {ρa }a∈I denote the set of reduction factors and let di denote the number of queries required to identify object θi in the tree. [sent-70, score-0.872]
41 Then the expected number of queries required to identify an unknown object using a tree (or, the expected depth of a tree) is L1 (Π) = Σ_i πi di . [sent-71, score-0.972]
42 For any T ∈ T (B, Π), the expected number of queries required to identify an unknown object is given by L1 (Π) = H(Π) + Σ_{a∈I} πΘa [1 − H(ρa )]. [sent-75, score-0.685]
43 It also follows from the above result that a tree attains the entropy bound iff the reduction factors are equal to 0.5. [sent-80, score-0.286]
44 Instead, we may take a top-down approach and minimize the objective function by minimizing the term Ca := πΘa [1 − H(ρa )] at each internal node, starting from the root node. [sent-85, score-0.262]
45 Note that the only term that depends on the query chosen at node ‘a’ in this cost function is ρa . [sent-86, score-0.388]
46 This amounts to maximizing H(ρa ) (i.e., choosing a split as balanced as possible) at each internal node a ∈ I. [sent-89, score-0.326]
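The per-node greedy step can then be sketched as follows (hypothetical helper names, not from the paper); minimizing Ca = πΘa [1 − H(ρa)] over queries is the same as picking the query whose split of the remaining probability mass is most balanced.

```python
import numpy as np

def binary_entropy(rho):
    """Shannon entropy of a Bernoulli(rho) split; H(0.5) = 1 is the best case."""
    if rho <= 0.0 or rho >= 1.0:
        return 0.0
    return -(rho * np.log2(rho) + (1.0 - rho) * np.log2(1.0 - rho))

def gbs_select(B, pi, active):
    """Pick the query minimizing C_a = pi_Theta_a * (1 - H(rho_a)), i.e. the most
    balanced split of the probability mass of the objects still active at this node."""
    mass_a = pi[active].sum()
    costs = []
    for q in range(B.shape[1]):
        mass_r = pi[active][B[active, q] == 1].sum()
        rho = max(mass_r, mass_a - mass_r) / mass_a
        costs.append(mass_a * (1.0 - binary_entropy(rho)))
    return int(np.argmin(costs))
```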
47 In the next section, we show how this framework can be extended to derive greedy algorithms for the problems of group identification and object identification with exponential costs. [sent-91, score-0.636]
48 3 Extensions of GBS. 3.1 Group Identification. In group identification, the goal is not to determine the unknown object θ ∈ Θ, but rather the group to which it belongs, in as few queries as possible. [sent-93, score-0.987]
49 Here, in addition to B and Π, the group labels for the objects are also provided, where the groups are assumed to be disjoint. [sent-94, score-0.457]
50 We denote a group identification problem by (B, Π, y), where y = (y1 , · · · , yM ) denotes the group labels of the objects, yi ∈ {1, · · · , K}. [sent-95, score-0.478]
51 It is important to note here that the group identification problem cannot be simply reduced to an object identification problem with groups {Θ1 , · · · , ΘK } as “meta objects,” since the objects within a group need not respond the same to each query. [sent-97, score-0.936]
52 For instance, consider the toy example shown in Figure 1 where the objects θ1 , θ2 and θ3 belonging to group 1 cannot be collapsed into a single meta object as these objects respond differently to queries q1 and q3 . [sent-98, score-1.07]
53 In this context, we also note that GBS can fail to produce a good solution for a group identification problem as it does not take the group labels into consideration while choosing queries. [sent-99, score-0.478]
54 Once again, consider the toy example shown in Figure 1 where query q2 is sufficient to identify the group of an unknown object, whereas GBS requires 2 queries to identify the group when the unknown object is either θ2 or θ4 . [sent-100, score-1.416]
55 [23] simultaneously studied the problem of group identification in the context of object identification with persistent noise. [sent-104, score-0.467]
56 [Figure 1: Toy Example. Figure 2: Decision tree constructed using GBS.] Note that when constructing a tree for group identification, a greedy, top-down algorithm terminates splitting when all the objects at a node belong to the same group. [sent-113, score-0.88]
57 Hence, a tree constructed in this fashion can have multiple objects ending in the same leaf node and multiple leaves ending in the same group. [sent-114, score-0.595]
58 For a tree with L leaves, we denote by Lk ⊂ L = {1, · · · , L} the set of leaves that terminate in group k. [sent-115, score-0.402]
59 Similar to Θ^k ⊆ Θ, we denote by Θa^k ⊆ Θa the set of objects belonging to group k that reach node ‘a’ in a tree. [sent-116, score-0.533]
60 Also, in addition to the reduction factor defined in Section 2, we define a new parameter called the group reduction factor for each group k ∈ {1, · · · , K} at each internal node. [sent-117, score-0.758]
61 Let T be a decision tree constructed on a group identification problem (B, Π, y). [sent-119, score-0.452]
62 The group reduction factor for any group k at an internal node ‘a’ is defined by ρa^k = max{πΘl(a)^k , πΘr(a)^k } / πΘa^k . [sent-120, score-0.852]
63 Given (B, Π, y), let T (B, Π, y) denote the set of decision trees that can uniquely identify the groups of all objects in the set Θ. [sent-121, score-0.417]
64 For any decision tree T ∈ T (B, Π, y), let dj denote the depth of leaf node j ∈ L. [sent-122, score-0.556]
65 Let random variable X denote the number of queries required to identify the group of an unknown object θ. [sent-123, score-0.879]
66 Then, the expected number of queries required to identify the group of an unknown object using the given tree is equal to L1 (Π) = Σ_{k=1}^{K} Pr(θ ∈ Θ^k) E[X | θ ∈ Θ^k] = Σ_{k=1}^{K} πΘ^k Σ_{j∈Lk} (πΘj / πΘ^k) dj (3). Theorem 2 gives an exact formula for this cost. [sent-124, score-1.113]
67 In particular, the cost is minimized when the reduction factors are equal to 0.5 and the group reduction factors {ρa^k}, k = 1, · · · , K, are equal to 1 at every internal node in the tree. [sent-129, score-0.601]
68 Also, note that the result in Theorem 1 is a special case of this result, where each group is of size 1, leading to ρa^k = 1 for all groups at every internal node. [sent-130, score-0.459]
69 Hence, we may take a top-down approach and minimize the objective function greedily by minimizing the term πΘa [1 − H(ρa ) + Σ_{k=1}^{K} (πΘa^k / πΘa ) H(ρa^k )] at each internal node, starting from the root node. [sent-133, score-0.318]
70 Note that the terms that depend on the query chosen at node ‘a’ are ρa and ρa^k . [sent-134, score-0.33]
71 Hence the algorithm reduces to minimizing Ca := 1 − H(ρa ) + Σ_{k=1}^{K} (πΘa^k / πΘa ) H(ρa^k ) at each internal node ‘a’. [sent-135, score-0.413]
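A sketch of this group-aware criterion (GGBS) under the same assumed toy setup could look like the following, where groups holds the label y_i of each object; this is our rendering, not the authors' code.

```python
import numpy as np

def _H(rho):
    # binary entropy, as in the earlier sketch
    if rho <= 0.0 or rho >= 1.0:
        return 0.0
    return -(rho * np.log2(rho) + (1.0 - rho) * np.log2(1.0 - rho))

def ggbs_select(B, pi, active, groups):
    """Pick the query minimizing
    C_a = 1 - H(rho_a) + sum_k (pi_Theta_a^k / pi_Theta_a) * H(rho_a^k)."""
    mass_a = pi[active].sum()
    best_q, best_cost = None, np.inf
    for q in range(B.shape[1]):
        mass_r = pi[active][B[active, q] == 1].sum()
        cost = 1.0 - _H(max(mass_r, mass_a - mass_r) / mass_a)
        for k in np.unique(groups[active]):
            in_k = active[groups[active] == k]          # objects of group k at this node
            mass_k = pi[in_k].sum()
            mk_r = pi[in_k][B[in_k, q] == 1].sum()
            cost += (mass_k / mass_a) * _H(max(mk_r, mass_k - mk_r) / mass_k)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q
```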
72 3.2 Exponential Costs. Now assume the cost of identifying an object is defined by Lλ (Π) := log_λ (Σ_i πi λ^{di}), where λ > 1 and di corresponds to the depth of object θi in a tree. [sent-141, score-0.7]
73 In the limiting case where λ tends to 1 and ∞, this cost function reduces to the average depth and worst case depth, respectively. [sent-142, score-0.254]
74 Here, we use a result by Campbell [26] which states that the exponential cost Lλ (Π) of any tree T is bounded below by the α-Rényi entropy, given by Hα (Π) := (1/(1−α)) log2 (Σ_i πi^α), where α = 1/(1 + log2 λ). [sent-146, score-0.344]
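For reference, the exponential cost and the Rényi-entropy lower bound can be written as two small helpers (a sketch; depths would come from an actual tree, which is not built here).

```python
import numpy as np

def exp_cost(pi, depths, lam):
    """L_lambda(Pi) = log_lambda( sum_i pi_i * lambda**d_i ), for lambda > 1."""
    return np.log(np.sum(pi * lam ** np.asarray(depths, dtype=float))) / np.log(lam)

def renyi_entropy(pi, lam):
    """H_alpha(Pi) with alpha = 1 / (1 + log2(lambda)); lower-bounds L_lambda(Pi)."""
    alpha = 1.0 / (1.0 + np.log2(lam))
    return np.log2(np.sum(pi ** alpha)) / (1.0 - alpha)
```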
75 We consider a general object identification problem and derive an explicit formula for the gap in this lower bound. [sent-147, score-0.3]
76 We then use this formula to derive a family of greedy algorithms that minimize the exponential cost function Lλ (Π) for λ > 1. [sent-148, score-0.29]
77 Note that the entropy bound reduces to the Shannon entropy H(Π) and log2 M , in the limiting cases where λ tends to 1 and ∞, respectively. [sent-149, score-0.315]
78 This result suggests a top-down greedy approach to minimize Lλ (Π), which is to minimize the term (λ − 1)λ^{da} − Dα (Θa ) + (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node, starting from the root node. [sent-153, score-0.354]
79 Noting that the terms that depend on the query chosen at node ‘a’ are πΘl(a) , πΘr(a) , Dα (Θl(a) ) and Dα (Θr(a) ), this reduces to minimizing Ca := (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node. [sent-154, score-0.581]
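A sketch of this exponential-cost greedy step is below; here D_alpha is computed over the individual object masses within a node (the plain object-identification case), which is our reading of the definition rather than a quoted formula.

```python
import numpy as np

def d_alpha(pi, subset, alpha):
    """D_alpha(Theta) = ( sum_{i in Theta} (pi_i / pi_Theta)**alpha )**(1/alpha)."""
    mass = pi[subset].sum()
    return (np.sum((pi[subset] / mass) ** alpha)) ** (1.0 / alpha)

def lambda_gbs_select(B, pi, active, lam):
    """Pick the query minimizing
    C_a = (pi_l/pi_a) * D_alpha(Theta_l) + (pi_r/pi_a) * D_alpha(Theta_r)."""
    alpha = 1.0 / (1.0 + np.log2(lam))
    mass_a = pi[active].sum()
    best_q, best_cost = None, np.inf
    for q in range(B.shape[1]):
        right = active[B[active, q] == 1]
        left = active[B[active, q] == 0]
        if len(left) == 0 or len(right) == 0:
            continue  # query does not split the remaining objects
        cost = (pi[left].sum() / mass_a) * d_alpha(pi, left, alpha) \
             + (pi[right].sum() / mass_a) * d_alpha(pi, right, alpha)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q
```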
80 Figure 6: Expected number of queries required to identify the group of an object using GBS and GGBS for different values of β, when α = 1. [sent-169, score-0.82]
81 3.3 Group Identification with Exponential Costs. Finally, we complete our discussion by considering the problem of group identification with exponential costs. [sent-170, score-0.296]
82 Here, the cost of identifying the group of an object given a tree T ∈ T (B, Π, y) is defined to be Lλ (Π) = log_λ (Σ_{j∈L} πΘj λ^{dj}), which reduces to (3) in the limiting case as λ → 1, and to max_{j∈L} dj , i.e., the worst-case depth, as λ → ∞. [sent-171, score-0.927]
83 This result also implies a top-down, greedy algorithm to minimize Lλ (Π), which is to choose a query that minimizes Ca := (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node. [sent-178, score-0.487]
84 Once again, it can be shown by the application of L’Hôpital’s rule that in the limiting case where λ → 1, this reduces to GGBS, and in the case where λ → ∞, this reduces to choosing a query that minimizes the maximum number of groups in the child nodes [27]. [sent-179, score-0.498]
85 Here, γw (q) reflects the correlation of the object responses within a group, and γb (q) captures the correlation of object responses between groups. [sent-184, score-0.58]
86 When γw (q) is close to 0.5, each object within a group is equally likely to exhibit 0 or 1 as its response to query q, whereas, when it is close to 1, most of the objects within a group are highly likely to exhibit the same query response. [sent-186, score-1.246]
87 Given these correlation values (γw (q), γb (q)) for a query q, the object responses to query q (i.e., the corresponding column of B) are generated as follows: 1. Draw a binary value x for the query. [sent-189, score-0.626]
88 2. For each group k ∈ {1, · · · , K}, assign a binary label bk , where bk = x with probability γb (q). [sent-193, score-0.333]
89 3. For each object in group k, assign bk as the object response to q with probability γw (q). Given the correlation parameters (γw (q), γb (q)), ∀q ∈ Q, a random dataset can be created by following the above procedure for each query. [sent-194, score-0.783]
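The data-generation procedure above might be rendered as follows (our sketch; gamma_w and gamma_b are per-query correlation parameters, and the initial binary value x for each query is assumed to be drawn uniformly at random).

```python
import numpy as np

def generate_dataset(M, N, K, gamma_w, gamma_b, rng=None):
    """Generate a random M x N binary matrix B with K groups, following the
    within-group / between-group correlation procedure sketched in the text."""
    rng = np.random.default_rng() if rng is None else rng
    groups = rng.integers(0, K, size=M)          # random partition of objects into groups
    B = np.zeros((M, N), dtype=int)
    for q in range(N):
        x = rng.integers(0, 2)                   # assumed: uniform binary value for query q
        # step 2: group labels b_k agree with x with probability gamma_b(q)
        b = np.where(rng.random(K) < gamma_b[q], x, 1 - x)
        # step 3: each object copies its group's label with probability gamma_w(q)
        agree = rng.random(M) < gamma_w[q]
        B[:, q] = np.where(agree, b[groups], 1 - b[groups])
    return B, groups
```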
90 We demonstrate the results on datasets of size N = 200 (# of queries) and M = 400 (# of objects), where we randomly partitioned the objects into 15 groups and assumed a uniform prior on the objects. [sent-196, score-0.294]
91 As the correlation parameter γw (q) tends to 1, choosing that query keeps the groups intact. [sent-210, score-0.292]
92 That is, the group reduction factors ρa^k tend to 1 for these queries. [sent-212, score-0.275]
93 Such queries offer significant gains in group identification, but can be overlooked by GBS. [sent-213, score-0.461]
94 5 Conclusions. In this paper, we show that generalized binary search (GBS) is a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. [sent-230, score-0.579]
95 First, we consider the case where the objects are partitioned into groups, and the goal is to identify only the group of the unknown object. [sent-232, score-0.551]
96 Second, we consider the problem where the cost of identifying an object grows exponentially in the number of queries. [sent-233, score-0.371]
97 Now, we note from Lemma 1 that λ^{Lλ(Π)} = 1 + Σ_{a∈I} (λ − 1) λ^{da} πΘa (7), where da denotes the depth of internal node ‘a’ in the tree T . [sent-249, score-0.643]
98 The function Lλ can be decomposed over the internal nodes in a tree T , as Lλ = Σ_{a∈I} λ^{da} πΘa , where da denotes the depth of internal node a ∈ I and πΘa is the probability mass of the objects at that node. [sent-253, score-1.133]
99 The function Hα can be decomposed over the internal nodes in a tree T in terms of the per-node quantities πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) ), a ∈ I, where Dα (Θa ) := (Σ_{k=1}^{K} (πΘa^k / πΘa )^α)^{1/α}. [sent-255, score-0.679]
100 Here πΘa denotes the probability mass of the objects at any internal node a ∈ I. The above two lemmas can be proved using induction over subtrees rooted at any internal node ‘a’ in the tree. [sent-256, score-0.507]
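The identity in (7) can be checked numerically on any explicit tree; the sketch below does so on a small hand-built 3-leaf tree (hypothetical numbers, not from the paper).

```python
import numpy as np

# A small 3-leaf tree over objects {0, 1, 2} with prior pi:
# the root (depth 0) splits {0} vs {1, 2}; an internal node at depth 1 splits {1} vs {2}.
pi = np.array([0.5, 0.3, 0.2])
leaf_depths = np.array([1, 2, 2])                  # d_i for objects 0, 1, 2
internal = [(0, [0, 1, 2]), (1, [1, 2])]           # (depth d_a, objects reaching node a)

lam = 2.0
lhs = np.sum(pi * lam ** leaf_depths)              # lambda ** L_lambda(Pi)
rhs = 1.0 + sum((lam - 1.0) * lam ** d * pi[idx].sum() for d, idx in internal)
assert np.isclose(lhs, rhs)                        # matches equation (7)
```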
wordName wordTfidf (topN-words)
[('gbs', 0.663), ('object', 0.238), ('queries', 0.232), ('group', 0.229), ('ggbs', 0.176), ('query', 0.168), ('internal', 0.164), ('node', 0.162), ('objects', 0.142), ('tree', 0.141), ('identi', 0.133), ('shannon', 0.115), ('greedy', 0.102), ('entropy', 0.099), ('da', 0.097), ('identify', 0.087), ('depth', 0.079), ('nyi', 0.078), ('huffman', 0.07), ('exponential', 0.067), ('diagnosis', 0.067), ('beta', 0.066), ('groups', 0.066), ('bhavnani', 0.062), ('decision', 0.062), ('qa', 0.06), ('unknown', 0.059), ('cost', 0.058), ('dj', 0.058), ('greedily', 0.056), ('identifying', 0.055), ('binary', 0.054), ('leaf', 0.054), ('costs', 0.052), ('active', 0.052), ('reduces', 0.051), ('nodes', 0.048), ('child', 0.048), ('coding', 0.048), ('bellala', 0.046), ('golovin', 0.046), ('reduction', 0.046), ('campbell', 0.04), ('limiting', 0.039), ('mass', 0.039), ('formula', 0.037), ('ca', 0.037), ('root', 0.036), ('trees', 0.036), ('minimizing', 0.036), ('cation', 0.035), ('pital', 0.035), ('qroot', 0.035), ('toxic', 0.035), ('expected', 0.035), ('required', 0.034), ('partitioned', 0.034), ('chemical', 0.032), ('respond', 0.032), ('leaves', 0.032), ('di', 0.032), ('correlation', 0.031), ('gowtham', 0.031), ('ww', 0.031), ('uniform', 0.03), ('scott', 0.029), ('minq', 0.028), ('toy', 0.028), ('minimizes', 0.027), ('tends', 0.027), ('search', 0.027), ('generalized', 0.027), ('meta', 0.027), ('suboptimal', 0.026), ('minimize', 0.026), ('skewed', 0.025), ('bk', 0.025), ('theorem', 0.025), ('gap', 0.025), ('exhibit', 0.025), ('uniquely', 0.024), ('dd', 0.024), ('codes', 0.024), ('splitting', 0.023), ('krause', 0.022), ('response', 0.022), ('constructing', 0.022), ('prior', 0.022), ('factor', 0.022), ('tu', 0.022), ('ending', 0.022), ('responses', 0.021), ('supplemental', 0.02), ('texas', 0.02), ('labels', 0.02), ('exponentially', 0.02), ('constructed', 0.02), ('evenly', 0.02), ('tx', 0.02), ('lim', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.31223473 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2 , a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose E FF ECXTIVE , a particularly fast approximation of EC 2 , and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty. 1
3 0.13982476 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
Author: Samy Bengio, Jason Weston, David Grangier
Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a treestructure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster. 1
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.11869314 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
6 0.11751692 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
7 0.11559714 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
8 0.11067326 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
9 0.091837436 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
10 0.088846438 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.086294971 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
12 0.085114986 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
13 0.076994546 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
14 0.075310409 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
15 0.075000279 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
16 0.071823105 144 nips-2010-Learning Efficient Markov Networks
17 0.071230918 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.070820816 149 nips-2010-Learning To Count Objects in Images
19 0.070087142 150 nips-2010-Learning concept graphs from text with stick-breaking priors
20 0.068106003 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
topicId topicWeight
[(0, 0.185), (1, 0.045), (2, -0.002), (3, -0.068), (4, -0.09), (5, -0.039), (6, -0.088), (7, 0.006), (8, 0.073), (9, -0.046), (10, 0.077), (11, 0.131), (12, -0.262), (13, -0.043), (14, 0.007), (15, -0.108), (16, -0.052), (17, -0.088), (18, -0.067), (19, 0.038), (20, -0.032), (21, 0.1), (22, 0.036), (23, -0.17), (24, -0.095), (25, -0.141), (26, -0.036), (27, 0.12), (28, 0.027), (29, -0.044), (30, -0.088), (31, 0.047), (32, 0.087), (33, 0.085), (34, 0.164), (35, 0.02), (36, -0.006), (37, -0.114), (38, -0.244), (39, -0.076), (40, -0.116), (41, 0.068), (42, 0.056), (43, -0.056), (44, 0.06), (45, 0.018), (46, 0.048), (47, -0.063), (48, -0.047), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.96045196 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.74299479 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2 , a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose E FF ECXTIVE , a particularly fast approximation of EC 2 , and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty. 1
3 0.46379 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.41977078 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
Author: Han Liu, Xi Chen
Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets. 1
6 0.40570125 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
7 0.39941663 144 nips-2010-Learning Efficient Markov Networks
8 0.39736116 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
9 0.39349902 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets
10 0.37221029 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
11 0.36904886 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
12 0.34988561 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
13 0.34657705 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
14 0.33481222 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
15 0.33293471 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
16 0.33258894 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
17 0.31657365 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.30686286 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
19 0.30630529 168 nips-2010-Monte-Carlo Planning in Large POMDPs
20 0.30474821 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
topicId topicWeight
[(13, 0.04), (27, 0.061), (30, 0.116), (35, 0.019), (36, 0.174), (45, 0.264), (50, 0.032), (52, 0.042), (60, 0.058), (77, 0.038), (78, 0.013), (90, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.91021395 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.86001176 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: √ ˜ We establish an excess risk bound of O HR2 + HL∗ Rn for ERM with an H-smooth loss n function and a hypothesis class with Rademacher complexity Rn , where L∗ is the best risk achievable by the hypothesis class. For typical hypothesis classes where Rn = R/n, this translates to ˜ ˜ a learning rate of O (RH/n) in the separable (L∗ = 0) case and O RH/n + L∗ RH/n more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
3 0.85829806 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
4 0.85560256 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
5 0.85550368 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
6 0.8542906 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
7 0.85424453 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
8 0.85407847 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
9 0.85350412 233 nips-2010-Scrambled Objects for Least-Squares Regression
10 0.85302943 290 nips-2010-t-logistic regression
11 0.8530038 134 nips-2010-LSTD with Random Projections
12 0.85282809 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
13 0.85282159 27 nips-2010-Agnostic Active Learning Without Constraints
14 0.85221165 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
15 0.85219491 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning
16 0.85162741 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
17 0.85139811 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
18 0.85081542 148 nips-2010-Learning Networks of Stochastic Differential Equations
19 0.8497802 287 nips-2010-Worst-Case Linear Discriminant Analysis
20 0.84964502 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models