nips nips2010 nips2010-88 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. [sent-6, score-0.594]
2 Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. [sent-7, score-0.471]
3 First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. [sent-9, score-0.73]
4 Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. [sent-10, score-0.371]
5 In each case, we present an exact formula for the objective function involving Shannon or R´ nyi entropy, and e develop a greedy algorithm for minimizing it. [sent-11, score-0.253]
6 An unknown object θ is generated from this set Θ with a certain prior probability distribution Π = (π1 , · · · , πM ). [sent-14, score-0.319]
7 That is, πi = Pr(θ = θi ), and the goal is to uniquely identify this unknown object through as few queries from Q as possible, where a query q ∈ Q returns a value 1 if θ ∈ q, and 0 otherwise. [sent-16, score-0.808]
8 For example, in active learning, the objects are classifiers and the queries are the labels for fixed test points. [sent-17, score-0.446]
9 In active diagnosis, objects may correspond to faults, and queries to alarms. [sent-18, score-0.426]
10 We will refer to this problem as object identification. [sent-20, score-0.238]
11 The goal in object identification is to construct an optimal binary decision tree, where each internal node in the tree is associated with a query from Q, and each leaf node corresponds to an object from Θ. [sent-22, score-1.443]
12 Optimality is often with respect to the expected depth of the leaf node corresponding to the unknown object θ. [sent-23, score-0.627]
13 Hence, various greedy algorithms [5, 14] have been proposed to obtain a suboptimal binary decision tree. [sent-25, score-0.244]
14 This is the greedy algorithm which selects a query that most evenly divides the probability mass of the remaining objects [1, 2, 5, 15]. [sent-27, score-0.471]
15 However, in applications such as disease diagnosis, where Θ is a collection of possible diseases, it may only be necessary to identify the intervention or response to an object, rather than the object itself. [sent-29, score-0.347]
16 In these problems, the object set Θ is partitioned into groups and it is only necessary to identify the group to which the unknown object belongs. [sent-30, score-0.951]
17 To address this problem, we first present a new interpretation of GBS from a coding-theoretic perspective by viewing the problem of object identification as constrained source coding. [sent-32, score-0.238]
18 Specifically, we present an exact formula for the expected number of queries required to identify an unknown object in terms of Shannon entropy of the prior distribution Π, and show that GBS is a top-down algorithm that greedily minimizes this cost function. [sent-33, score-0.984]
19 We also extend the coding-theoretic framework to the problem of object (or group) identification where the cost of identifying an object grows exponentially in the number of queries. [sent-35, score-0.657]
20 That is, the cost of identifying an object using d queries is λ^d for some fixed λ > 1. [sent-37, score-0.583]
21 However, to the best of our knowledge, there does not exist an algorithm that constructs a good suboptimal decision tree for the problem of object/group identification with exponential costs. [sent-40, score-0.296]
22 Once again, we show below that GBS is not necessarily efficient for minimizing the exponential cost function, and propose an improved greedy algorithm that generalizes GBS. [sent-41, score-0.263]
23 1 Notation We denote an object identification problem by a pair (B, Π) where B is a known M × N binary matrix with bij equal to 1 if θi ∈ qj , and 0 otherwise. [sent-43, score-0.292]
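As a rough illustration of this representation (not code from the paper; all names here are our own), the following sketch encodes a small object identification problem as a binary matrix B and a prior pi; the later sketches below reuse these names.

```python
import numpy as np

# Toy object-identification problem (B, Pi): M = 4 objects, N = 3 queries.
# B[i, j] = 1 means object theta_i belongs to query q_j (the answer to q_j is "yes").
B = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 0]])
pi = np.array([0.4, 0.3, 0.2, 0.1])   # prior Pi over the M objects (sums to 1)
groups = np.array([0, 0, 1, 1])       # group labels y_i, used in the group setting below
active = np.arange(B.shape[0])        # indices of objects reaching the current node (the root)
```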
24 A decision tree T constructed on (B, Π) has a query from the set Q at each of its internal nodes, with the leaf nodes terminating in the objects from Θ. [sent-44, score-0.799]
25 For a decision tree with L leaves, the leaf nodes are indexed by the set L = {1, · · · , L} and the internal nodes are indexed by the set I = {L + 1, · · · , 2L − 1}. [sent-45, score-0.517]
26 At any node ‘a’, let Qa ⊆ Q denote the set of queries that have been performed along the path from the root node up to that node. [sent-46, score-0.592]
27 An object θi reaches node ‘a’ if it agrees with the true θ on all queries in Qa . [sent-47, score-0.632]
28 That is, the binary values in B for the rows corresponding to θi and θ are the same over the columns corresponding to queries in Qa . [sent-49, score-0.286]
29 At any internal node a ∈ I, let l(a), r(a) denote the “left” and “right” child nodes, and let Θa ⊆ Θ denote the set of objects that reach node ‘a’. [sent-50, score-0.678]
30 Thus, the sets Θl(a) ⊆ Θa , Θr(a) ⊆ Θa correspond to the objects in Θa that respond 0 and 1 to the query at node ‘a’, respectively. [sent-51, score-0.504]
31 We denote by πΘa := Σ_{i: θi ∈ Θa} πi the probability mass of the objects reaching node ‘a’ in the tree. [sent-52, score-0.343]
32 2 GBS Greedily Minimizes the Expected Number of Queries. We begin by noting that object identification reduces to the standard source coding problem [19] in the special case when Q is complete, meaning, for any S ⊆ Θ there exists a query q ∈ Q such that either q = S or Θ \ q = S. [sent-54, score-0.505]
33 Here, the problem of constructing an optimal binary decision tree is equivalent to constructing optimal variable length binary prefix codes, for which there exists an efficient optimal algorithm known as the Huffman algorithm [20]. [sent-55, score-0.355]
34 The optimal cost (i.e., the expected depth of any binary decision tree) is bounded below by the Shannon entropy of the prior distribution Π [19]. [sent-58, score-0.351]
35 For the problem of object identification, where Q is not complete, the entropy lower bound is still valid, but Huffman coding cannot be implemented. [sent-59, score-0.385]
36 We now show that GBS is actually greedily minimizing the expected number of queries required to identify an object. [sent-61, score-0.48]
37 First, we define a parameter called the reduction factor on the binary matrix/tree combination that provides a useful quantification on the expected number of queries required to identify an object. [sent-62, score-0.51]
38 The reduction factor at any internal node ‘a’ in the tree is defined by ρa = max{πΘl(a) , πΘr(a) }/πΘa . [sent-65, score-0.535]
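A direct rendering of this definition (a sketch reusing the hypothetical B, pi and active from the setup above) might be:

```python
import numpy as np

def reduction_factor(B, pi, active, q):
    """rho_a = max(pi_{Theta_l(a)}, pi_{Theta_r(a)}) / pi_{Theta_a} for query q at node 'a'."""
    mass_a = pi[active].sum()                        # pi_{Theta_a}
    mass_r = pi[active][B[active, q] == 1].sum()     # mass of the objects answering 1
    return max(mass_r, mass_a - mass_r) / mass_a     # always lies in [0.5, 1]
```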
39 Given an object identification problem (B, Π), let T (B, Π) denote the set of decision trees that can uniquely identify all the objects in the set Θ. [sent-68, score-0.589]
40 For any decision tree T ∈ T (B, Π), let {ρa }a∈I denote the set of reduction factors and let di denote the number of queries required to identify object θi in the tree. [sent-70, score-0.872]
41 Then the expected number of queries required to identify an unknown object using a tree (or, the expected depth of a tree) is L1 (Π) = Σ_i πi di . [sent-71, score-0.972]
42 For any T ∈ T (B, Π), the expected number of queries required to identify an unknown object is given by L1 (Π) = H(Π) + Σ_{a∈I} πΘa [1 − H(ρa )]. [sent-75, score-0.685]
43 It also follows from the above result that a tree attains the entropy bound iff the reduction factors are equal to 0.5. [sent-80, score-0.286]
44 Instead, we may take a top-down approach and minimize the objective function by minimizing the term Ca := πΘa [1 − H(ρa )] at each internal node, starting from the root node. [sent-85, score-0.262]
45 Note that the only term that depends on the query chosen at node ‘a’ in this cost function is ρa . [sent-86, score-0.388]
46 This amounts to maximizing H(ρa ) (i.e., choosing a split as balanced as possible) at each internal node a ∈ I. [sent-89, score-0.326]
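The per-node greedy step can then be sketched as follows (hypothetical helper names, not from the paper); minimizing Ca = πΘa [1 − H(ρa)] over queries is the same as picking the query whose split of the remaining probability mass is most balanced.

```python
import numpy as np

def binary_entropy(rho):
    """Shannon entropy of a Bernoulli(rho) split; H(0.5) = 1 is the best case."""
    if rho <= 0.0 or rho >= 1.0:
        return 0.0
    return -(rho * np.log2(rho) + (1.0 - rho) * np.log2(1.0 - rho))

def gbs_select(B, pi, active):
    """Pick the query minimizing C_a = pi_Theta_a * (1 - H(rho_a)), i.e. the most
    balanced split of the probability mass of the objects still active at this node."""
    mass_a = pi[active].sum()
    costs = []
    for q in range(B.shape[1]):
        mass_r = pi[active][B[active, q] == 1].sum()
        rho = max(mass_r, mass_a - mass_r) / mass_a
        costs.append(mass_a * (1.0 - binary_entropy(rho)))
    return int(np.argmin(costs))
```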
47 In the next section, we show how this framework can be extended to derive greedy algorithms for the problems of group identification and object identification with exponential costs. [sent-91, score-0.636]
48 3 Extensions of GBS. 3.1 Group Identification. In group identification, the goal is not to determine the unknown object θ ∈ Θ, but rather the group to which it belongs, in as few queries as possible. [sent-93, score-0.987]
49 Here, in addition to B and Π, the group labels for the objects are also provided, where the groups are assumed to be disjoint. [sent-94, score-0.457]
50 We denote a group identification problem by (B, Π, y), where y = (y1 , · · · , yM ) denotes the group labels of the objects, yi ∈ {1, · · · , K}. [sent-95, score-0.478]
51 It is important to note here that the group identification problem cannot be simply reduced to an object identification problem with groups {Θ1 , · · · , ΘK } as “meta objects,” since the objects within a group need not respond the same to each query. [sent-97, score-0.936]
52 For instance, consider the toy example shown in Figure 1 where the objects θ1 , θ2 and θ3 belonging to group 1 cannot be collapsed into a single meta object as these objects respond differently to queries q1 and q3 . [sent-98, score-1.07]
53 In this context, we also note that GBS can fail to produce a good solution for a group identification problem as it does not take the group labels into consideration while choosing queries. [sent-99, score-0.478]
54 Once again, consider the toy example shown in Figure 1 where query q2 is sufficient to identify the group of an unknown object, whereas GBS requires 2 queries to identify the group when the unknown object is either θ2 or θ4 . [sent-100, score-1.416]
55 [23] simultaneously studied the problem of group identification in the context of object identification with persistent noise. [sent-104, score-0.467]
56 [Figure 1: Toy Example. Figure 2: Decision tree constructed using GBS.] Note that when constructing a tree for group identification, a greedy, top-down algorithm terminates splitting when all the objects at a node belong to the same group. [sent-113, score-0.88]
57 Hence, a tree constructed in this fashion can have multiple objects ending in the same leaf node and multiple leaves ending in the same group. [sent-114, score-0.595]
58 For a tree with L leaves, we denote by Lk ⊂ L = {1, · · · , L} the set of leaves that terminate in group k. [sent-115, score-0.402]
59 Similar to Θ^k ⊆ Θ, we denote by Θa^k ⊆ Θa the set of objects belonging to group k that reach node ‘a’ in a tree. [sent-116, score-0.533]
60 Also, in addition to the reduction factor defined in Section 2, we define a new parameter called the group reduction factor for each group k ∈ {1, · · · , K} at each internal node. [sent-117, score-0.758]
61 Let T be a decision tree constructed on a group identification problem (B, Π, y). [sent-119, score-0.452]
62 The group reduction factor for any group k at an internal node ‘a’ is defined by ρa^k = max{πΘl(a)^k , πΘr(a)^k } / πΘa^k . [sent-120, score-0.852]
63 Given (B, Π, y), let T (B, Π, y) denote the set of decision trees that can uniquely identify the groups of all objects in the set Θ. [sent-121, score-0.417]
64 For any decision tree T ∈ T (B, Π, y), let dj denote the depth of leaf node j ∈ L. [sent-122, score-0.556]
65 Let random variable X denote the number of queries required to identify the group of an unknown object θ. [sent-123, score-0.879]
66 Then, the expected number of queries required to identify the group of an unknown object using the given tree is equal to L1 (Π) = Σ_{k=1}^{K} Pr(θ ∈ Θ^k) E[X | θ ∈ Θ^k] = Σ_{k=1}^{K} πΘ^k Σ_{j∈Lk} (πΘj / πΘ^k) dj (3). Theorem 2 gives an exact formula for this cost. [sent-124, score-1.113]
67 In particular, the cost is minimized when the reduction factors are equal to 0.5 and the group reduction factors {ρa^k}, k = 1, · · · , K, are equal to 1 at every internal node in the tree. [sent-129, score-0.601]
68 Also, note that the result in Theorem 1 is a special case of this result, where each group is of size 1, leading to ρa^k = 1 for all groups at every internal node. [sent-130, score-0.459]
69 Hence, we may take a top-down approach and minimize the objective function greedily by minimizing the term πΘa [1 − H(ρa ) + Σ_{k=1}^{K} (πΘa^k / πΘa ) H(ρa^k )] at each internal node, starting from the root node. [sent-133, score-0.318]
70 Note that the terms that depend on the query chosen at node ‘a’ are ρa and ρa^k . [sent-134, score-0.33]
71 Hence the algorithm reduces to minimizing Ca := 1 − H(ρa ) + Σ_{k=1}^{K} (πΘa^k / πΘa ) H(ρa^k ) at each internal node ‘a’. [sent-135, score-0.413]
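A sketch of this group-aware criterion (GGBS) under the same assumed toy setup could look like the following, where groups holds the label y_i of each object; this is our rendering, not the authors' code.

```python
import numpy as np

def _H(rho):
    # binary entropy, as in the earlier sketch
    if rho <= 0.0 or rho >= 1.0:
        return 0.0
    return -(rho * np.log2(rho) + (1.0 - rho) * np.log2(1.0 - rho))

def ggbs_select(B, pi, active, groups):
    """Pick the query minimizing
    C_a = 1 - H(rho_a) + sum_k (pi_Theta_a^k / pi_Theta_a) * H(rho_a^k)."""
    mass_a = pi[active].sum()
    best_q, best_cost = None, np.inf
    for q in range(B.shape[1]):
        mass_r = pi[active][B[active, q] == 1].sum()
        cost = 1.0 - _H(max(mass_r, mass_a - mass_r) / mass_a)
        for k in np.unique(groups[active]):
            in_k = active[groups[active] == k]          # objects of group k at this node
            mass_k = pi[in_k].sum()
            mk_r = pi[in_k][B[in_k, q] == 1].sum()
            cost += (mass_k / mass_a) * _H(max(mk_r, mass_k - mk_r) / mass_k)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q
```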
72 3.2 Exponential Costs. Now assume the cost of identifying an object is defined by Lλ (Π) := log_λ (Σ_i πi λ^{di}), where λ > 1 and di corresponds to the depth of object θi in a tree. [sent-141, score-0.7]
73 In the limiting case where λ tends to 1 and ∞, this cost function reduces to the average depth and worst case depth, respectively. [sent-142, score-0.254]
74 Here, we use a result by Campbell [26] which states that the exponential cost Lλ (Π) of any tree T is bounded below by the α-Rényi entropy, given by Hα (Π) := (1/(1−α)) log2 (Σ_i πi^α), where α = 1/(1 + log2 λ). [sent-146, score-0.344]
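For reference, the exponential cost and the Rényi-entropy lower bound can be written as two small helpers (a sketch; depths would come from an actual tree, which is not built here).

```python
import numpy as np

def exp_cost(pi, depths, lam):
    """L_lambda(Pi) = log_lambda( sum_i pi_i * lambda**d_i ), for lambda > 1."""
    return np.log(np.sum(pi * lam ** np.asarray(depths, dtype=float))) / np.log(lam)

def renyi_entropy(pi, lam):
    """H_alpha(Pi) with alpha = 1 / (1 + log2(lambda)); lower-bounds L_lambda(Pi)."""
    alpha = 1.0 / (1.0 + np.log2(lam))
    return np.log2(np.sum(pi ** alpha)) / (1.0 - alpha)
```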
75 We consider a general object identification problem and derive an explicit formula for the gap in this lower bound. [sent-147, score-0.3]
76 We then use this formula to derive a family of greedy algorithms that minimize the exponential cost function Lλ (Π) for λ > 1. [sent-148, score-0.29]
77 Note that the entropy bound reduces to the Shannon entropy H(Π) and log2 M , in the limiting cases where λ tends to 1 and ∞, respectively. [sent-149, score-0.315]
78 This result suggests a top-down greedy approach to minimize Lλ (Π), which is to minimize the term (λ − 1)λ^{da} − Dα (Θa ) + (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node, starting from the root node. [sent-153, score-0.354]
79 Noting that the terms that depend on the query chosen at node ‘a’ are πΘl(a) , πΘr(a) , Dα (Θl(a) ) and Dα (Θr(a) ), this reduces to minimizing Ca := (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node. [sent-154, score-0.581]
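A sketch of this exponential-cost greedy step is below; here D_alpha is computed over the individual object masses within a node (the plain object-identification case), which is our reading of the definition rather than a quoted formula.

```python
import numpy as np

def d_alpha(pi, subset, alpha):
    """D_alpha(Theta) = ( sum_{i in Theta} (pi_i / pi_Theta)**alpha )**(1/alpha)."""
    mass = pi[subset].sum()
    return (np.sum((pi[subset] / mass) ** alpha)) ** (1.0 / alpha)

def lambda_gbs_select(B, pi, active, lam):
    """Pick the query minimizing
    C_a = (pi_l/pi_a) * D_alpha(Theta_l) + (pi_r/pi_a) * D_alpha(Theta_r)."""
    alpha = 1.0 / (1.0 + np.log2(lam))
    mass_a = pi[active].sum()
    best_q, best_cost = None, np.inf
    for q in range(B.shape[1]):
        right = active[B[active, q] == 1]
        left = active[B[active, q] == 0]
        if len(left) == 0 or len(right) == 0:
            continue  # query does not split the remaining objects
        cost = (pi[left].sum() / mass_a) * d_alpha(pi, left, alpha) \
             + (pi[right].sum() / mass_a) * d_alpha(pi, right, alpha)
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q
```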
80 Figure 6: Expected number of queries required to identify the group of an object using GBS and GGBS for different values of β, when α = 1. [sent-169, score-0.82]
81 3.3 Group Identification with Exponential Costs. Finally, we complete our discussion by considering the problem of group identification with exponential costs. [sent-170, score-0.296]
82 Here, the cost of identifying the group of an object given a tree T ∈ T (B, Π, y) is defined to be Lλ (Π) = log_λ (Σ_{j∈L} πΘj λ^{dj}), which reduces to (3) in the limiting case as λ → 1, and to max_{j∈L} dj , i.e., the worst-case depth, as λ → ∞. [sent-171, score-0.927]
83 This result also implies a top-down, greedy algorithm to minimize Lλ (Π), which is to choose a query that minimizes Ca := (πΘl(a) / πΘa ) Dα (Θl(a) ) + (πΘr(a) / πΘa ) Dα (Θr(a) ) at each internal node. [sent-178, score-0.487]
84 Once again, it can be shown by the application of L’Hôpital’s rule that in the limiting case where λ → 1, this reduces to GGBS, and in the case where λ → ∞, this reduces to choosing a query that minimizes the maximum number of groups in the child nodes [27]. [sent-179, score-0.498]
85 Here, γw (q) reflects the correlation of the object responses within a group, and γb (q) captures the correlation of object responses between groups. [sent-184, score-0.58]
86 When γw (q) is close to 0.5, each object within a group is equally likely to exhibit 0 or 1 as its response to query q, whereas, when it is close to 1, most of the objects within a group are highly likely to exhibit the same query response. [sent-186, score-1.246]
87 Given these correlation values (γw (q), γb (q)) for a query q, the object responses to query q (i.e., the corresponding column of B) are generated as follows: 1. Draw a binary value x for the query. [sent-189, score-0.626]
88 2. For each group k ∈ {1, · · · , K}, assign a binary label bk , where bk = x with probability γb (q). [sent-193, score-0.333]
89 3. For each object in group k, assign bk as the object response to q with probability γw (q). Given the correlation parameters (γw (q), γb (q)), ∀q ∈ Q, a random dataset can be created by following the above procedure for each query. [sent-194, score-0.783]
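The data-generation procedure above might be rendered as follows (our sketch; gamma_w and gamma_b are per-query correlation parameters, and the initial binary value x for each query is assumed to be drawn uniformly at random).

```python
import numpy as np

def generate_dataset(M, N, K, gamma_w, gamma_b, rng=None):
    """Generate a random M x N binary matrix B with K groups, following the
    within-group / between-group correlation procedure sketched in the text."""
    rng = np.random.default_rng() if rng is None else rng
    groups = rng.integers(0, K, size=M)          # random partition of objects into groups
    B = np.zeros((M, N), dtype=int)
    for q in range(N):
        x = rng.integers(0, 2)                   # assumed: uniform binary value for query q
        # step 2: group labels b_k agree with x with probability gamma_b(q)
        b = np.where(rng.random(K) < gamma_b[q], x, 1 - x)
        # step 3: each object copies its group's label with probability gamma_w(q)
        agree = rng.random(M) < gamma_w[q]
        B[:, q] = np.where(agree, b[groups], 1 - b[groups])
    return B, groups
```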
90 We demonstrate the results on datasets of size N = 200 (# of queries) and M = 400 (# of objects), where we randomly partitioned the objects into 15 groups and assumed a uniform prior on the objects. [sent-196, score-0.294]
91 As the correlation parameter γw (q) tends to 1, choosing that query keeps the groups intact. [sent-210, score-0.292]
92 That is, the group reduction factors ρa^k tend to 1 for these queries. [sent-212, score-0.275]
93 Such queries offer significant gains in group identification, but can be overlooked by GBS. [sent-213, score-0.461]
94 5 Conclusions. In this paper, we show that generalized binary search (GBS) is a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. [sent-230, score-0.579]
95 First, we consider the case where the objects are partitioned into groups, and the goal is to identify only the group of the unknown object. [sent-232, score-0.551]
96 Second, we consider the problem where the cost of identifying an object grows exponentially in the number of queries. [sent-233, score-0.371]
97 Now, we note from Lemma 1 that λ^{Lλ(Π)} = 1 + Σ_{a∈I} (λ − 1) λ^{da} πΘa (7), where da denotes the depth of internal node ‘a’ in the tree T . [sent-249, score-0.643]
98 The function Lλ can be decomposed over the internal nodes in a tree T , as Lλ = Σ_{a∈I} λ^{da} πΘa , where da denotes the depth of internal node a ∈ I and πΘa is the probability mass of the objects at that node. [sent-253, score-1.133]
99 The function Hα can be decomposed over the internal nodes in a tree T in terms of the per-node quantities πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) ), a ∈ I, where Dα (Θa ) := (Σ_{k=1}^{K} (πΘa^k / πΘa )^α)^{1/α}. [sent-255, score-0.679]
100 Here πΘa denotes the probability mass of the objects at any internal node a ∈ I. The above two lemmas can be proved using induction over subtrees rooted at any internal node ‘a’ in the tree. [sent-256, score-0.507]
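The identity in (7) can be checked numerically on any explicit tree; the sketch below does so on a small hand-built 3-leaf tree (hypothetical numbers, not from the paper).

```python
import numpy as np

# A small 3-leaf tree over objects {0, 1, 2} with prior pi:
# the root (depth 0) splits {0} vs {1, 2}; an internal node at depth 1 splits {1} vs {2}.
pi = np.array([0.5, 0.3, 0.2])
leaf_depths = np.array([1, 2, 2])                  # d_i for objects 0, 1, 2
internal = [(0, [0, 1, 2]), (1, [1, 2])]           # (depth d_a, objects reaching node a)

lam = 2.0
lhs = np.sum(pi * lam ** leaf_depths)              # lambda ** L_lambda(Pi)
rhs = 1.0 + sum((lam - 1.0) * lam ** d * pi[idx].sum() for d, idx in internal)
assert np.isclose(lhs, rhs)                        # matches equation (7)
```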
wordName wordTfidf (topN-words)
[('gbs', 0.663), ('object', 0.238), ('queries', 0.232), ('group', 0.229), ('ggbs', 0.176), ('query', 0.168), ('internal', 0.164), ('node', 0.162), ('objects', 0.142), ('tree', 0.141), ('identi', 0.133), ('shannon', 0.115), ('greedy', 0.102), ('entropy', 0.099), ('da', 0.097), ('identify', 0.087), ('depth', 0.079), ('nyi', 0.078), ('huffman', 0.07), ('exponential', 0.067), ('diagnosis', 0.067), ('beta', 0.066), ('groups', 0.066), ('bhavnani', 0.062), ('decision', 0.062), ('qa', 0.06), ('unknown', 0.059), ('cost', 0.058), ('dj', 0.058), ('greedily', 0.056), ('identifying', 0.055), ('binary', 0.054), ('leaf', 0.054), ('costs', 0.052), ('active', 0.052), ('reduces', 0.051), ('nodes', 0.048), ('child', 0.048), ('coding', 0.048), ('bellala', 0.046), ('golovin', 0.046), ('reduction', 0.046), ('campbell', 0.04), ('limiting', 0.039), ('mass', 0.039), ('formula', 0.037), ('ca', 0.037), ('root', 0.036), ('trees', 0.036), ('minimizing', 0.036), ('cation', 0.035), ('pital', 0.035), ('qroot', 0.035), ('toxic', 0.035), ('expected', 0.035), ('required', 0.034), ('partitioned', 0.034), ('chemical', 0.032), ('respond', 0.032), ('leaves', 0.032), ('di', 0.032), ('correlation', 0.031), ('gowtham', 0.031), ('ww', 0.031), ('uniform', 0.03), ('scott', 0.029), ('minq', 0.028), ('toy', 0.028), ('minimizes', 0.027), ('tends', 0.027), ('search', 0.027), ('generalized', 0.027), ('meta', 0.027), ('suboptimal', 0.026), ('minimize', 0.026), ('skewed', 0.025), ('bk', 0.025), ('theorem', 0.025), ('gap', 0.025), ('exhibit', 0.025), ('uniquely', 0.024), ('dd', 0.024), ('codes', 0.024), ('splitting', 0.023), ('krause', 0.022), ('response', 0.022), ('constructing', 0.022), ('prior', 0.022), ('factor', 0.022), ('tu', 0.022), ('ending', 0.022), ('responses', 0.021), ('supplemental', 0.02), ('texas', 0.02), ('labels', 0.02), ('exponentially', 0.02), ('constructed', 0.02), ('evenly', 0.02), ('tx', 0.02), ('lim', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.31223473 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2 , a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose E FF ECXTIVE , a particularly fast approximation of EC 2 , and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty. 1
3 0.13982476 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
Author: Samy Bengio, Jason Weston, David Grangier
Abstract: Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible. This problem can be alleviated by imposing (or learning) a structure over the set of classes. We propose an algorithm for learning a treestructure of classifiers which, by optimizing the overall tree loss, provides superior accuracy to existing tree labeling methods. We also propose a method that learns to embed labels in a low dimensional space that is faster than non-embedding approaches and has superior accuracy to existing embedding approaches. Finally we combine the two ideas resulting in the label embedding tree that outperforms alternative methods including One-vs-Rest while being orders of magnitude faster. 1
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.11869314 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
6 0.11751692 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
7 0.11559714 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
8 0.11067326 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
9 0.091837436 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
10 0.088846438 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.086294971 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
12 0.085114986 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
13 0.076994546 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
14 0.075310409 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
15 0.075000279 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
16 0.071823105 144 nips-2010-Learning Efficient Markov Networks
17 0.071230918 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.070820816 149 nips-2010-Learning To Count Objects in Images
19 0.070087142 150 nips-2010-Learning concept graphs from text with stick-breaking priors
20 0.068106003 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
topicId topicWeight
[(0, 0.185), (1, 0.045), (2, -0.002), (3, -0.068), (4, -0.09), (5, -0.039), (6, -0.088), (7, 0.006), (8, 0.073), (9, -0.046), (10, 0.077), (11, 0.131), (12, -0.262), (13, -0.043), (14, 0.007), (15, -0.108), (16, -0.052), (17, -0.088), (18, -0.067), (19, 0.038), (20, -0.032), (21, 0.1), (22, 0.036), (23, -0.17), (24, -0.095), (25, -0.141), (26, -0.036), (27, 0.12), (28, 0.027), (29, -0.044), (30, -0.088), (31, 0.047), (32, 0.087), (33, 0.085), (34, 0.164), (35, 0.02), (36, -0.006), (37, -0.114), (38, -0.244), (39, -0.076), (40, -0.116), (41, 0.068), (42, 0.056), (43, -0.056), (44, 0.06), (45, 0.018), (46, 0.048), (47, -0.063), (48, -0.047), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.96045196 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.74299479 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
Author: Daniel Golovin, Andreas Krause, Debajyoti Ray
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise–free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near–optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2 , a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non–uniform cost and their noise is correlated. We also propose E FF ECXTIVE , a particularly fast approximation of EC 2 , and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty. 1
3 0.46379 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.41977078 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
Author: Han Liu, Xi Chen
Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets. 1
6 0.40570125 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
7 0.39941663 144 nips-2010-Learning Efficient Markov Networks
8 0.39736116 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
9 0.39349902 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets
10 0.37221029 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
11 0.36904886 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
12 0.34988561 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
13 0.34657705 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
14 0.33481222 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
15 0.33293471 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
16 0.33258894 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
17 0.31657365 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.30686286 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
19 0.30630529 168 nips-2010-Monte-Carlo Planning in Large POMDPs
20 0.30474821 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
topicId topicWeight
[(13, 0.04), (27, 0.061), (30, 0.116), (35, 0.019), (36, 0.174), (45, 0.264), (50, 0.032), (52, 0.042), (60, 0.058), (77, 0.038), (78, 0.013), (90, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.91021395 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
Author: Gowtham Bellala, Suresh Bhavnani, Clayton Scott
Abstract: Generalized Binary Search (GBS) is a well known greedy algorithm for identifying an unknown object while minimizing the number of “yes” or “no” questions posed about that object, and arises in problems such as active learning and active diagnosis. Here, we provide a coding-theoretic interpretation for GBS and show that GBS can be viewed as a top-down algorithm that greedily minimizes the expected number of queries required to identify an object. This interpretation is then used to extend GBS in two ways. First, we consider the case where the objects are partitioned into groups, and the objective is to identify only the group to which the object belongs. Then, we consider the case where the cost of identifying an object grows exponentially in the number of queries. In each case, we present an exact formula for the objective function involving Shannon or Rényi entropy, and develop a greedy algorithm for minimizing it. 1
2 0.86001176 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: √ ˜ We establish an excess risk bound of O HR2 + HL∗ Rn for ERM with an H-smooth loss n function and a hypothesis class with Rademacher complexity Rn , where L∗ is the best risk achievable by the hypothesis class. For typical hypothesis classes where Rn = R/n, this translates to ˜ ˜ a learning rate of O (RH/n) in the separable (L∗ = 0) case and O RH/n + L∗ RH/n more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
3 0.85829806 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
4 0.85560256 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
5 0.85550368 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
6 0.8542906 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
7 0.85424453 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
8 0.85407847 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
9 0.85350412 233 nips-2010-Scrambled Objects for Least-Squares Regression
10 0.85302943 290 nips-2010-t-logistic regression
11 0.8530038 134 nips-2010-LSTD with Random Projections
12 0.85282809 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
13 0.85282159 27 nips-2010-Agnostic Active Learning Without Constraints
14 0.85221165 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
15 0.85219491 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning
16 0.85162741 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
17 0.85139811 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
18 0.85081542 148 nips-2010-Learning Networks of Stochastic Differential Equations
19 0.8497802 287 nips-2010-Worst-Case Linear Discriminant Analysis
20 0.84964502 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models