nips nips2011 nips2011-177 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aleksandrs Slivkins
Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Multi-armed bandits on implicit metric spaces Aleksandrs Slivkins Microsoft Research Silicon Valley Mountain View, CA 94043 slivkins at microsoft. [sent-1, score-0.201]
2 In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. [sent-3, score-0.24]
3 While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. [sent-4, score-0.273]
4 In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. [sent-5, score-0.417]
5 One standard way to evaluate the performance of a multi-armed bandit algorithm is regret, defined as the difference between the expected payoff of an optimal arm and that of the algorithm. [sent-17, score-0.366]
6 By now the multi-armed bandit problem with a small finite number of arms is quite well understood (e. [sent-18, score-0.366]
7 However, if the set of arms is exponentially or infinitely large, the problem becomes intractable, unless we make further assumptions about the problem instance. [sent-21, score-0.222]
8 The bandit problems with large sets of arms have received a considerable attention, e. [sent-23, score-0.366]
9 1 In particular, the line of work [1, 17, 4, 20, 7, 19] considers the Lipschitz MAB problem, a broad and natural bandit setting in which the structure is induced by a metric on the set of arms. [sent-30, score-0.225]
10 Payoffs are stochastic: the payoff from choosing arm x is an independent random sample with expectation µ(x). [sent-32, score-0.222]
11 3 Specifically, following [21, 24, 25] we assume that an algorithm is (only) given a taxonomy on arms: a tree-based classification modeled by a rooted tree T whose leaf set is X. [sent-37, score-0.39]
12 The idea is that any two arms in the same subtree are likely to have similar payoffs. [sent-38, score-0.366]
13 One natural way to quantify the extent of similarity between arms in a given subtree is via the maximum difference in expected payoffs. [sent-45, score-0.399]
14 Specifically, for each internal node v we define the width of the corresponding subtree T (v) to be W(v) = supx,y∈X(v) |µ(x) − µ(y)|, where X(v) is the set of leaves in T (v). [sent-46, score-0.36]
15 Note that the subtree widths are non-increasing from root to leaves. [sent-47, score-0.275]
16 A standard notion of distance induced by subtree widths, henceforth called implicit distance, is as follows: Dimp (x, y) is the width of the least common ancestor of leaves x, y. [sent-48, score-0.24]
17 Thus, an instance (X, T , µ) of Taxonomy MAB naturally induces an instance (X, Dimp , µ) of Lipschitz MAB which (assuming the widths are strictly decreasing) encodes all relevant information. [sent-52, score-0.191]
18 The crucial distinction is that in Taxonomy MAB the metric space (X, Dimp ) is implicit: the subtree widths are not revealed to the algorithm. [sent-53, score-0.38]
19 We answer this question in the affirmative as long as the implicit metric space (X, Dimp ) has a small doubling constant (see Section 2 for a milder condition). [sent-58, score-0.205]
20 4 Our algorithm proceeds by estimating subtree widths of near-optimal subtrees. [sent-60, score-0.275]
21 These complications aside, our algorithm is similar to those in [17, 20] in that it maintains a partition of the space of arms into regions (in this case, subtrees) so that each region is treated as a “meta-arm”, and this partition is adapted to the high-payoff regions. [sent-64, score-0.248]
22 We assume stochastic payoffs [2]: in each round t the algorithm chooses a point x = xt ∈ X and observes an independent random sample from a payoff distribution Ppayoff (x) with support [0, 1] and expectation µ(x). [sent-72, score-0.231]
23 The goal is to minimize regret with respect to the best expected arm: R(T ) µ∗ T − E T t=1 µ(xt ) = E T t=1 ∆(xt ) , where µ∗ supx∈X µ(x) is the maximal expected payoff, and ∆(x) of arm x. [sent-74, score-0.218]
24 (2) µ∗ − µ(x), is the “badness” We will assume that the number of arms is finite (but possibly very large). [sent-76, score-0.222]
25 Extension to infinitely many arms (which does not require new algorithmic ideas) is not included to simplify presentation. [sent-77, score-0.222]
26 Our guarantees are in terms of the zooming dimension [20] of (X, Dimp , µ), a concept that takes into account both the dimensionality of the metric space and the “goodness” of the payoff function. [sent-79, score-0.383]
27 The zooming dimension of a problem instance I = (X, T , µ), with multiplier c, is ZoomDim(I, c) cov inf{d ≥ 0 : Nδ/8 (Xδ ) ≤ c δ −d ∀δ > 0}. [sent-85, score-0.272]
28 (3) cov In other words, we consider a covering property Nδ/8 (Xδ ) ≤ c δ −d , and define the zooming dimension as the smallest d such that this covering property holds for all δ > 0. [sent-86, score-0.337]
29 The zooming dimension essentially coincides with the covering dimension of (X, D) 6 for the worst-case payoff function µ, but can be (much) smaller when µ is “benign”. [sent-87, score-0.352]
30 In particular, zooming dimension would “ignore” a subtree with high covering dimension but significantly sub-optimal payoffs. [sent-88, score-0.393]
31 (In our case, any subtree can be covered by k subtrees of half the width. [sent-90, score-0.216]
32 Note that the definition of zooming dimension allows a trade-off between c and d, and we obtain the optimal trade-off since (4) holds for all values of c at once. [sent-97, score-0.189]
33 Recall that in Taxonomy MAB subtree widths are not revealed, and the algorithm has to use sampling to estimate them. [sent-113, score-0.275]
34 Informally, the taxonomy is useful for our purposes if and only if subtree widths can be efficiently estimated using random sampling. [sent-114, score-0.498]
35 We use simple random sampling: start at a tree node v and choose a branch uniformly at random at each junction. [sent-116, score-0.292]
36 The quality of the taxonomy for a given problem instance is the largest number q ∈ (0, 1) with the following property: for each subtree T (v) containing an optimal arm there exist tree nodes u, u ∈ T (v) such that P(u|v) and P(u |v) are at least q and |µ(u) − µ(u )| ≥ 1 2 W(v). [sent-122, score-0.758]
37 The definition is flexible: it allows u and u to be at different depth (which is useful if node degrees are large and non-uniform) and the widths of other internal nodes in T (v) cannot adversely impact quality. [sent-126, score-0.367]
38 For a particularly simple example, consider a binary taxonomy such that for each subtree T (v) containing an optimal arm there exist grandchildren u, u of v that satisfy (5). [sent-128, score-0.486]
39 Then TaxonomyZoom achieves (4) with KI = deg log |X|, where deg is the degree of the taxonomy. [sent-134, score-0.206]
40 1 follows because, letting cDBL be the doubling constant of (X, Dimp ), all node degrees are at most cDBL and moreover quality ≥ 2−cDBL (we omit the proof from this version). [sent-136, score-0.258]
41 3 is instance-dependent: it depends on deg/quality and ZoomDim, and is meaningful only if these quantities are small compared to the number of arms (informally, we will call such problem instances “benign”). [sent-139, score-0.222]
42 For benign problem instances, the benefit of using the taxonomy is the vastly improved dependence on the number of arms. [sent-142, score-0.251]
43 Without a taxonomy or any other structure, regret of any algorithm for stochastic MAB scales linearly in the number of (near-optimal) arms, for a sufficiently large t. [sent-143, score-0.322]
44 δ Specifically, let Nδ be the number of arms x such that 2 < ∆(x) ≤ δ. [sent-144, score-0.222]
45 Here for each tree node v there is a separate, independent copy of UCB 1 [2] or a UCB 1-style index algorithm (call it Av ), so that the “arms” for Av correspond to children u of v, and selecting a child u in a given round corresponds to playing Au in this round. [sent-147, score-0.378]
46 However, regret bounds for these algorithms do not scale as well with the number of arms: even if the tree widths are given, then letting ∆min minx∈X: ∆(x)>0 ∆(x), the regret bound is proportional to |Xδ |/∆min (where Xδ is as in Definition 1. [sent-149, score-0.496]
47 In each round the algorithm selects one of the tree nodes, say v, and plays a randomly sampled arm x from T (v). [sent-154, score-0.347]
48 We say that a subtree T (u) is hit in this round if u ∈ T (v) and x ∈ T (u). [sent-155, score-0.251]
49 For each tree node v and time t, let nt (v) be the number of times the subtree T (v) has been hit by the algorithm before time t, and let µt (v) be the corresponding average reward. [sent-156, score-0.573]
50 Define the confidence radius of v at time t as radt (v) 8 log(Thor |X|) / (2 + nt (v)). [sent-158, score-0.351]
51 (6) The meaning of the confidence radius is that |µt (v) − µ(v)| ≤ radt (v) with high probability. [sent-159, score-0.235]
52 For each tree node v and time t, let us define the index of v at time t as It (v) µt (v) + (1 + 2 kA ) radt (v), where kA 4 2/q. [sent-160, score-0.527]
53 Throughout the phase, some tree nodes are designated active. [sent-165, score-0.206]
54 We maintain the following invariant: Wt (v) < kA radt (v) for each active internal node v. [sent-166, score-0.442]
55 (S2) Select an active tree node v with the maximal index (7), breaking ties arbitrarily. [sent-170, score-0.327]
56 Note that each arm is activated and deactivated at most once. [sent-172, score-0.234]
57 If an explicit representation of the taxonomy can be stored in memory, then the following simple implementation is possible. [sent-174, score-0.223]
58 For each tree node v, we store several statistics: nt , µt , Ut and Lt . [sent-175, score-0.408]
59 Suppose in a given round t, a subtree v is chosen, and an arm x is played. [sent-177, score-0.349]
60 TaxonomyZoom can be implemented with O(1) storage per each tree node, so that in each round the time complexity is O(N + depth(x)), where N is the number of arms activated in step (S1), and x is the arm chosen in step (S3). [sent-186, score-0.596]
61 Sometimes it may be feasible (and more space-efficient) to represent the taxonomy implicitly, so that a tree node is expanded only if needed. [sent-187, score-0.515]
62 Specifically, suppose the following interface is provided: given a tree node v, return all its children and an arbitrary arm x ∈ T (v). [sent-188, score-0.411]
63 Then TaxonomyZoom can be implemented so that it only stores the statistics for each node u such that P(u|v) ≥ q for some active node v (rather than for all tree nodes). [sent-189, score-0.477]
64 An execution of TaxonomyZoom is called clean if for each time t ≤ T and all tree nodes v the following two properties hold: (P1) |µt (v) − µ(v)| ≤ radt (v) as long as nt (v) > 0. [sent-204, score-0.707]
65 (P2) If u ∈ T (v) then nt (v) P(u|v) ≥ 8 log T ⇒ nt (u) ≥ 1 2 nt (v) P(u|v). [sent-205, score-0.348]
66 For part (P1), fix a tree node v and let ζj to be the payoff in the j-th round that v has been n n 1 ¯ hit. [sent-211, score-0.481]
67 Recall that by definition of W(·), µ(v) ≤ µ(u) + W(v) for any tree node u ∈ T (v). [sent-222, score-0.292]
68 Then the following holds: in any round t ≤ Thor , at any point in the execution such that the invariant (9) holds, there exists an active tree node v ∗ such that It (v ∗ ) ≥ µ∗ . [sent-228, score-0.513]
69 Note that in each round t, there exist an active tree node vt ∗ ∗ such that x ∈ T (v ). [sent-231, score-0.413]
70 ) Fix round t and the corresponding tree node v ∗ = vt . [sent-233, score-0.378]
71 Then for each tree node v radt (v) ≤ ∆/8 ⇐⇒ nt (v) ≥ f (∆). [sent-237, score-0.643]
72 By (13), this is equivalent to ∆ ≥ 2 kA radt (v ∗ ). [sent-239, score-0.235]
73 Note that nt (v ∗ ) ≥ (2/q) f (∆) by our assumption on kA , so by property (P2) in the definition of the clean execution, for each node vj , j ∈ {0, 1} we have nt (vj ) ≥ f (∆), which implies radt (vj ) ≤ ∆/8. [sent-240, score-0.732]
74 It follows that Wt (v ∗ ) ≥ kA radt (v ∗ ), which violates Invariant (9). [sent-242, score-0.235]
75 Using (12), we have ∆(v ∗ ) ≤ W(v ∗ ) < 2 kA radt (v ∗ ) and It (v ∗ ) ≥ µ(v ∗ ) + 2 kA radt (v ∗ ) ≥ µ∗ , (14) where the first inequality in (14) holds by definition (7) and property (P1) of a clean execution. [sent-244, score-0.594]
76 11 To make ζn well-defined for any n ≤ Thor , consider a hypothetical algorithm which coincides with TaxonomyZoom for the first Thor rounds and then proceeds so that each tree node is selected Thor times. [sent-245, score-0.292]
77 3 to show that the algorithm does not activate too many tree nodes with large badness ∆(·), and each such node is not played too often. [sent-247, score-0.406]
78 For each tree node v, let N (v) be the number of times node v was selected in step (S2) of the algorithm. [sent-248, score-0.442]
79 We partition all positive tree nodes and all deactivated tree nodes into sets Si = {positive tree nodes v : 2−i < ∆(v) ≤ 2−i+1 }, ∗ Si = {deactivated tree nodes v : 2−i < 4 W(v) ≤ 2−i+1 }. [sent-250, score-0.912]
80 (a) (b) (c) (d) 2 for each tree node v we have N (v) ≤ O(kA log Thor ) ∆−2 (v). [sent-254, score-0.292]
81 For part (a), fix an arbitrary tree node v and let t be the last time v was selected in step (S2) of the algorithm. [sent-259, score-0.292]
82 3, at that point in the execution there was a tree node v ∗ such that It (v ∗ ) ≥ µ∗ . [sent-261, score-0.366]
83 Then using the selection rule (step (S2)) and the definition of index (7), we have µ∗ ≤ It (v ∗ ) ≤ It (v) ≤ µ(v) + (2 + 2 kA ) radt (v), ∆(v) ≤ (2 + 2 kA ) radt (v). [sent-262, score-0.47]
84 For part (b), suppose tree node v was de-activated at time s. [sent-264, score-0.292]
85 Then W(v) ≥ Ws (v) ≥ kA rs (v) ≥ 1 3 (2 + 2 kA ) radt (v) ≥ 1 3 ∆(v). [sent-266, score-0.235]
86 (16) Indeed, the first inequality in (16) holds since we are in a clean execution, the second inequality in (16) holds because v was de-activated, the third inequality holds because ns (v) = nt (v) + 1, and the last inequality in (16) holds by (15). [sent-267, score-0.384]
87 For each arm x ∈ X in subtree T (v) we have, by part (b), ∆(x) ≤ ∆(v) + W(v) ≤ 4 W(v) ≤ 2−i+1 , so x ∈ Yi and therefore is contained in some T (vj ). [sent-275, score-0.263]
88 , vK }, where two nodes u, v are connected by a directed edge (u, v) if there is a path from u to v in the tree T . [sent-284, score-0.229]
89 Since in any directed tree of out-degree ≥ 2 the number of nodes is at most twice the number of leaves, G contains at most Ki internal nodes. [sent-289, score-0.251]
90 For part (d), let us fix i and consider a positive tree node u ∈ Si . [sent-291, score-0.292]
91 Since N (u) > 0, either u is active at time Thor , or it was deactivated in some round before Thor . [sent-292, score-0.209]
92 For each tree node v, define its family as the set which consists of u itself and all its children. [sent-296, score-0.292]
93 We have proved that each positive node u ∈ Si belongs to the family of some deactivated node ∗ v ∈ ∪i+1 Sj . [sent-297, score-0.388]
94 For any δ0 = 2−i0 we have R(T ) ≤ tree nodes v N (v) ∆(v) = v: ∆(v)<δ0 ≤ δ0 T + i≤i0 N (v) ∆(v) + v∈Si v: ∆(v)≥δ0 N (v) ∆(v) ≤ δ0 T + N (v) ∆(v) i≤i0 K 2(i+2)(1+d) ≤ δ0 T + O(K) ( δ8 )(1+d) . [sent-306, score-0.206]
95 For instance, if we take qi = 2−αi for some α ∈ (0, 1) then this meta-algorithm has regret R(T ) ≤ O(c deg log T )1/(2+d) × T 1−(1−α)/(2+d) where d = ZoomDim(I, c), for any given c > 0. [sent-313, score-0.202]
96 Further, we have shown how our approach results in stronger provable guarantees than alternative algorithms such as tree bandit algorithms [21, 24]. [sent-327, score-0.322]
97 Most interestingly, a number of applications recently posed as MAB problems over large sets of arms – including learning to rank online advertisements or web documents (e. [sent-331, score-0.257]
98 [26, 29]) – naturally involve choosing among arms (e. [sent-333, score-0.222]
99 Selecting among, or combining from, a set of available hierarchical representations of arms poses interesting challenges. [sent-339, score-0.222]
100 3 to other structures that implicitly define a metric space on arms (in the sense of (1)). [sent-341, score-0.303]
wordName wordTfidf (topN-words)
[('thor', 0.441), ('mab', 0.382), ('radt', 0.235), ('taxonomy', 0.223), ('arms', 0.222), ('taxonomyzoom', 0.221), ('ka', 0.192), ('dimp', 0.191), ('node', 0.15), ('subtree', 0.144), ('bandit', 0.144), ('tree', 0.142), ('widths', 0.131), ('zooming', 0.129), ('arm', 0.119), ('cdbl', 0.118), ('zoomdim', 0.118), ('nt', 0.116), ('ki', 0.114), ('deg', 0.103), ('payoff', 0.103), ('regret', 0.099), ('deactivated', 0.088), ('round', 0.086), ('metric', 0.081), ('lipschitz', 0.08), ('clean', 0.076), ('execution', 0.074), ('doubling', 0.072), ('nodes', 0.064), ('si', 0.06), ('covering', 0.052), ('aleksandrs', 0.052), ('implicit', 0.052), ('subtrees', 0.051), ('robert', 0.048), ('lt', 0.045), ('width', 0.044), ('cov', 0.044), ('payoffs', 0.042), ('vj', 0.039), ('slivkins', 0.037), ('ut', 0.036), ('guarantees', 0.036), ('quality', 0.036), ('multiplier', 0.035), ('active', 0.035), ('online', 0.035), ('dimension', 0.034), ('similarity', 0.033), ('nition', 0.032), ('bandits', 0.031), ('wt', 0.031), ('lemma', 0.031), ('csaba', 0.03), ('nicol', 0.03), ('instance', 0.03), ('badness', 0.029), ('deepayan', 0.029), ('ppayoff', 0.029), ('rads', 0.029), ('sandeep', 0.029), ('vanja', 0.029), ('vki', 0.029), ('benign', 0.028), ('horizon', 0.028), ('activated', 0.027), ('invariant', 0.026), ('deepak', 0.026), ('filip', 0.026), ('complications', 0.026), ('pandey', 0.026), ('sylvain', 0.026), ('varsha', 0.026), ('preliminary', 0.026), ('soda', 0.026), ('ucb', 0.026), ('holds', 0.026), ('leaf', 0.025), ('minx', 0.025), ('kleinberg', 0.025), ('bound', 0.025), ('revealed', 0.024), ('conjecture', 0.024), ('clip', 0.024), ('chakrabarti', 0.024), ('brendan', 0.024), ('gelly', 0.024), ('directed', 0.023), ('internal', 0.022), ('inequality', 0.022), ('stoc', 0.021), ('covered', 0.021), ('crude', 0.021), ('activate', 0.021), ('radlinski', 0.021), ('hit', 0.021), ('auer', 0.021), ('dence', 0.02), ('needs', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 177 nips-2011-Multi-armed bandits on implicit metric spaces
Author: Aleksandrs Slivkins
Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1
2 0.14446883 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits
Author: Yasin Abbasi-yadkori, Csaba Szepesvári, David Tax
Abstract: We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm for the for linear stochastic bandit problem studied by Auer (2002), Dani et al. (2008), Rusmevichientong and Tsitsiklis (2010), Li et al. (2010). Our modification improves the regret bound by a logarithmic factor, though experiments show a vast improvement. In both cases, the improvement stems from the construction of smaller confidence sets. For their construction we use a novel tail inequality for vector-valued martingales. 1
3 0.14255308 175 nips-2011-Multi-Bandit Best Arm Identification
Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck
Abstract: We study the problem of identifying the best arm in each of the bandits in a multibandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.
4 0.13917005 56 nips-2011-Committing Bandits
Author: Loc X. Bui, Ramesh Johari, Shie Mannor
Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.
5 0.13308056 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo
Author: Alexandra Carpentier, Rémi Munos
Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1
6 0.1271264 220 nips-2011-Prediction strategies without loss
7 0.11703935 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
8 0.10967711 61 nips-2011-Contextual Gaussian Process Bandit Optimization
9 0.10567511 245 nips-2011-Selecting the State-Representation in Reinforcement Learning
10 0.10314418 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
11 0.10252242 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
12 0.098343641 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
13 0.095524788 226 nips-2011-Projection onto A Nonnegative Max-Heap
14 0.093530811 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
15 0.093443751 32 nips-2011-An Empirical Evaluation of Thompson Sampling
16 0.084512085 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations
17 0.081495911 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
18 0.076274596 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension
19 0.07566072 272 nips-2011-Stochastic convex optimization with bandit feedback
20 0.075432688 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
topicId topicWeight
[(0, 0.157), (1, -0.18), (2, -0.044), (3, 0.029), (4, 0.158), (5, -0.086), (6, -0.02), (7, 0.037), (8, 0.044), (9, -0.114), (10, -0.12), (11, 0.07), (12, -0.207), (13, 0.07), (14, 0.018), (15, 0.048), (16, -0.08), (17, -0.061), (18, -0.01), (19, -0.008), (20, 0.016), (21, -0.097), (22, -0.007), (23, -0.006), (24, -0.119), (25, -0.033), (26, -0.085), (27, -0.055), (28, -0.017), (29, -0.069), (30, -0.047), (31, 0.049), (32, 0.011), (33, 0.081), (34, 0.014), (35, 0.003), (36, -0.013), (37, 0.119), (38, 0.019), (39, -0.011), (40, -0.035), (41, -0.01), (42, 0.006), (43, -0.019), (44, 0.009), (45, -0.03), (46, -0.011), (47, -0.095), (48, -0.048), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.94426286 177 nips-2011-Multi-armed bandits on implicit metric spaces
Author: Aleksandrs Slivkins
Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1
2 0.77870542 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
Author: Rémi Munos
Abstract: We consider a global optimization problem of a deterministic function f in a semimetric space, given a finite budget of n evaluations. The function f is assumed to be locally smooth (around one of its global maxima) with respect to a semi-metric ℓ. We describe two algorithms based on optimistic exploration that use a hierarchical partitioning of the space at all scales. A first contribution is an algorithm, DOO, that requires the knowledge of ℓ. We report a finite-sample performance bound in terms of a measure of the quantity of near-optimal states. We then define a second algorithm, SOO, which does not require the knowledge of the semimetric ℓ under which f is smooth, and whose performance is almost as good as DOO optimally-fitted.
3 0.66933382 175 nips-2011-Multi-Bandit Best Arm Identification
Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck
Abstract: We study the problem of identifying the best arm in each of the bandits in a multibandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper-bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.
4 0.66005558 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg
Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1
5 0.64968228 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo
Author: Alexandra Carpentier, Rémi Munos
Abstract: We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata. We provide two regret analyses: a distribution� dependent bound O(n−3/2 ) that depends on a measure of the disparity of � the strata, and a distribution-free bound O(n−4/3 ) that does not. 1
6 0.6411795 56 nips-2011-Committing Bandits
7 0.62128377 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
8 0.56488341 226 nips-2011-Projection onto A Nonnegative Max-Heap
9 0.54079157 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits
10 0.50199181 245 nips-2011-Selecting the State-Representation in Reinforcement Learning
11 0.49780044 32 nips-2011-An Empirical Evaluation of Thompson Sampling
12 0.49256724 272 nips-2011-Stochastic convex optimization with bandit feedback
13 0.45072857 274 nips-2011-Structure Learning for Optimization
14 0.43439677 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
15 0.42609659 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
16 0.41562164 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
17 0.4152526 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
18 0.4072592 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations
19 0.40696326 61 nips-2011-Contextual Gaussian Process Bandit Optimization
20 0.3976382 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
topicId topicWeight
[(0, 0.032), (4, 0.041), (10, 0.013), (20, 0.045), (26, 0.028), (31, 0.062), (33, 0.015), (43, 0.053), (45, 0.086), (52, 0.288), (57, 0.083), (74, 0.042), (79, 0.053), (83, 0.026), (87, 0.01), (99, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.72138804 177 nips-2011-Multi-armed bandits on implicit metric spaces
Author: Aleksandrs Slivkins
Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1
2 0.52643865 80 nips-2011-Efficient Online Learning via Randomized Rounding
Author: Nicolò Cesa-bianchi, Ohad Shamir
Abstract: Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines “random playout” and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with trace-norm constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning. 1
3 0.51320899 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1
Author: Matthias S. Keil
Abstract: Many species show avoidance reactions in response to looming object approaches. In locusts, the corresponding escape behavior correlates with the activity of the lobula giant movement detector (LGMD) neuron. During an object approach, its firing rate was reported to gradually increase until a peak is reached, and then it declines quickly. The η-function predicts that the LGMD activity is a product ˙ between an exponential function of angular size exp(−Θ) and angular velocity Θ, and that peak activity is reached before time-to-contact (ttc). The η-function has become the prevailing LGMD model because it reproduces many experimental observations, and even experimental evidence for the multiplicative operation was reported. Several inconsistencies remain unresolved, though. Here we address ˙ these issues with a new model (ψ-model), which explicitly connects Θ and Θ to biophysical quantities. The ψ-model avoids biophysical problems associated with implementing exp(·), implements the multiplicative operation of η via divisive inhibition, and explains why activity peaks could occur after ttc. It consistently predicts response features of the LGMD, and provides excellent fits to published experimental data, with goodness of fit measures comparable to corresponding fits with the η-function. 1 Introduction: τ and η Collision sensitive neurons were reported in species such different as monkeys [5, 4], pigeons [36, 34], frogs [16, 20], and insects [33, 26, 27, 10, 38]. This indicates a high ecological relevance, and raises the question about how neurons compute a signal that eventually triggers corresponding movement patterns (e.g. escape behavior or interceptive actions). Here, we will focus on visual stimulation. Consider, for simplicity, a circular object (diameter 2l), which approaches the eye at a collision course with constant velocity v. If we do not have any a priori knowledge about the object in question (e.g. its typical size or speed), then we will be able to access only two information sources. These information sources can be measured at the retina and are called optical variables (OVs). The first is the visual angle Θ, which can be derived from the number of stimulated photore˙ ˙ ceptors (spatial contrast). The second is its rate of change dΘ(t)/dt ≡ Θ(t). Angular velocity Θ is related to temporal contrast. ˙ How should we combine Θ and Θ in order to track an imminent collision? The perhaps simplest ˙ combination is τ (t) ≡ Θ(t)/Θ(t) [13, 18]. If the object hit us at time tc , then τ (t) ≈ tc − t will ∗ Also: www.ir3c.ub.edu, Research Institute for Brain, Cognition, and Behaviour (IR3C) Edifici de Ponent, Campus Mundet, Universitat de Barcelona, Passeig Vall d’Hebron, 171. E-08035 Barcelona 1 give us a running estimation of the time that is left until contact1 . Moreover, we do not need to know anything about the approaching object: The ttc estimation computed by τ is practically independent of object size and velocity. Neurons with τ -like responses were indeed identified in the nucleus retundus of the pigeon brain [34]. In humans, only fast interceptive actions seem to rely exclusively on τ [37, 35]. Accurate ttc estimation, however, seems to involve further mechanisms (rate of disparity change [31]). ˙ Another function of OVs with biological relevance is η ≡ Θ exp(−αΘ), with α = const. [10]. While η-type neurons were found again in pigeons [34] and bullfrogs [20], most data were gathered from the LGMD2 in locusts (e.g. [10, 9, 7, 23]). The η-function is a phenomenological model for the LGMD, and implies three principal hypothesis: (i) An implementation of an exponential function exp(·). Exponentation is thought to take place in the LGMD axon, via active membrane conductances [8]. Experimental data, though, seem to favor a third-power law rather than exp(·). (ii) The LGMD carries out biophysical computations for implementing the multiplicative operation. It has been suggested that multiplication is done within the LGMD itself, by subtracting the loga˙ rithmically encoded variables log Θ − αΘ [10, 8]. (iii) The peak of the η-function occurs before ˆ ttc, at visual angle Θ(t) = 2 arctan(1/α) [9]. It follows ttc for certain stimulus configurations (e.g. ˆ l/|v| 5ms). In principle, t > tc can be accounted for by η(t + δ) with a fixed delay δ < 0 (e.g. −27ms). But other researchers observed that LGMD activity continuous to rise after ttc even for l/|v| 5ms [28]. These discrepancies remain unexplained so far [29], but stimulation dynamics perhaps plays a role. We we will address these three issues by comparing the novel function “ψ” with the η-function. LGMD computations with the ψ-function: No multiplication, no exponentiation 2 A circular object which starts its approach at distance x0 and with speed v projects a visual angle Θ(t) = 2 arctan[l/(x0 − vt)] on the retina [34, 9]. The kinematics is hence entirely specified by the ˙ half-size-to-velocity ratio l/|v|, and x0 . Furthermore, Θ(t) = 2lv/((x0 − vt)2 + l2 ). In order to define ψ, we consider at first the LGMD neuron as an RC-circuit with membrane potential3 V [17] dV Cm = β (Vrest − V ) + gexc (Vexc − V ) + ginh (Vinh − V ) (1) dt 4 Cm = membrane capacity ; β ≡ 1/Rm denotes leakage conductance across the cell membrane (Rm : membrane resistance); gexc and ginh are excitatory and inhibitory inputs. Each conductance gi (i = exc, inh ) can drive the membrane potential to its associated reversal potential Vi (usually Vinh ≤ Vexc ). Shunting inhibition means Vinh = Vrest . Shunting inhibition lurks “silently” because it gets effective only if the neuron is driven away from its resting potential. With synaptic input, the neuron decays into its equilibrium state Vrest β + Vexc gexc + Vinh ginh V∞ ≡ (2) β + gexc + ginh according to V (t) = V∞ (1 − exp(−t/τm )). Without external input, V (t 1) → Vrest . The time scale is set by τm . Without synaptic input τm ≡ Cm /β. Slowly varying inputs gexc , ginh > 0 modify the time scale to approximately τm /(1 + (gexc + ginh )/β). For highly dynamic inputs, such as in late phase of the object approach, the time scale gets dynamical as well. The ψ-model assigns synaptic inputs5 ˙ ˙ ˙ ˙ gexc (t) = ϑ(t), ϑ(t) = ζ1 ϑ(t − ∆tstim ) + (1 − ζ1 )Θ(t) (3a) e ginh (t) = [γϑ(t)] , ϑ(t) = ζ0 ϑ(t − ∆tstim ) + (1 − ζ0 )Θ(t) 1 (3b) This linear approximation gets worse with increasing Θ, but turns out to work well until short before ttc (τ adopts a minimum at tc − 0.428978 · l/|v|). 2 LGMD activity is usually monitored via its postsynaptic neuron, the Descending Contralateral Movement Detector (DCMD) neuron. This represents no problem as LGMD spikes follow DCMD spikes 1:1 under visual stimulation [22] from 300Hz [21] to at least 400Hz [24]. 3 Here we assume that the membrane potential serves as a predictor for the LGMD’s mean firing rate. 4 Set to unity for all simulations. 5 LGMD receives also inhibition from a laterally acting network [21]. The η-function considers only direct feedforward inhibition [22, 6], and so do we. 2 Θ ∈ [7.63°, 180.00°[ temporal resolution ∆ tstim=1.0ms l/|v|=20.00ms, β=1.00, γ=7.50, e=3.00, ζ0=0.90, ζ1=0.99, nrelax=25 0.04 scaled dΘ/dt continuous discretized 0.035 0.03 Θ(t) (input) ϑ(t) (filtered) voltage V(t) (output) t = 56ms max t =300ms c 0.025 0 10 2 η(t): α=3.29, R =1.00 n =10 → t =37ms log Θ(t) amplitude relax max 0.02 0.015 0.01 0.005 0 −0.005 0 50 100 150 200 250 300 −0.01 0 350 time [ms] 50 100 150 200 250 300 350 time [ms] (b) ψ versus η (a) discretized optical variables Figure 1: (a) The continuous visual angle of an approaching object is shown along with its discretized version. Discretization transforms angular velocity from a continuous variable into a series of “spikes” (rescaled). (b) The ψ function with the inputs shown in a, with nrelax = 25 relaxation time steps. Its peak occurs tmax = 56ms before ttc (tc = 300ms). An η function (α = 3.29) that was fitted to ψ shows good agreement. For continuous optical variables, the peak would occur 4ms earlier, and η would have α = 4.44 with R2 = 1. For nrelax = 10, ψ is farther away from its equilibrium at V∞ , and its peak moves 19ms closer to ttc. t =500ms, dia=12.0cm, ∆t c =1.00ms, dt=10.00µs, discrete=1 stim 250 n relax = 50 2 200 α=4.66, R =0.99 [normal] n = 25 relax 2 α=3.91, R =1.00 [normal] n =0 relax tmax [ms] 150 2 α=1.15, R =0.99 [normal] 100 50 0 β=1.00, γ=7.50, e=3.00, V =−0.001, ζ =0.90, ζ =0.99 inh −50 5 10 15 20 25 30 0 35 1 40 45 50 l/|v| [ms] (a) different nrelax (b) different ∆tstim ˆ ˆ Figure 2: The figures plot the relative time tmax ≡ tc − t of the response peak of ψ, V (t), as a function of half-size-to-velocity ratio (points). Line fits with slope α and intercept δ were added (lines). The predicted linear relationship in all cases is consistent with experimental evidence [9]. (a) The stimulus time scale is held constant at ∆tstim = 1ms, and several LGMD time scales are defined by nrelax (= number of intercalated relaxation steps for each integration time step). Bigger values of nrelax move V (t) closer to its equilibrium V∞ (t), implying higher slopes α in turn. (b) LGMD time scale is fixed at nrelax = 25, and ∆tstim is manipulated. Because of the discretization of optical variables (OVs) in our simulation, increasing ∆tstim translates to an overall smaller number of jumps in OVs, but each with higher amplitude. Thus, we say ψ(t) ≡ V (t) if and only if gexc and ginh are defined with the last equation. The time ˙ scale of stimulation is defined by ∆tstim (by default 1ms). The variables ϑ and ϑ are lowpass filtered angular size and rate of expansion, respectively. The amount of filtering is defined by memory constants ζ0 and ζ1 (no filtering if zero). The idea is to continue with generating synaptic input ˙ after ttc, where Θ(t > tc ) = const and thus Θ(t > tc ) = 0. Inhibition is first weighted by γ, and then potentiated by the exponent e. Hodgkin-Huxley potentiates gating variables n, m ∈ [0, 1] instead (potassium ∝ n4 , sodium ∝ m3 , [12]) and multiplies them with conductances. Gabbiani and co-workers found that the function which transforms membrane potential to firing rate is better described by a power function with e = 3 than by exp(·) (Figure 4d in [8]). 3 Dynamics of the ψ-function 3 Discretization. In a typical experiment, a monitor is placed a short distance away from the insect’s eye, and an approaching object is displayed. Computer screens have a fixed spatial resolution, and as a consequence size increments of the displayed object proceed in discrete jumps. The locust retina is furthermore composed of a discrete array of ommatidia units. We therefore can expect a corresponding step-wise increment of Θ with time, although optical and neuronal filtering may ˙ smooth Θ to some extent again, resulting in ϑ (figure 1). Discretization renders Θ discontinuous, ˙ For simulating the dynamics of ψ, we discretized angular size what again will be alleviated in ϑ. ˙ with floor(Θ), and Θ(t) ≈ [Θ(t + ∆tstim ) − Θ(t)]/∆tstim . Discretized optical variables (OVs) were re-normalized to match the range of original (i.e. continuous) OVs. To peak, or not to peak? Rind & Simmons reject the hypothesis that the activity peak signals impending collision on grounds of two arguments [28]: (i) If Θ(t + ∆tstim ) − Θ(t) 3o in consecutively displayed stimulus frames, the illusion of an object approach would be lost. Such stimulation would rather be perceived as a sequence of rapidly appearing (but static) objects, causing reduced responses. (ii) After the last stimulation frame has been displayed (that is Θ = const), LGMD responses keep on building up beyond ttc. This behavior clearly depends on l/|v|, also according to their own data (e.g. Figure 4 in [26]): Response build up after ttc is typically observed for suffi˙ ciently small values of l/|v|. Input into ψ in situations where Θ = const and Θ = 0, respectively, ˙ is accommodated by ϑ and ϑ, respectively. We simulated (i) by setting ∆tstim = 5ms, thus producing larger and more infrequent jumps in discrete OVs than with ∆tstim = 1ms (default). As a consequence, ϑ(t) grows more slowly (deˆ layed build up of inhibition), and the peak occurs later (tmax ≡ tc − t = 10ms with everything else ˆ ˆ identical with figure 1b). The peak amplitude V = V (t) decreases nearly sixfold with respect to default. Our model thus predicts the reduced responses observed by Rind & Simmons [28]. Linearity. Time of peak firing rate is linearly related to l/|v| [10, 9]. The η-function is consistent ˆ with this experimental evidence: t = tc − αl/|v| + δ (e.g. α = 4.7, δ = −27ms). The ψ-function reproduces this relationship as well (figure 2), where α depends critically on the time scale of biophysical processes in the LGMD. We studied the impact of this time scale by choosing 10µs for the numerical integration of equation 1 (algorithm: 4th order Runge-Kutta). Apart from improving the numerical stability of the integration algorithm, ψ is far from its equilibrium V∞ (t) in every moment ˙ t, given the stimulation time scale ∆tstim = 1ms 6 . Now, at each value of Θ(t) and Θ(t), respectively, we intercalated nrelax iterations for integrating ψ. Each iteration takes V (t) asymptotically closer to V∞ (t), and limnrelax 1 V (t) = V∞ (t). If the internal processes in the LGMD cannot keep up with stimulation (nrelax = 0), we obtain slopes values that underestimate experimentally found values (figure 2a). In contrast, for nrelax 25 we get an excellent agreement with the experimentally determined α. This means that – under the reported experimental stimulation conditions (e.g. [9]) – the LGMD would operate relatively close to its steady state7 . Now we fix nrelax at 25 and manipulate ∆tstim instead (figure 2b). The default value ∆tstim = 1ms corresponds to α = 3.91. Slightly bigger values of ∆tstim (2.5ms and 5ms) underestimate the experimental α. In addition, the line fits also return smaller intercept values then. We see tmax < 0 up to l/|v| ≈ 13.5ms – LGMD activity peaks after ttc! Or, in other words, LGMD activity continues to increase after ttc. In the limit, where stimulus dynamics is extremely fast, and LGMD processes are kept far from equilibrium at each instant of the approach, α gets very small. As a consequence, tmax gets largely independent of l/|v|: The activity peak would cling to tmax although we varied l/|v|. 4 Freeze! Experimental data versus steady state of “psi” In the previous section, experimentally plausible values for α were obtained if ψ is close to equilibrium at each instant of time during stimulation. In this section we will thus introduce a steady-state 6 Assuming one ∆tstim for each integration time step. This means that by default stimulation and biophysical dynamics will proceed at identical time scales. 7 Notice that in this moment we can only make relative statements - we do not have data at hand for defining absolute time scales 4 tc=500ms, v=2.00m/s ψ∞ → (β varies), γ=3.50, e=3.00, Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, γ=3.50, (e varies), Vinh=−0.001 300 tc=500ms, v=2.00m/s ψ∞ → β=2.50, (γ varies), e=3.00, Vinh=−0.001 350 300 β=10.00 β=5.00 norm. rmse = 0.058...0.153 correlation (β,α)=−0.90 (n=4) ∞ β=1.00 e=4.00 norm. |η−ψ | = 0.009...0.114 e=3.00 300 norm. rmse = 0.014...0.160 correlation (e,α)=0.98 (n=4) ∞ e=2.50 250 250 norm. |η−ψ | = 0.043...0.241 ∞ norm. rmse = 0.085...0.315 correlation (γ,α)=1.00 (n=5) 150 tmax [ms] 200 tmax [ms] 200 tmax [ms] γ=5.00 γ=2.50 γ=1.00 γ=0.50 γ=0.25 e=5.00 norm. |η−ψ | = 0.020...0.128 β=2.50 250 200 150 100 150 100 100 50 50 50 0 5 10 15 20 25 30 35 40 45 0 5 50 10 15 20 l/|v| [ms] 25 30 35 40 45 0 5 50 10 15 20 l/|v| [ms] (a) β varies 25 30 35 40 45 50 l/|v| [ms] (b) e varies (c) γ varies ˆ ˆ Figure 3: Each curve shows how the peak ψ∞ ≡ ψ∞ (t) depends on the half-size-to-velocity ratio. In each display, one parameter of ψ∞ is varied (legend), while the others are held constant (figure title). Line slopes vary according to parameter values. Symbol sizes are scaled according to rmse (see also figure 4). Rmse was calculated between normalized ψ∞ (t) & normalized η(t) (i.e. both functions ∈ [0, 1] with original minimum and maximum indicated by the textbox). To this end, the ˆ peak of the η-function was placed at tc , by choosing, at each parameter value, α = |v| · (tc − t)/l (for determining correlation, the mean value of α was taken across l/|v|). tc=500ms, v=2.00m/s ψ∞ → (β varies), γ=3.50, e=3.00, Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, γ=3.50, (e varies), Vinh=−0.001 tc=500ms, v=2.00m/s ψ∞ → β=2.50, (γ varies), e=3.00, Vinh=−0.001 0.25 β=5.00 0.12 β=2.50 β=1.00 0.1 0.08 (normalized η, ψ∞) 0.12 β=10.00 (normalized η, ψ∞) (normalized η, ψ∞) 0.14 0.1 0.08 γ=5.00 γ=2.50 0.2 γ=1.00 γ=0.50 γ=0.25 0.15 0.06 0.04 0.02 0 5 10 15 20 25 30 35 40 45 50 meant |η(t)−ψ∞(t)| meant |η(t)−ψ∞(t)| meant |η(t)−ψ∞(t)| 0.06 0.04 e=5.00 e=4.00 e=3.00 0.02 e=2.50 10 l/|v| [ms] 15 20 25 30 35 40 45 50 l/|v| [ms] (a) β varies (b) e varies 0.1 0.05 0 5 10 15 20 25 30 35 40 45 50 l/|v| [ms] (c) γ varies Figure 4: This figure complements figure 3. It visualizes the time averaged absolute difference between normalized ψ∞ (t) & normalized η(t). For η, its value of α was chosen such that the maxima of both functions coincide. Although not being a fit, it gives a rough estimate on how the shape of both curves deviate from each other. The maximum possible difference would be one. version of ψ (i.e. equation 2 with Vrest = 0, Vexc = 1, and equations 3 plugged in), ψ∞ (t) ≡ e ˙ Θ(t) + Vinh [γΘ(t)] e ˙ β + Θ(t) + [γΘ(t)] (4) (Here we use continuous versions of angular size and rate of expansion). The ψ∞ -function makes life easier when it comes to fitting experimental data. However, it has its limitations, because we brushed the whole dynamic of ψ under the carpet. Figure 3 illustrates how the linˆ ear relationship (=“linearity”) between tmax ≡ tc − t and l/|v| is influenced by changes in parameter values. Changing any of the values of e, β, γ predominantly causes variation in line slopes. The smallest slope changes are obtained by varying Vinh (data not shown; we checked Vinh = 0, −0.001, −0.01, −0.1). For Vinh −0.01, linearity is getting slightly compromised, as slope increases with l/|v| (e.g. Vinh = −1 α ∈ [4.2, 4.7]). In order to get a notion about how well the shape of ψ∞ (t) matches η(t), we computed timeaveraged difference measures between normalized versions of both functions (details: figure 3 & 4). Bigger values of β match η better at smaller, but worse at bigger values of l/|v| (figure 4a). Smaller β cause less variation across l/|v|. As to variation of e, overall curve shapes seem to be best aligned with e = 3 to e = 4 (figure 4b). Furthermore, better matches between ψ∞ (t) and η(t) correspond to bigger values of γ (figure 4c). And finally, Vinh marches again to a different tune (data not shown). Vinh = −0.1 leads to the best agreement (≈ 0.04 across l/|v|) of all Vinh , quite different from the other considered values. For the rest, ψ∞ (t) and η(t) align the same (all have maximum 0.094), 5 ˙ (a) Θ = 126o /s ˙ (b) Θ = 63o /s Figure 5: The original data (legend label “HaGaLa95”) were resampled from ref. [10] and show ˙ DCMD responses to an object approach with Θ = const. Thus, Θ increases linearly with time. The η-function (fitting function: Aη(t+δ)+o) and ψ∞ (fitting function: Aψ∞ (t)+o) were fitted to these data: (a) (Figure 3 Di in [10]) Good fits for ψ∞ are obtained with e = 5 or higher (e = 3 R2 = 0.35 and rmse = 0.644; e = 4 R2 = 0.45 and rmse = 0.592). “Psi” adopts a sigmoid-like curve form which (subjectively) appears to fit the original data better than η. (b) (Figure 3 Dii in [10]) “Psi” yields an excellent fit for e = 3. RoHaTo10 gregarious locust LV=0.03s Θ(t), lv=30ms e011pos014 sgolay with 100 t =107ms max ttc=5.00s ψ adj.R2 0.95 (LM:3) ∞ η(t) adj.R2 1 (TR::1) 2 ψ : R =0.95, rmse=0.004, 3 coefficients ∞ → β=2.22, γ=0.70, e=3.00, V =−0.001, A=0.07, o=0.02, δ=0.00ms inh η: R2=1.00, rmse=0.001 → α=3.30, A=0.08, o=0.0, δ=−10.5ms 3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 5 5.2 time [s] (b) α versus β (a) spike trace Figure 6: (a) DCMD activity in response to a black square (l/|v| = 30ms, legend label “e011pos14”, ref. [30]) approaching to the eye center of a gregarious locust (final visual angle 50o ). Data show the first stimulation so habituation is minimal. The spike trace (sampled at 104 Hz) was full wave rectified, lowpass filtered, and sub-sampled to 1ms resolution. Firing rate was estimated with Savitzky-Golay filtering (“sgolay”). The fits of the η-function (Aη(t + δ) + o; 4 coefficients) and ψ∞ -function (Aψ∞ (t) with fixed e, o, δ, Vinh ; 3 coefficients) provide both excellent fits to firing rate. (b) Fitting coefficient α (→ η-function) inversely correlates with β (→ ψ∞ ) when fitting firing rates of another 5 trials as just described (continuous line = line fit to the data points). Similar correlation values would be obtained if e is fixed at values e = 2.5, 4, 5 c = −0.95, −0.96, −0.91. If o was determined by the fitting algorithm, then c = −0.70. No clear correlations with α were obtained for γ. despite of covering different orders of magnitude with Vinh = 0, −0.001, −0.01. Decelerating approach. Hatsopoulos et al. [10] recorded DCMD activity in response to an ap˙ proaching object which projected image edges on the retina moving at constant velocity: Θ = const. ˙ This “linear approach” is perceived as if the object is getting increasingly implies Θ(t) = Θ0 + Θt. slower. But what appears a relatively unnatural movement pattern serves as a test for the functions η & ψ∞ . Figure 5 illustrates that ψ∞ passes the test, and consistently predicts that activity sharply rises in the initial approach phase, and subsequently declines (η passed this test already in the year 1995). 6 Spike traces. We re-sampled about 30 curves obtained from LGMD recordings from a variety of publications, and fitted η & ψ∞ -functions. We cannot show the results here, but in terms of goodness of fit measures, both functions are in the same ballbark. Rather, figure 6a shows a representative example [30]. When α and β are plotted against each other for five trials, we see a strong inverse correlation (figure 6b). Although five data points are by no means a firm statistical sample, the strong correlation could indicate that β and α play similar roles in both functions. Biophysically, β is the leakage conductance, which determines the (passive) membrane time constant τm ∝ 1/β of the neuron. Voltage drops within τm to exp(−1) times its initial value. Bigger values of β mean shorter τm (i.e., “faster neurons”). Getting back to η, this would suggest α ∝ τm , such that higher (absolute) values for α would possibly indicate a slower dynamic of the underlying processes. 5 Discussion (“The Good, the Bad, and the Ugly”) Up to now, mainly two classes of LGMD models existed: The phenomenological η-function on the one hand, and computational models with neuronal layers presynaptic to the LGMD on the other (e.g. [25, 15]; real-world video sequences & robotics: e.g. [3, 14, 32, 2]). Computational models predict that LGMD response features originate from excitatory and inhibitory interactions in – and between – presynaptic neuronal layers. Put differently, non-linear operations are generated in the presynaptic network, and can be a function of many (model) parameters (e.g. synaptic weights, time constants, etc.). In contrast, the η-function assigns concrete nonlinear operations to the LGMD [7]. The η-function is accessible to mathematical analysis, whereas computational models have to be probed with videos or artificial stimulus sequences. The η-function is vague about biophysical parameters, whereas (good) computational models need to be precise at each (model) parameter value. The η-function establishes a clear link between physical stimulus attributes and LGMD activity: It postulates what is to be computed from the optical variables (OVs). But in computational models, such a clear understanding of LGMD inputs cannot always be expected: Presynaptic processing may strongly transform OVs. The ψ function thus represents an intermediate model class: It takes OVs as input, and connects them with biophysical parameters of the LGMD. For the neurophysiologist, the situation could hardly be any better. Psi implements the multiplicative operation of the η-function by shunting inhibition (equation 1: Vexc ≈ Vrest and Vinh ≈ Vrest ). The η-function fits ψ very well according to our dynamical simulations (figure 1), and satisfactory by the approximate criterion of figure 4. We can conclude that ψ implements the η-function in a biophysically plausible way. However, ψ does neither explicitly specify η’s multiplicative operation, nor its exponential function exp(·). Instead we have an interaction between shunting inhibition and a power law (·)e , with e ≈ 3. So what about power laws in neurons? Because of e > 1, we have an expansive nonlinearity. Expansive power-law nonlinearities are well established in phenomenological models of simple cells of the primate visual cortex [1, 11]. Such models approximate a simple cell’s instantaneous firing rate r from linear filtering of a stimulus (say Y ) by r ∝ ([Y ]+ )e , where [·]+ sets all negative values to zero and lets all positive pass. Although experimental evidence favors linear thresholding operations like r ∝ [Y − Ythres ]+ , neuronal responses can behave according to power law functions if Y includes stimulus-independent noise [19]. Given this evidence, the power-law function of the inhibitory input into ψ could possibly be interpreted as a phenomenological description of presynaptic processes. The power law would also be the critical feature by means of which the neurophysiologist could distinguish between the η function and ψ. A study of Gabbiani et al. aimed to provide direct evidence for a neuronal implementation of the η-function [8]. Consequently, the study would be an evidence ˙ for a biophysical implementation of “direct” multiplication via log Θ − αΘ. Their experimental evidence fell somewhat short in the last part, where “exponentation through active membrane conductances” should invert logarithmic encoding. Specifically, the authors observed that “In 7 out of 10 neurons, a third-order power law best described the data” (sixth-order in one animal). Alea iacta est. Acknowledgments MSK likes to thank Stephen M. Rogers for kindly providing the recording data for compiling figure 6. MSK furthermore acknowledges support from the Spanish Government, by the Ramon and Cajal program and the research grant DPI2010-21513. 7 References [1] D.G. Albrecht and D.B. Hamilton, Striate cortex of monkey and cat: contrast response function, Journal of Neurophysiology 48 (1982), 217–237. [2] S. Bermudez i Badia, U. Bernardet, and P.F.M.J. Verschure, Non-linear neuronal responses as an emergent property of afferent networks: A case study of the locust lobula giant movemement detector, PLoS Computational Biology 6 (2010), no. 3, e1000701. [3] M. Blanchard, F.C. Rind, and F.M.J. Verschure, Collision avoidance using a model of locust LGMD neuron, Robotics and Autonomous Systems 30 (2000), 17–38. [4] D.F. Cooke and M.S.A. Graziano, Super-flinchers and nerves of steel: Defensive movements altered by chemical manipulation of a cortical motor area, Neuron 43 (2004), no. 4, 585–593. [5] L. Fogassi, V. Gallese, L. Fadiga, G. Luppino, M. Matelli, and G. Rizzolatti, Coding of peripersonal space in inferior premotor cortex (area f4), Journal of Neurophysiology 76 (1996), 141–157. [6] F. Gabbiani, I. Cohen, and G. Laurent, Time-dependent activation of feed-forward inhibition in a looming sensitive neuron, Journal of Neurophysiology 94 (2005), 2150–2161. [7] F. Gabbiani, H.G. Krapp, N. Hatsopolous, C.H. Mo, C. Koch, and G. Laurent, Multiplication and stimulus invariance in a looming-sensitive neuron, Journal of Physiology - Paris 98 (2004), 19–34. [8] F. Gabbiani, H.G. Krapp, C. Koch, and G. Laurent, Multiplicative computation in a visual neuron sensitive to looming, Nature 420 (2002), 320–324. [9] F. Gabbiani, H.G. Krapp, and G. Laurent, Computation of object approach by a wide-field, motionsensitive neuron, Journal of Neuroscience 19 (1999), no. 3, 1122–1141. [10] N. Hatsopoulos, F. Gabbiani, and G. Laurent, Elementary computation of object approach by a wide-field visual neuron, Science 270 (1995), 1000–1003. [11] D.J. Heeger, Modeling simple-cell direction selectivity with normalized, half-squared, linear operators, Journal of Neurophysiology 70 (1993), 1885–1898. [12] A.L. Hodkin and A.F. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, Journal of Physiology 117 (1952), 500–544. [13] F. Hoyle, The black cloud, Pinguin Books, London, 1957. [14] M.S. Keil, E. Roca-Morena, and A. Rodr´guez-V´ zquez, A neural model of the locust visual system for ı a detection of object approaches with real-world scenes, Proceedings of the Fourth IASTED International Conference (Marbella, Spain), vol. 5119, 6-8 September 2004, pp. 340–345. [15] M.S. Keil and A. Rodr´guez-V´ zquez, Towards a computational approach for collision avoidance with ı a real-world scenes, Proceedings of SPIE: Bioengineered and Bioinspired Systems (Maspalomas, Gran Canaria, Canary Islands, Spain) (A. Rodr´guez-V´ zquez, D. Abbot, and R. Carmona, eds.), vol. 5119, ı a SPIE - The International Society for Optical Engineering, 19-21 May 2003, pp. 285–296. [16] J.G. King, J.Y. Lettvin, and E.R. Gruberg, Selective, unilateral, reversible loss of behavioral responses to looming stimuli after injection of tetrodotoxin or cadmium chloride into the frog optic nerve, Brain Research 841 (1999), no. 1-2, 20–26. [17] C. Koch, Biophysics of computation: information processing in single neurons, Oxford University Press, New York, 1999. [18] D.N. Lee, A theory of visual control of braking based on information about time-to-collision, Perception 5 (1976), 437–459. [19] K.D. Miller and T.W. Troyer, Neural noise can explain expansive, power-law nonlinearities in neuronal response functions, Journal of Neurophysiology 87 (2002), 653–659. [20] Hideki Nakagawa and Kang Hongjian, Collision-sensitive neurons in the optic tectum of the bullfrog, rana catesbeiana, Journal of Neurophysiology 104 (2010), no. 5, 2487–2499. [21] M. O’Shea and C.H.F. Rowell, Projection from habituation by lateral inhibition, Nature 254 (1975), 53– 55. [22] M. O’Shea and J.L.D. Williams, The anatomy and output connection of a locust visual interneurone: the lobula giant movement detector (lgmd) neurone, Journal of Comparative Physiology 91 (1974), 257–266. [23] S. Peron and F. Gabbiani, Spike frequency adaptation mediates looming stimulus selectivity, Nature Neuroscience 12 (2009), no. 3, 318–326. [24] F.C. Rind, A chemical synapse between two motion detecting neurones in the locust brain, Journal of Experimental Biology 110 (1984), 143–167. [25] F.C. Rind and D.I. Bramwell, Neural network based on the input organization of an identified neuron signaling implending collision, Journal of Neurophysiology 75 (1996), no. 3, 967–985. 8 [26] F.C. Rind and P.J. Simmons, Orthopteran DCMD neuron: a reevaluation of responses to moving objects. I. Selective responses to approaching objects, Journal of Neurophysiology 68 (1992), no. 5, 1654–1666. [27] , Orthopteran DCMD neuron: a reevaluation of responses to moving objects. II. Critical cues for detecting approaching objects, Journal of Neurophysiology 68 (1992), no. 5, 1667–1682. [28] , Signaling of object approach by the dcmd neuron of the locust, Journal of Neurophysiology 77 (1997), 1029–1033. [29] , Reply, Trends in Neuroscience 22 (1999), no. 5, 438. [30] S.M. Roger, G.W.J. Harston, F. Kilburn-Toppin, T. Matheson, M. Burrows, F. Gabbiani, and H.G. Krapp, Spatiotemporal receptive field properties of a looming-sensitive neuron in solitarious and gregarious phases of desert locust, Journal of Neurophysiology 103 (2010), 779–792. [31] S.K. Rushton and J.P. Wann, Weighted combination of size and disparity: a computational model for timing ball catch, Nature Neuroscience 2 (1999), no. 2, 186–190. [32] Yue. S., Rind. F.C., M.S. Keil, J. Cuadri, and R. Stafford, A bio-inspired visual collision detection mechanism for cars: Optimisation of a model of a locust neuron to a novel environment, Neurocomputing 69 (2006), 1591–1598. [33] G.R. Schlotterer, Response of the locust descending movement detector neuron to rapidly approaching and withdrawing visual stimuli, Canadian Journal of Zoology 55 (1977), 1372–1376. [34] H. Sun and B.J. Frost, Computation of different optical variables of looming objects in pigeon nucleus rotundus neurons, Nature Neuroscience 1 (1998), no. 4, 296–303. [35] J.R. Tresilian, Visually timed action: time-out for ’tau’?, Trends in Cognitive Sciences 3 (1999), no. 8, 1999. [36] Y. Wang and B.J. Frost, Time to collision is signalled by neurons in the nucleus rotundus of pigeons, Nature 356 (1992), 236–238. [37] J.P. Wann, Anticipating arrival: is the tau-margin a specious theory?, Journal of Experimental Psychology and Human Perceptual Performance 22 (1979), 1031–1048. [38] M. Wicklein and N.J. Strausfeld, Organization and significance of neurons that detect change of visual depth in the hawk moth manduca sexta, The Journal of Comparative Neurology 424 (2000), no. 2, 356– 376. 9
5 0.47528619 171 nips-2011-Metric Learning with Multiple Kernels
Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis
Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1
6 0.47513092 128 nips-2011-Improved Algorithms for Linear Stochastic Bandits
7 0.47309902 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
8 0.47293895 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
9 0.46914795 175 nips-2011-Multi-Bandit Best Arm Identification
10 0.46879825 32 nips-2011-An Empirical Evaluation of Thompson Sampling
11 0.46838349 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo
12 0.46808794 24 nips-2011-Active learning of neural response functions with Gaussian processes
13 0.46791098 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
14 0.46783382 157 nips-2011-Learning to Search Efficiently in High Dimensions
15 0.46765372 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
16 0.46699691 111 nips-2011-Hashing Algorithms for Large-Scale Learning
17 0.46618652 231 nips-2011-Randomized Algorithms for Comparison-based Search
18 0.46596938 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
19 0.46581948 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
20 0.46491486 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data