nips nips2013 nips2013-54 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ali Borji, Laurent Itti
Abstract: Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the function’s maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploits all the x values tried and all the f (x) values obtained so far to pick the next x, predicts human performance and searched locations better. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Bayesian optimization explains human active search Ali Borji Department of Computer Science USC, Los Angeles, 90089 borji@usc. [sent-1, score-0.463]
2 Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? [sent-5, score-0.303]
3 Their task is to find the function’s maximum in as few clicks as possible. [sent-8, score-0.509]
4 Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. [sent-10, score-0.333]
5 Bayesian Optimization based on Gaussian Processes, which exploits all the x values tried and all the f (x) values obtained so far to pick the next x, predicts human performance and searched locations better. [sent-11, score-0.371]
6 Here, inspired by the optimization literature, we design and conduct a series of experiments to understand human search and active learning behavior. [sent-19, score-0.444]
7 We compare and contrast humans with standard optimization algorithms, to learn how well humans perform 1D function optimization and to discover which algorithm best approaches or explains human search strategies. [sent-20, score-0.948]
8 We aim to decipher how humans choose the next x to be queried when attempting to locate the maximum of an unknown 1D function. [sent-22, score-0.301]
9 We focus on the following questions: Do humans perform local search (for instance by randomly choosing a location and following the gradient of the function, e. [sent-23, score-0.405]
10 Do the sets of sample x locations queried by humans resemble those of some algorithms more than others? [sent-28, score-0.342]
11 Can Gaussian processes [9] offer a unifying theory of human function learning and active search? [sent-31, score-0.335]
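The Bayesian-optimization strategy referred to above (a GP fitted to all (x, f(x)) pairs seen so far, plus an acquisition rule for choosing the next x) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the unit-variance RBF kernel, the length-scale, the UCB acquisition with beta = 2, and the toy objective are all our assumptions.

```python
import numpy as np

def rbf(a, b, ell=60.0):
    # Unit-variance squared-exponential kernel on 1D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(xs, ys, xq, ell=60.0, noise=1e-6):
    # Standard GP regression: posterior mean and std at query points xq.
    K = rbf(xs, xs, ell) + noise * np.eye(len(xs))
    Ks = rbf(xq, xs, ell)
    mu = Ks @ np.linalg.solve(K, ys)
    v = np.linalg.solve(K, Ks.T)
    sd = np.sqrt(np.maximum(1.0 - np.sum(Ks * v.T, axis=1), 0.0))
    return mu, sd

def gp_ucb_search(f, n_clicks=15, beta=2.0):
    # Pick each next x by maximizing an upper confidence bound (mean + beta*std),
    # refitting the GP to every (x, f(x)) pair tried so far.
    grid = np.linspace(-300.0, 300.0, 601)
    xs, ys = [0.0], [f(0.0)]          # first "click" at the center
    for _ in range(n_clicks - 1):
        mu, sd = gp_posterior(np.array(xs), np.array(ys), grid)
        x_next = float(grid[np.argmax(mu + beta * sd)])
        xs.append(x_next)
        ys.append(f(x_next))
    return xs, ys

# Toy objective with its maximum (value 0) at x = 120.
xs, ys = gp_ucb_search(lambda x: -((x - 120.0) / 100.0) ** 2)
```

With a 15-click budget this sketch localizes the toy maximum closely; the point is only to make the "GP plus acquisition rule" loop concrete.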
12 2 Experiments and Results We seek to study human search behavior directly on 1D function optimization, for the first time systematically and explicitly. [sent-32, score-0.323]
13 [Figure panel titles: Original function; Histogram of clicks; Histogram of first clicks.] [sent-49, score-0.906]
14 During search for the function’s maximum, a red dot was drawn at each clicked (x, f(x)). Right: a sample pdf of human clicks. [sent-56, score-0.323]
15 In the majority of cases, the distribution of clicks starts with a strong leftward bias for the first clicks, then progressively focuses around the true function maximum as subjects make more clicks and approach it. [sent-68, score-1.22]
16 They were asked to “find the maximum value (highest point) of a function in as few clicks as possible”. [sent-73, score-0.522]
17 Starting from a blank screen, subjects could click on any abscissa x location, and we would show them the corresponding f (x) ordinate. [sent-75, score-0.391]
18 (as opposed to automatically terminating a trial if humans happened to click near the maximum). [sent-81, score-0.434]
19 We highlighted to subjects that they should decide carefully where to click next, to minimize the number of clicks before hitting the maximum location. [sent-88, score-0.843]
20 Before the experiment, we had a training session in which subjects completed 5 trials on a different set of functions than those used in the real experiment. [sent-90, score-0.302]
21 The purpose was for subjects to understand the task and familiarize themselves with the general complexity and shapes of the functions (i. [sent-91, score-0.299]
22 To prohibit subjects from using the vertical extent of the screen to guess the maximum location, we randomly elevated or lowered the function plot. [sent-96, score-0.337]
23 To measure to what degree human search behavior deviates from a random process, we devise a Random search algorithm which chooses the next point uniformly at random from [−300, 300] without replacement. [sent-176, score-0.416]
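The Random baseline just described can be sketched as follows; the integer-valued candidate grid, the call budget, and the toy objective are our illustrative assumptions.

```python
import random

def random_search(f, n_calls=100, lo=-300, hi=300, seed=0):
    # Draw abscissas uniformly at random from [lo, hi] without replacement
    # by shuffling the whole candidate grid once.
    rng = random.Random(seed)
    candidates = list(range(lo, hi + 1))
    rng.shuffle(candidates)
    history = [(x, f(x)) for x in candidates[:n_calls]]
    best_x, best_y = max(history, key=lambda p: p[1])
    return best_x, best_y, history

# Toy objective peaking at x = 42 (illustrative only).
best_x, best_y, history = random_search(lambda x: -abs(x - 42))
```

Shuffling once and consuming the prefix guarantees the "without replacement" property without rejection sampling.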
24 To generate a starting point for algorithms, we randomly sampled from the distribution of human first clicks (over all subjects and functions, p(x_1); see Fig. [sent-192, score-0.928]
25 As in the behavioral experiment, after termination of an algorithm, a hit is declared when ∃ x_i ∈ B : |x_i − argmax_x f(x)| ≤ xTol_h, where the set B includes the history of searched locations in a search trial. [sent-194, score-0.311]
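The hit criterion translates directly to code. The numeric value of the tolerance xTol_h is not given in this excerpt, so it is left as a free parameter here.

```python
def is_hit(searched_xs, x_star, x_tol):
    # A trial is a hit iff some searched location x_i lies within
    # x_tol of the true argmax x_star.
    return any(abs(x - x_star) <= x_tol for x in searched_xs)

# Illustrative values, not the paper's settings.
hit = is_hit([-120.0, 35.0, 210.0], x_star=40.0, x_tol=10.0)
```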
26 As shown, humans are better than all algorithms tested, if hit rate and function calls are weighted equally (i. [sent-197, score-0.391]
27 It can achieve a better-than-human hit rate with fewer function calls than the BO algorithms, though still more calls than humans (it was not able to reach human performance with an equal number of function calls). [sent-204, score-0.621]
28 These algorithms were chosen because their performance curve in the first analysis intersected a window where accuracy is half that of humans and the number of function calls is twice that of humans (black rectangle in Fig. [sent-210, score-0.538]
29 , point matching), 3) agreement between probability distributions of searched locations by all humans and an algorithm, and 4) agreement between pdfs of normalized step sizes (to [0 1] on each trial). [sent-216, score-0.499]
30 These two algorithms also show higher agreement with humans in terms of searched locations (Fig. [sent-240, score-0.446]
31 The clearest pattern occurs for step-size agreement, with BO methods (except GP-MV) being closest to humans (Fig. [sent-243, score-0.305]
32 GP-MPI and GP-UCB show higher resemblance to human search behavior over all scores. [sent-246, score-0.323]
33 Further, we measure the regret of algorithms and humans, defined as fmax(·) − f∗, where f∗ is the best value found so far, for up to 15 function calls, averaged over all trials. [sent-247, score-0.359]
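The regret curve for a single trial can be computed as a running best-so-far; padding trials that stop early with their final best value, so all trials align to the same call budget, is our assumption.

```python
import numpy as np

def regret_curve(f_values, f_max, n_calls=15):
    # Best value found so far after each call; regret is f_max minus that.
    best = np.maximum.accumulate(np.asarray(f_values, dtype=float))
    if len(best) < n_calls:
        # Pad a short trial with its final best (assumption: search stopped).
        best = np.concatenate([best, np.full(n_calls - len(best), best[-1])])
    return f_max - best[:n_calls]

# Illustrative trial: values observed over four calls, true maximum 1.0.
r = regret_curve([0.1, 0.5, 0.3, 0.9], f_max=1.0, n_calls=6)
```

Averaging such curves over trials gives the regret-vs-calls plot described above.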
34 Furthermore, the sequential nature of BO, updating the GP posterior after each function call, seems a natural strategy humans might be employing, and 3) GP models explain function learning in humans over simple functional relationships (linear and quadratic) [24]. [sent-254, score-0.538]
35 Panel (e) shows average regret of algorithms and humans (f ∗ is normalized to fmax for each trial separately). [sent-282, score-0.366]
36 We thus designed 6 additional controlled experiments to further explore GP as a unified computational principle guiding human function learning and active search. [sent-283, score-0.317]
37 In experiments 2 to 5, subjects performed interpolation and extrapolation tasks, as well as active versions of these tasks by choosing points to help them learn about the functions. [sent-291, score-0.607]
38 In experiments 6 and 7, we then return to the optimization task, for a detailed model-based analysis of human search behavior over functions from the same family. [sent-292, score-0.387]
39 In experiments 2 to 6, we fitted a GP to different types of functions using the same set of (x, y) points shown to subjects during training (Fig. [sent-301, score-0.297]
40 In the interpolation task, on each function, subjects were shown 4 points x ∈ {−300, a, b, 300} along with their f (x) values. [sent-310, score-0.392]
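The model-side computation for this task can be sketched as fitting a zero-mean GP to the four visible points and reading off the posterior mean and std at x = 0. The kernel, length-scale, and the particular values of a, b, and f are our illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ell=100.0):
    # Unit-variance squared-exponential kernel on 1D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_predict(xs, ys, xq, ell=100.0, noise=1e-6):
    # GP regression posterior mean and std at the query locations xq.
    K = rbf(xs, xs, ell) + noise * np.eye(len(xs))
    Ks = rbf(xq, xs, ell)
    mu = Ks @ np.linalg.solve(K, ys)
    v = np.linalg.solve(K, Ks.T)
    sd = np.sqrt(np.maximum(1.0 - np.sum(Ks * v.T, axis=1), 0.0))
    return mu, sd

# Four visible points at x in {-300, a, b, 300}; a = -80, b = 80 and the
# ordinates are invented for illustration.
xs = np.array([-300.0, -80.0, 80.0, 300.0])
ys = np.array([2.0, -1.0, 1.0, -2.0])
mu0, sd0 = gp_predict(xs, ys, np.array([0.0]))
```

The posterior mean mu0 plays the role of the model's interpolation answer at x = 0, and sd0 its uncertainty there.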
41 In the active interpolation task, the same 4 points as in interpolation were shown to subjects. [sent-313, score-0.359]
42 Panel (a) shows the mean distance of human clicks from the GP mean at x = 0 over test trials (averaged over absolute pairwise distances between clicks and the GP) in the interpolation task. [sent-321, score-1.422]
43 Distances of the GP and the actual function from humans are the same over Deg2 and Deg3 functions (no significant difference in medians, Wilcoxon signed-rank test, p > 0. [sent-323, score-0.329]
44 Interestingly, on Deg5 functions, GP is closer to human clicks than the actual function (signed-rank test, p = 0.053), implying that GP captures clicks well in this case. [sent-325, score-0.733] [sent-326, score-0.453]
46 GP did fit the human data even better than the actual function, thereby lending support to our hypothesis that GP may be a reasonable approximation to human function estimation. [sent-327, score-0.49]
47 Could it be that subjects locally fit a line to the two middle points to guess f (0)? [sent-328, score-0.314]
48 To evaluate this hypothesis, we measured the distance, at x = 0, from human clicks to a line passing through (a, f (a)) and (b, f (b)). [sent-329, score-0.702]
49 We then calculate the average of the pairwise distances between human clicks and 200 draws from each random model. [sent-342, score-0.701]
50 Both random models fail to predict human answers on all types of functions (significantly worse than GP, signed-rank test ps < 0. [sent-343, score-0.3]
51 Panel (a), inset, demonstrates similar uncertainty patterns for humans and the GP, showing that both uncertainties (at x = 0) rise as functions become more complicated. [sent-348, score-0.317]
52 If this is correct, we expect that humans will tend to click on high uncertainty regions (according to GP std) in the active interpolation task (see Fig. [sent-350, score-0.636]
53 Panel (b) shows the average GP standard deviation at the locations of human selections. [sent-354, score-0.322]
54 Another possibility is that subjects had a slight preference to click toward the center. [sent-359, score-0.358]
55 However, GP std at human-selected locations was significantly higher than the GP std at random and SRand points, over all types of functions (signed-rank test, ps < 1e–4; non-significant on Deg2 vs. [sent-360, score-0.6]
56 This result suggests that since humans did not know in advance where a follow-up query might happen, they chose high-uncertainty locations according to GP, as clicking at those locations would most shrink the overall uncertainty about the function. [sent-363, score-0.433]
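Uncertainty-driven selection of this kind can be sketched by querying the location of maximal GP posterior std over the candidate range. The kernel settings and the four shown points are illustrative assumptions; note that the posterior std depends only on where observations were made, not on the observed values.

```python
import numpy as np

def rbf(a, b, ell=100.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_sd(xs, xq, ell=100.0, noise=1e-6):
    # Posterior std of a zero-mean, unit-variance GP given observation
    # locations xs; the observed y values do not enter this formula.
    K = rbf(xs, xs, ell) + noise * np.eye(len(xs))
    Ks = rbf(xq, xs, ell)
    v = np.linalg.solve(K, Ks.T)
    return np.sqrt(np.maximum(1.0 - np.sum(Ks * v.T, axis=1), 0.0))

grid = np.linspace(-300.0, 300.0, 601)
shown = np.array([-300.0, -80.0, 80.0, 300.0])   # the four visible points
sd = gp_sd(shown, grid)
x_next = float(grid[np.argmax(sd)])              # max-uncertainty query
```

With these points, the maximal-std query falls near the middle of one of the two widest gaps between observations, which is exactly the "shrink overall uncertainty" behavior described above.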
57 In the extrapolation task, subjects were asked to guess the function value at x = 200 as accurately as possible (Fig. [sent-372, score-0.438]
58 In the active extrapolation task, subjects were asked to choose the most informative 4th point in [−300 100] regarding estimating f (200). [sent-374, score-0.497]
59 In panel (c), in alignment with interpolation results, humans are good at Deg2 and Deg3 but fail on Deg5, and so does the GP model. [sent-379, score-0.394]
60 Here again, with Deg5, GP and humans are closer to each other than to the actual function, further suggesting that their behaviors and errors are similar. [sent-380, score-0.319]
61 SRand performs better than uniform random. [Figure 7: a) Mean distance of human clicks from models; legend: human, random, shuffled Random, max std, min std; x-axis: Deg2, Deg3, Deg5.] [sent-387, score-1.496]
62 [Figure panels: a) distance of human clicks from location of max Q; b) distance of random clicks from location of max Q; c) left: mean selected function values, right: mean selected GP values; d) mean selected GP std values; legend: rand − mean, human1 − mean, human1 − std, rand − std.] [sent-395, score-1.386] [sent-396, score-0.871]
64 a and b) distance of human and random clicks from locations of max Q (i. [sent-403, score-0.802]
65 As in the interpolation task, GP and human standard deviations rise as functions become more complex (Fig. [sent-414, score-0.385]
66 Panel (d), similar to active interpolation, shows that humans tended to choose locations with significantly higher uncertainty than uniform and SRand points, for all function types (ps < 0. [sent-419, score-0.469]
67 Some subjects in this task tended to click toward the right (close to 100), maybe to obtain a better idea of the curvature between 100 and 200. [sent-421, score-0.404]
68 This is perhaps why the ratio of human std to max std is lower in active extrapolation compared to active interpolation (0. [sent-422, score-1.15]
69 They were allowed to make two equally important clicks and were shown the function value after each one. [sent-431, score-0.453]
70 In the first one, we measure the mean distance of human clicks (1st and 2nd clicks) from the location of the maximum GP mean and maximum GP standard deviation (Fig. [sent-436, score-0.908]
71 We hypothesized that the human first click would be at a location of high GP variance (to reduce uncertainty about the function), while the second click would be close to the location of highest GP mean (estimated function maximum). [sent-440, score-0.596]
72 However, results showed that human 1st clicks were close to the max GP mean and not very close to the max GP std. [sent-441, score-0.757]
73 Human 2nd clicks were even closer (signed-rank test, p < 0.001) to the max GP mean and farther from the max GP std (p < 0. [sent-442, score-0.473] [sent-443, score-0.311]
75 Repeating this analysis for random clicks (uniform and SRand) shows quite the opposite trend (Fig. [sent-446, score-0.453]
76 Random locations are farther from the maximum of the GP mean (compared to human clicks) while being closer to the maximum of the GP std (compared to human clicks). [sent-449, score-0.89]
77 This cross pattern between human and random clicks (wrt. [sent-450, score-0.683]
78 Distances of human clicks from the max GP mean and max GP std rise as functions become more complicated. [sent-452, score-1.024]
79 Panel (d) shows that humans chose points with significantly lower std (normalized over the entire function) in their 2nd clicks compared to random clicks and their first clicks. [sent-462, score-0.981]
80 Human 1st clicks have higher std than uniform random clicks. [sent-463, score-0.69]
81 b) Results. We fitted a GP over subjects and functions (an average of 8.4 clicks on each trial) and exploited this GP to evaluate the next click of the same subject; results are shown in Fig. 9. The regret of the GP model and humans declines with more clicks, implying that humans chose informative clicks regarding optimization (figure inset). [Figure 9: Exploration vs. exploitation; panels plot the mean distance from the max GP mean and std, and the renormalized GP std, against the number of clicks.] [sent-475, score-0.355] [sent-481, score-0.698] [sent-488, score-0.566] [sent-491, score-0.493] [sent-493, score-1.192] [sent-495, score-0.756]
87 Panel (a) shows that subjects get closer to the location of the maximum GP mean and farther from the max GP std (over 15 clicks). [sent-499, score-0.632]
88 Panel (b) shows the normalized mean and standard deviation of human clicks (from the GP model), averaged over all trials. [sent-502, score-0.738]
89 Interestingly, we observe that humans tended to click on higher-uncertainty regions (according to GP) in their first 6 clicks (averaged over all subjects and functions), then gradually relied more on the GP mean (i. [sent-505, score-1.156]
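One simple way to formalize this shift from uncertainty-driven early clicks to mean-driven later clicks is a UCB rule whose exploration weight decays with the click index. The exponential schedule and its parameters below are our illustrative assumption, not the paper's fitted model.

```python
import numpy as np

def next_click(mu, sd, t, beta0=3.0, tau=6.0):
    # Score = posterior mean + decaying exploration bonus on posterior std.
    # Early clicks (small t): the std term dominates (exploration);
    # later clicks (large t): the mean term dominates (exploitation).
    beta = beta0 * np.exp(-t / tau)
    return int(np.argmax(mu + beta * sd))

# Toy posterior over four candidate locations (illustrative values):
mu = np.array([0.0, 0.5, 0.9, 0.2])   # posterior means
sd = np.array([1.0, 0.6, 0.1, 0.9])   # posterior stds
early = next_click(mu, sd, t=0)       # picks the high-std candidate
late = next_click(mu, sd, t=12)       # picks the high-mean candidate
```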
90 Results of optimization tasks suggest that human clicks during search for a maximum of a 1D function can be predicted by a Gaussian process model. [sent-509, score-0.842]
91 [Figure legend: regret; human, random, shuffled random, GP.] 3 Discussion and Conclusion Our contributions are twofold: First, we found a striking capability of humans in 1D function optimization. [sent-510, score-0.579]
92 In spite of the relative naivety of our subjects (not maths or engineering majors), the high human efficiency in our search task suggests that even more efficient optimization algorithms should be possible. [sent-511, score-0.647]
93 Additional pilot investigations not shown here suggest that humans may perform even better in optimization when provided with first and second derivatives. [sent-512, score-0.303]
94 Second, we showed that Gaussian processes provide a reasonable (though not perfect) unifying theoretical account of human function learning, active learning, and search (GP plus a selection strategy). [sent-515, score-0.428]
95 Results of experiments 2 to 5 lead to an interesting conclusion: in interpolation and extrapolation tasks, subjects try to minimize the error between their estimate and the actual function, while in active tasks they change their objective to explore uncertain regions. [sent-516, score-0.615]
96 In the optimization task, subjects progressively sample the function, update their belief, and use this belief to find the location of the maximum. [sent-517, score-0.359]
97 Yet, while they showed that Gaussian processes can predict human errors and difficulty in function learning, here we focused on explaining human active behavior with GP, thus extending the explanatory power of GP models one step further. [sent-523, score-0.565]
98 Najemnik and Geisler [6, 28, 29] proposed a Bayesian ideal-observer model to explain human eye movement strategies during visual search for a small Gabor patch hidden in noise. [sent-525, score-0.374]
99 [30] studied human active learning on the well-understood problem of finding the threshold in a binary search problem. [sent-531, score-0.41]
100 Geisler, “Eye movement statistics in humans are consistent with an optimal search strategy,” Journal of Vision, vol. [sent-693, score-0.379]
wordName wordTfidf (topN-words)
[('gp', 0.572), ('clicks', 0.453), ('humans', 0.269), ('subjects', 0.245), ('std', 0.237), ('human', 0.23), ('extrapolation', 0.128), ('interpolation', 0.125), ('click', 0.113), ('bo', 0.109), ('pso', 0.105), ('xtolm', 0.105), ('search', 0.093), ('active', 0.087), ('srand', 0.084), ('hit', 0.077), ('locations', 0.073), ('osborne', 0.069), ('searched', 0.068), ('fminbnd', 0.063), ('fminsearch', 0.063), ('shuffled', 0.063), ('maxitr', 0.053), ('minfunc', 0.053), ('trial', 0.052), ('calls', 0.045), ('location', 0.043), ('asked', 0.037), ('progressively', 0.037), ('agreement', 0.036), ('mean', 0.036), ('optimization', 0.034), ('clicked', 0.034), ('geisler', 0.034), ('yr', 0.034), ('maximum', 0.032), ('screen', 0.032), ('kalish', 0.032), ('majors', 0.032), ('xtolh', 0.032), ('functions', 0.03), ('actual', 0.03), ('borji', 0.028), ('fmax', 0.028), ('najemnik', 0.028), ('guess', 0.028), ('ga', 0.027), ('stimuli', 0.027), ('trials', 0.027), ('distance', 0.027), ('inset', 0.026), ('rand', 0.026), ('direct', 0.025), ('task', 0.024), ('itti', 0.024), ('exploitation', 0.023), ('func', 0.023), ('ph', 0.023), ('ps', 0.023), ('tended', 0.022), ('points', 0.022), ('bayesian', 0.021), ('fminunc', 0.021), ('foraging', 0.021), ('gen', 0.021), ('maths', 0.021), ('qnewton', 0.021), ('randy', 0.021), ('shuffledrandy', 0.021), ('swarm', 0.021), ('undergraduates', 0.021), ('closer', 0.02), ('tted', 0.02), ('line', 0.019), ('max', 0.019), ('explains', 0.019), ('participants', 0.019), ('pdf', 0.019), ('deviation', 0.019), ('usc', 0.019), ('gd', 0.018), ('processes', 0.018), ('distances', 0.018), ('uncertainty', 0.018), ('regret', 0.017), ('eye', 0.017), ('aged', 0.017), ('participated', 0.017), ('pdfs', 0.017), ('abscissa', 0.017), ('deg', 0.017), ('polynomial', 0.017), ('test', 0.017), ('age', 0.017), ('visual', 0.017), ('movement', 0.017), ('global', 0.017), ('criteria', 0.016), ('tries', 0.016), ('blank', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 54 nips-2013-Bayesian optimization explains human active search
2 0.29597008 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
Author: Michalis Titsias, Miguel Lazaro-Gredilla
Abstract: We introduce a novel variational method that allows to approximately integrate out kernel hyperparameters, such as length-scales, in Gaussian process regression. This approach consists of a novel variant of the variational framework that has been recently developed for the Gaussian process latent variable model which additionally makes use of a standardised representation of the Gaussian process. We consider this technique for learning Mahalanobis distance metrics in a Gaussian process regression setting and provide experimental evaluations and comparisons with existing methods by considering datasets with high-dimensional inputs. 1
3 0.20210426 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
Author: Roger Frigola, Fredrik Lindsten, Thomas B. Schon, Carl Rasmussen
Abstract: State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference and learning (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and, instead, infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity. 1
4 0.18799256 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
Author: Andreas Ruttor, Philipp Batz, Manfred Opper
Abstract: We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from sparse observations of the state vector. Using a Gaussian process prior over the drift as a function of the state vector, we develop an approximate EM algorithm to deal with the unobserved, latent dynamics between observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type and the MAP estimation of the drift is facilitated by a sparse Gaussian process regression. 1
6 0.13189921 241 nips-2013-Optimizing Instructional Policies
7 0.12512103 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
8 0.12135666 69 nips-2013-Context-sensitive active sensing in humans
9 0.10697913 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
10 0.10167393 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression
11 0.10049599 137 nips-2013-High-Dimensional Gaussian Process Bandits
12 0.097596586 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
13 0.088451393 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
14 0.066717453 309 nips-2013-Statistical Active Learning Algorithms
15 0.062555037 149 nips-2013-Latent Structured Active Learning
16 0.060566694 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
18 0.052595086 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
19 0.052562963 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
20 0.051767021 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
topicId topicWeight
[(0, 0.127), (1, -0.004), (2, -0.059), (3, -0.035), (4, -0.006), (5, 0.026), (6, 0.064), (7, 0.023), (8, -0.007), (9, -0.005), (10, -0.163), (11, -0.086), (12, -0.208), (13, 0.021), (14, -0.077), (15, 0.007), (16, -0.22), (17, 0.072), (18, -0.035), (19, -0.202), (20, 0.065), (21, -0.064), (22, -0.219), (23, 0.078), (24, 0.03), (25, -0.024), (26, -0.098), (27, 0.021), (28, -0.129), (29, 0.153), (30, -0.234), (31, 0.09), (32, 0.014), (33, -0.039), (34, 0.186), (35, 0.029), (36, 0.065), (37, -0.007), (38, -0.044), (39, -0.02), (40, 0.016), (41, -0.063), (42, -0.036), (43, -0.005), (44, 0.015), (45, -0.007), (46, 0.048), (47, 0.011), (48, 0.02), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.969356 54 nips-2013-Bayesian optimization explains human active search
2 0.72726697 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
3 0.70418125 201 nips-2013-Multi-Task Bayesian Optimization
4 0.67261171 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression
Author: Yanshuai Cao, Marcus A. Brubaker, David Fleet, Aaron Hertzmann
Abstract: We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-ofart performance in discrete cases and competitive results in the continuous case. 1
5 0.5195483 69 nips-2013-Context-sensitive active sensing in humans
Author: Sheeraz Ahmad, He Huang, Angela J. Yu
Abstract: Humans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. Recently, we proposed a goal-directed, context-sensitive, Bayesian control strategy for active sensing, C-DAC (Context-Dependent Active Controller) (Ahmad & Yu, 2013). In contrast to previously proposed algorithms for human active vision, which tend to optimize abstract statistical objectives and therefore cannot adapt to changing behavioral context or task goals, C-DAC directly minimizes behavioral costs and thus automatically adapts itself to different task conditions. However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings.
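The myopic one-step-lookahead idea can be sketched in a toy visual-search setting: a belief over candidate target locations, a Bernoulli observation model, and a fixation choice that minimizes expected posterior entropy after a single observation. The observation probabilities and the entropy criterion are our own simplifications; C-DAC proper minimizes behavioral costs, not entropy:

```python
import numpy as np

P_HIT, P_MISS = 0.7, 0.3   # assumed Bernoulli observation model

def update(belief, fix, z):
    """Bayes update after observing z in {0, 1} while fixating location `fix`."""
    like = np.where(np.arange(len(belief)) == fix, P_HIT, P_MISS)
    like = like if z == 1 else 1.0 - like
    post = belief * like
    return post / post.sum()

def entropy(b):
    return -np.sum(b * np.log(np.maximum(b, 1e-12)))

def myopic_fixation(belief):
    """One-step lookahead: fixate where expected posterior entropy is lowest."""
    scores = []
    for fix in range(len(belief)):
        h = 0.0
        for z in (0, 1):
            like = np.where(np.arange(len(belief)) == fix, P_HIT, P_MISS)
            like = like if z == 1 else 1.0 - like
            pz = float(belief @ like)              # predictive prob. of z
            h += pz * entropy(update(belief, fix, z))
        scores.append(h)
    return int(np.argmin(scores))
```

With a belief of [0.6, 0.2, 0.2] the myopic rule fixates the most likely location, since confirming it reduces expected uncertainty the most under this observation model.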
6 0.46841586 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
7 0.43722048 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
8 0.40187171 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
9 0.37595186 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
10 0.37397149 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
11 0.36888346 241 nips-2013-Optimizing Instructional Policies
12 0.30893618 183 nips-2013-Mapping paradigm ontologies to and from the brain
13 0.30067223 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
14 0.29106548 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
15 0.27735102 137 nips-2013-High-Dimensional Gaussian Process Bandits
16 0.26357341 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
17 0.25798032 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
18 0.24269678 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
19 0.23639488 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
20 0.23455732 149 nips-2013-Latent Structured Active Learning
topicId topicWeight
[(2, 0.013), (16, 0.028), (33, 0.13), (34, 0.125), (41, 0.058), (49, 0.028), (56, 0.086), (70, 0.028), (85, 0.023), (89, 0.029), (90, 0.273), (93, 0.038), (95, 0.027)]
simIndex simValue paperId paperTitle
1 0.77298486 225 nips-2013-One-shot learning and big data with n=2
Author: Lee H. Dicker, Dean P. Foster
Abstract: We model a “one-shot learning” situation, where very few observations y1, ..., yn ∈ R are available. Associated with each observation yi is a very high-dimensional vector xi ∈ Rd, which provides context for yi and enables us to predict subsequent observations, given their own context. One of the salient features of our analysis is that the problems studied here are easier when the dimension of xi is large; in other words, prediction becomes easier when more context is provided. The proposed methodology is a variant of principal component regression (PCR). Our rigorous analysis sheds new light on PCR. For instance, we show that classical PCR estimators may be inconsistent in the specified setting, unless they are multiplied by a scalar c > 1; that is, unless the classical estimator is expanded. This expansion phenomenon appears to be somewhat novel and contrasts with shrinkage methods (c < 1), which are far more common in big data analyses.
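The expansion idea can be sketched as ordinary principal component regression with an extra scalar c applied to the prediction; the paper's point is that the classical choice c = 1 attenuates the signal in this regime, so c > 1 is needed for consistency. The sketch below is generic PCR with the expansion exposed as a parameter (the concrete value of c in the paper depends on the noise geometry and is not reproduced here):

```python
import numpy as np

def pcr_predict(X, y, x_new, k=1, c=1.0):
    """Principal component regression with k components and an
    expansion factor c; c > 1 counteracts the attenuation that makes
    the classical (c = 1) estimator inconsistent in this regime."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                                   # top-k principal directions (d x k)
    w, *_ = np.linalg.lstsq(X @ V, y, rcond=None)  # regress y on the scores
    return c * (x_new @ V @ w)

# n = 2 observations with d = 3 context dimensions, signal along the top PC.
X = np.array([[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
y = 3.0 * X[:, 0]
pred = pcr_predict(X, y, np.array([1.0, 0.0, 0.0]))
```

When the signal lies exactly along the leading principal component, classical PCR (c = 1) recovers the prediction; the inconsistency the paper proves arises once high-dimensional noise contaminates the estimated component.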
same-paper 2 0.76845378 54 nips-2013-Bayesian optimization explains human active search
Author: Ali Borji, Laurent Itti
Abstract: Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the function’s maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploits all the x values tried and all the f (x) values obtained so far to pick the next x, predicts human performance and searched locations better. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans.
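The GP-based search strategy compared against humans can be sketched as a simple 1D Bayesian optimization loop: fit a GP to the (x, f(x)) pairs "clicked" so far, then click next where an upper-confidence acquisition mu + kappa*sigma is largest. The RBF kernel, grid discretization, and UCB acquisition are our illustrative choices, not necessarily the exact variant the paper fits to human data:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and variance on a grid (zero prior mean, unit scale)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 0.0)

def gp_ucb_maximize(f, grid, n_clicks=10, kappa=2.0):
    """'Click' wherever the upper confidence bound mu + kappa*sigma is largest."""
    X = np.array([grid[0], grid[-1]])       # two initial clicks at the ends
    y = f(X)
    for _ in range(n_clicks - 2):
        mu, var = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(mu + kappa * np.sqrt(var))]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)]                  # best location found

f = lambda x: -(x - 0.3) ** 2               # hidden 1D function, maximum at 0.3
best = gp_ucb_maximize(f, np.linspace(0, 1, 101), n_clicks=10)
```

Each iteration balances exploiting the current estimate of the maximum (high mu) against probing uncertain regions (high sigma), which is the trade-off the paper argues human clicks also exhibit.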
3 0.71180844 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
Author: Forest Agostinelli, Michael R. Anderson, Honglak Lee
Abstract: Stacked sparse denoising autoencoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. To address this limitation, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique of combining multiple SSDAs by (1) computing optimal column weights via solving a nonlinear optimization program and (2) training a separate network to predict the optimal weights. We eliminate the need to determine the type of noise, let alone its statistics, at test time and even show that the system can be robust to noise not seen in the training set. We show that state-of-the-art denoising performance can be achieved with a single system on a variety of different noise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits.
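The column-weighting step can be sketched in its simplest form: given each column's reconstruction of a training image and the clean target, fit combination weights against the target. The paper solves a constrained nonlinear program and trains a network to predict the weights per input; the unconstrained least-squares fit below is a deliberately reduced stand-in:

```python
import numpy as np

def column_weights(outputs, clean):
    """Fit combination weights for several denoiser outputs by least squares
    against the clean target (the paper uses a constrained nonlinear program)."""
    A = np.stack([o.ravel() for o in outputs], axis=1)   # one column per denoiser
    w, *_ = np.linalg.lstsq(A, clean.ravel(), rcond=None)
    return w

def combine(outputs, w):
    """Weighted combination of the column reconstructions."""
    return sum(wi * o for wi, o in zip(w, outputs))

clean = np.linspace(0.0, 1.0, 50)           # toy "clean image"
outputs = [clean + 0.1, clean - 0.05]       # two biased "column" reconstructions
w = column_weights(outputs, clean)
```

Here neither column alone matches the clean signal, but a weighted mixture cancels their opposite biases exactly, which is the intuition behind combining columns tuned to different noise types.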
4 0.65835929 292 nips-2013-Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Author: Mohammad Gheshlaghi azar, Alessandro Lazaric, Emma Brunskill
Abstract: Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi–armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it. 1
Abstract: Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
Author: Paul Wagner
Abstract: Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods. As our first main result, we show that an important subset of the latter methodology is, in fact, a limiting special case of a general formulation of the former methodology; optimistic policy iteration encompasses not only most of the greedy value function methods but also natural actor-critic methods, and permits one to directly interpolate between them. The resulting continuum adjusts the strength of the Markov assumption in policy improvement and, as such, can be seen as dual in spirit to the continuum in TD(λ)-style algorithms in policy evaluation. As our second main result, we show for a substantial subset of soft-greedy value function approaches that, while having the potential to avoid policy oscillation and policy chattering, this subset can never converge toward an optimal policy, except in a certain pathological case. Consequently, in the context of approximations (either in state estimation or in value function representation), the majority of greedy value function methods seem to be doomed to suffer either from the risk of oscillation/chattering or from the presence of systematic sub-optimality.
6 0.61254895 173 nips-2013-Least Informative Dimensions
7 0.61133724 201 nips-2013-Multi-Task Bayesian Optimization
8 0.61053652 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
9 0.60731357 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
10 0.60576463 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
11 0.60550231 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
12 0.60465175 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
13 0.60458452 294 nips-2013-Similarity Component Analysis
14 0.60412711 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
15 0.60371739 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
16 0.6032787 318 nips-2013-Structured Learning via Logistic Regression
17 0.60244834 350 nips-2013-Wavelets on Graphs via Deep Learning
18 0.6022653 5 nips-2013-A Deep Architecture for Matching Short Texts
19 0.60208893 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
20 0.6020177 11 nips-2013-A New Convex Relaxation for Tensor Completion