nips nips2013 nips2013-69 knowledge-graph by maker-knowledge-mining

69 nips-2013-Context-sensitive active sensing in humans

Source: pdf

Author: Sheeraz Ahmad, He Huang, Angela J. Yu

Abstract: Humans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. Recently, we proposed a goal-directed, context-sensitive, Bayesian control strategy for active sensing, C-DAC (Context-Dependent Active Controller) (Ahmad & Yu, 2013). In contrast to previously proposed algorithms for human active vision, which tend to optimize abstract statistical objectives and therefore cannot adapt to changing behavioral context or task goals, C-DAC directly minimizes behavioral costs and thus, automatically adapts itself to different task conditions. However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Context-sensitive active sensing in humans Sheeraz Ahmad Department of Computer Science and Engineering University of California San Diego 9500 Gilman Drive La Jolla, CA 92093 sahmad@cs. [sent-1, score-0.425]

2 Humans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. [sent-6, score-0.172]

3 Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. [sent-7, score-0.441]

4 However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. [sent-10, score-0.245]

5 Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. [sent-11, score-0.6]

6 We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. [sent-12, score-0.516]

7 We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. [sent-13, score-0.489]

8 In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings. [sent-14, score-0.686]

9 1 Introduction Both artificial and natural sensing systems face the challenge of making sense out of a continuous stream of noisy sensory inputs. [sent-15, score-0.299]

10 One critical tool the brain has at its disposal is active sensing, a goal-directed, context-sensitive control strategy that prioritizes sensing and processing resources toward the most rewarding or informative aspects of the environment (Yarbus, 1967). [sent-16, score-0.407]

11 Having a formal understanding of active sensing is not only important for advancing neuroscientific progress but also for developing context-sensitive, interactive artificial agents. [sent-17, score-0.405]

12 The most well-studied aspect of human active sensing is saccadic eye movements. [sent-18, score-0.606]

13 Early work suggested that saccades are attracted to salient targets that differ from their surround in one or more feature dimensions (Koch & Ullman, 1985; Itti & Koch, 2000); however, saliency has been found to only account for a small fraction of human saccadic eye movement (Itti, 2005). [sent-19, score-0.278]

14 However, these are generic statistical objectives that do not naturally adapt to behavioral context, such as changes in the relative cost of speed versus error, or the energetic or temporal cost associated with switching from one sensing location/configuration to another. [sent-21, score-0.71]

15 We compare C-DAC and Infomax performance to human data, in terms of fixation choice and duration, from a visual search experiment. [sent-24, score-0.271]

16 We exclude greedy MAP from this comparison, based on the results from our recent work showing that it is an almost random, and thus highly suboptimal strategy for the well-structured visual search task presented here. [sent-25, score-0.179]

17 Humans seem capable of planning and decision-making in very high-dimensional settings, while readily adapting to different behavioral context. [sent-28, score-0.166]

18 Here, we consider an approximate algorithm that chooses actions online and myopically, by considering the behavioral cost of looking only one step ahead (instead of an infinite horizon as in the optimal C-DAC policy). [sent-30, score-0.314]

19 2, we briefly summarize C-DAC and Infomax, as well as introduce the myopic approximation to C-DAC. [sent-32, score-0.366]

20 3, we describe the experiment, present the human behavioral data, and compare the performance of different models to the human data. [sent-34, score-0.412]

21 4, we simulate scenarios where C-DAC and myopic C-DAC achieve a flexible trade-off between speed, accuracy and effort depending on the task demands, whereas Infomax falls short – this forms experimentally testable predictions for future investigations. [sent-36, score-0.397]

22 2 The Models In the following, we assume a basic active sensing scenario, which formally translates to a sequential decision making process based on noisy inputs, where the observer can control both the sampling location and duration. [sent-39, score-0.715]

23 For example, in a visual search task, the observer controls where to look, when to switch to a different sensing location, and when to stop searching and report the answer. [sent-40, score-0.601]

24 Although the framework discussed below applies to a broad range of active sensing problems, we will use language specific to visual search for concreteness. [sent-41, score-0.519]

25 For inference, we assume the observer starts with a prior belief over the latent variable (true target location), and then updates her beliefs via Bayes rule upon receiving each new observation. [sent-44, score-0.368]

26 The observer maintains a probability distribution over the k possible target locations, representing the corresponding belief about the presence of the target in that location (belief state). [sent-45, score-0.696]

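27 Thus, if $s$ is the target location (latent), $\boldsymbol{\lambda}_t := \{\lambda_1, \ldots$ [sent-46, score-0.364]

28 $\ldots, x_t\}$ is the sequence of observations up to time $t$ (observed), the belief state and the belief update rule are: $p_t := (P(s = 1 \mid \mathbf{x}_t; \boldsymbol{\lambda}_t), \ldots$ [sent-52, score-0.377]

29 $\ldots, P(s = k \mid \mathbf{x}_t; \boldsymbol{\lambda}_t))$, $p_t^i = P(s = i \mid \mathbf{x}_t; \boldsymbol{\lambda}_t) \propto p(x_t \mid s = i; \lambda_t)\, P(s = i \mid \mathbf{x}_{t-1}; \boldsymbol{\lambda}_{t-1}) = f_{s,\lambda_t}(x_t)\, p_{t-1}^i$ (1), where $f_{s,\lambda}(x_t)$ is the likelihood function, and $p_0$ the prior belief distribution over target location. [sent-55, score-0.286]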
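
As an illustration of the belief update rule (1), the following Python sketch performs one Bayesian update after a single binary observation. It is not the authors' code. The function name `update_belief`, the Bernoulli likelihood (a fixated target yields 1 with probability `beta`, a fixated distractor with probability `1 - beta`), and the example value `beta = 0.7` are assumptions made only for illustration.

```python
def update_belief(prior, fixated, x, beta):
    """One step of Bayes' rule over k candidate target locations.

    prior   -- list of k probabilities (belief before the observation)
    fixated -- index of the location currently being sampled
    x       -- binary observation (0 or 1)
    beta    -- assumed P(x = 1 | fixated location is the target)
    """
    # Likelihood of x if the fixated location is the target ...
    like_target = beta if x == 1 else 1.0 - beta
    # ... and if it is a distractor (assumed symmetric: the two sum to 1).
    like_distractor = 1.0 - like_target

    # Hypothesis s = i says the target is at i, so the fixated location
    # is the target only under hypothesis i == fixated.
    unnormalized = [
        (like_target if i == fixated else like_distractor) * p
        for i, p in enumerate(prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Example: uniform prior over three patches, look at patch 0, observe x = 1.
posterior = update_belief([1 / 3, 1 / 3, 1 / 3], fixated=0, x=1, beta=0.7)
```

Under these assumptions, an observation of 1 at the fixated patch raises the belief in that patch and lowers the others proportionally.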

30 For the decision component, C-DAC optimizes the mapping from the belief state to the action space (continue, switch to one of the other sensing locations, stop and report the target location) with respect to a behavioral cost function. [sent-56, score-1.01]

31 For any given policy $\pi$ (mapping belief state to action), the expected cost is $L_\pi := c\,E[\tau] + c_s\,E[n_s] + P(\delta \neq s)$. [sent-58, score-0.503]
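
To make the expected cost concrete, here is a small Python sketch that estimates it empirically from a set of simulated or recorded trials. It reads the three terms as time cost, switch cost, and the probability of an incorrect declaration. The function name `behavioral_cost`, the tuple layout of a trial, and the numeric values are hypothetical and chosen only for illustration.

```python
def behavioral_cost(trials, c, c_s):
    """Empirical estimate of c*E[tau] + c_s*E[n_s] + P(error).

    trials -- iterable of (duration, n_switches, correct) tuples, where
              duration is the number of samples taken before stopping,
              n_switches the number of location switches, and correct a bool
    c      -- cost per unit of time
    c_s    -- cost per switch
    """
    trials = list(trials)
    n = len(trials)
    mean_time = sum(duration for duration, _, _ in trials) / n
    mean_switches = sum(switches for _, switches, _ in trials) / n
    error_rate = sum(1 for _, _, correct in trials if not correct) / n
    return c * mean_time + c_s * mean_switches + error_rate


# Two hypothetical trials: one fast and correct, one slow and wrong.
cost = behavioral_cost([(10, 1, True), (20, 3, False)], c=0.01, c_s=0.1)
```

Comparing this quantity across policies is one way to reproduce the kind of total-cost comparison described later in the summary.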

32 At any time t, the observer can either choose to stop and declare one of the locations to be the target, or choose to continue and look at location λt+1 . [sent-59, score-0.516]

33 2 Infomax policy Infomax (Butko & Movellan, 2010) presents a similar formulation in terms of belief state representation and Bayesian inference; however, for the control part, the goal is to maximize long term information gain (or minimize cumulative future entropy of the posterior belief state). [sent-64, score-0.371]

34 A general heuristic used for such strategies is to stop when the confidence in one of the locations being the target (the belief about that location) exceeds a certain threshold, which is a free parameter that is challenging to set for any specific problem. [sent-66, score-0.397]

35 In our recent work we used an optimistic strategy for comparing Infomax with C-DAC by giving Infomax a stopping boundary that is fit to the one computed by C-DAC. [sent-67, score-0.232]

36 Here we present a novel theoretical result that gives an inner bound of the stopping region, obviating the need to do a manual fit. [sent-68, score-0.174]

37 The bound is sensitive to the sampling cost c and the signal-to-noise ratio of the sensory input, and underestimates the size of the stopping region. [sent-69, score-0.305]

38 1, then for all $p^i > p^*$, the optimal action is to stop and declare location $i$ under the cost formulation of C-DAC. [sent-78, score-0.565]

39 Therefore stopping is optimal when the improvement in belief from collecting another sample is less than the cost incurred to collect that sample. [sent-81, score-0.411]

40 Formally, stopping and choosing $i$ is optimal for the corresponding belief $p^i = p$ when $\max_{p' \in \mathcal{P}}(p') - p \le c$, where $\mathcal{P}$ is the set of achievable beliefs starting from $p$. [sent-82, score-0.357]

41 Furthermore, if we solve the above equation for equality to find $p^*$, then by problem construction, it is always optimal to stop for $p > p^*$ (stopping cost $(1 - p) < (1 - p^*)$). [sent-83, score-0.337]
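
The stopping condition can be turned into a number under extra assumptions that are not spelled out in the sentences above. The sketch below assumes binary observations with parameter `beta` in (0.5, 1) and takes the largest belief reachable from `p` with one more sample to be `beta*p / (beta*p + (1 - beta)*(1 - p))`. Setting the gain equal to `c` then gives the quadratic `p^2 - (1 - c) p + c (1 - beta) / (2 beta - 1) = 0`, whose upper root plays the role of $p^*$. The function name `stopping_threshold` and the example values are hypothetical.

```python
import math


def stopping_threshold(c, beta):
    """Upper root p* of p' - p = c, with p' the best one-sample posterior.

    Assumes p' = beta*p / (beta*p + (1 - beta)*(1 - p)), i.e. a binary
    observation model with reliability beta in (0.5, 1).
    Returns None when the one-sample gain never reaches c.
    """
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (0.5, 1)")
    a = 2.0 * beta - 1.0
    # Discriminant of p^2 - (1 - c) p + c (1 - beta) / a = 0.
    disc = (1.0 - c) ** 2 - 4.0 * c * (1.0 - beta) / a
    if disc < 0.0:
        return None  # gain is below c at every belief
    return 0.5 * ((1.0 - c) + math.sqrt(disc))


# Illustrative values only; the summary does not give the parameters in full.
p_star = stopping_threshold(c=0.005, beta=0.68)
```

In this reading, a larger sampling cost `c` or a noisier observation (smaller `beta`) lowers the threshold, which matches the statement that the bound is sensitive to the sampling cost and the signal-to-noise ratio.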

42 In other words, the planning is based on the inherent assumption that the next action is the last action permissible, and so the goal is to minimize the cost incurred in this single step. [sent-88, score-0.287]

43 The actions thus available are, stop and declare the current location as the target, or choose another sensing location before stopping. [sent-89, score-0.876]

44 6, we can write the value function as: $V(p, k) = \min\left(1 - p^k,\ \min_j \left[c + c_s \mathbf{1}_{\{j \neq k\}} + \min_{l_j}\left(1 - E[p^{l_j}]\right)\right]\right)$ (8), where $j$ indexes the possible sensing locations, and $l_j$ indexes the possible stopping actions for the sensing location $j$. [sent-91, score-1.198]

45 It can be seen, therefore, that this myopic policy overestimates the size of the stopping region: if there is only one step left, it is never optimal to continue looking at the same location, since such an action would not lead to any improvement in expected accuracy, but would incur a unit cost of time c. [sent-94, score-0.856]

46 Therefore, in the simulations below, just like for Infomax, we set the stopping boundary for myopic C-DAC using the bound presented in Theorem 1. [sent-95, score-0.598]

47 3 Case Study: Visual Search In this section, we apply the different active sensing models discussed above to a simple visual search task, and compare their performance with the observed human behavior in terms of accuracy and fixation duration. [sent-96, score-0.676]

48 1 Visual search experiment The task involves finding a target (the patch with dots moving to the left) amongst two distractors (the patches with dots moving to the right), where a patch is a stimulus location possibly containing the target. [sent-98, score-0.783]

49 The definition of target versus distractor is counter-balanced across subjects. [sent-99, score-0.173]

50 The display is gaze contingent, such that only the location currently fixated is visible on the screen, allowing exact measurement of where a subject obtains sensory input. [sent-102, score-0.309]

51 At any time, the subject can declare the current fixation location to be the target by pressing space bar. [sent-103, score-0.424]

52 Target location for each trial is drawn independently from the fixed underlying distribution (1/13, 3/13, 9/13), with the spatial configuration fixed during a block and counter-balanced across blocks. [sent-104, score-0.268]

53 Subjects were rewarded points based on their performance, more if they got the answer correct (less if they got it wrong), and penalized for total search time as well as the number of switches in sensing location. [sent-108, score-0.46]

54 Figure 1: Simple visual search task, with gaze contingent display. [sent-109, score-0.216]

55 7), which are more likely to be 1 if the location contains the target, and more likely to be 0 if it contains a distractor (the probabilities sum to 1, since the left and right-moving stimuli are statistically/perceptually symmetric). [sent-112, score-0.259]

56 We assume that within a block of trials, subjects learn about the spatial distribution of target location in that block by inverting a Bayesian hidden Markov model, related to the Dynamic Belief Model (DBM) (Yu & Cohen, 2009). [sent-113, score-0.437]

57 This implies that the target location on each trial is generated from a categorical distribution, whose underlying rates at the three locations are, with probability α, the same as last trial and, probability 1 − α, redrawn from a prior Dirichlet distribution. [sent-114, score-0.516]
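
The generative assumption in the sentence above can be simulated directly. The Python sketch below draws a sequence of target locations from a categorical distribution whose rates persist with probability `alpha` and are otherwise redrawn from a Dirichlet prior. It only simulates the generative model and does not perform the Bayesian inversion that subjects are assumed to carry out. The function name `simulate_target_locations`, the flat Dirichlet parameters, and the value `alpha = 0.8` are illustrative assumptions.

```python
import random


def simulate_target_locations(n_trials, alpha, dirichlet_params, seed=0):
    """Sample target locations from the changing-rates generative model.

    n_trials         -- number of trials to simulate
    alpha            -- probability that the rates carry over from the last trial
    dirichlet_params -- concentration parameters of the Dirichlet prior
    seed             -- seed for reproducibility
    """
    rng = random.Random(seed)

    def draw_rates():
        # A Dirichlet sample is a vector of Gamma draws normalized to sum to 1.
        gammas = [rng.gammavariate(a, 1.0) for a in dirichlet_params]
        total = sum(gammas)
        return [g / total for g in gammas]

    rates = draw_rates()
    targets = []
    for _ in range(n_trials):
        if rng.random() >= alpha:
            # With probability 1 - alpha, the rates are redrawn from the prior.
            rates = draw_rates()
        # The target location on this trial is categorical with the current rates.
        targets.append(rng.choices(range(len(rates)), weights=rates)[0])
    return targets


# Three locations, flat prior, rates persisting on most trials.
locations = simulate_target_locations(50, alpha=0.8, dirichlet_params=[1.0, 1.0, 1.0])
```

With `alpha` close to 1 the sequence looks like draws from a nearly fixed distribution, while smaller values make the rates change often, so that recent trials carry more information than distant ones.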

58 8 to capture the general tendency of human subjects to typically rely more on recent observations than distant ones in anticipating upcoming stimuli. [sent-116, score-0.245]

59 We assume that subjects choose the first fixation location on each trial as the option with the highest prior probability of containing the target. [sent-117, score-0.341]

60 We investigate how well these policies explain the emergence of a certain confirmation bias in humans – the tendency to favor the more likely (privileged) location when making a decision about target location. [sent-119, score-0.614]

61 We focus on this particular aspect of behavioral data because of two reasons: (1) The more obvious aspects (e. [sent-120, score-0.166]

62 where each policy would choose to fixate first) are also the more trivial ones that all reasonable policies would display (e. [sent-122, score-0.16]

63 the most probable one); (2) Confirmation bias is a well studied, psychologically important phenomenon exhibited by humans in a variety of choice and decision behavior (see (Nickerson, 1998), for a review), and is, therefore, important to capture in its own right. [sent-124, score-0.226]

64 Figure 2: Confirmation bias in human data and model simulations. [sent-125, score-0.17]

65 The parameters used for C-DAC policy are (c, cs , β) = (0. [sent-126, score-0.284]

66 The stopping thresholds for both Infomax and myopic C-DAC are set using the bound developed in Theorem 1. [sent-130, score-0.54]

67 This is not due to a potential motor bias (tendency to assume the first fixation location contains the target, combined with first fixating the 9 patch most often), as we only consider trials where the subject first fixates the relevant patch. [sent-135, score-0.404]

68 The confirmation bias is also apparent in fixation duration (right column), as subjects fixate the 9 patch shorter than the 1 & 3 patches when it is the target (as though faster to confirm), and longer when it is not the target (as though slower to be dissuaded). [sent-136, score-0.568]

69 As shown in Figure 2, these confirmation bias phenomena are captured by both C-DAC and myopic C-DAC, but not by Infomax. [sent-138, score-0.413]

70 Our results show that human behavior is best modeled by a control strategy (C-DAC or myopic C-DAC) that takes into account behavior costs, e. [sent-139, score-0.557]

71 This is because C-DAC requires using dynamic programming (recursing Bellman’s optimal equation) offline to compute a globally optimal policy over the continuous state space (belief state), so that the discretized state space scales exponentially in the number of hypotheses. [sent-143, score-0.211]

72 On the other hand, myopic C-DAC incurs just a constant cost to compute the policy online for only the current belief state, is consequently psychologically more plausible, and provides a qualitative fit to the data with a simple threshold bound. [sent-145, score-0.8]

73 We believe its performance can be improved by using a tighter bound to approximate the stopping region. [sent-146, score-0.174]

74 However, one scenario where Infomax does not catch up to the full context sensitivity of C-DAC is when the cost of switching from one sensing location to another comes into play. [sent-150, score-0.637]

75 In contrast, myopic C-DAC can adjust its switching boundary depending on context. [sent-152, score-0.506]

76 We illustrate the same for the case when (c, cs , β) = (0. [sent-153, score-0.161]

77 Figure 3: Different policies for the environment (c, cs , β) = (0. [sent-158, score-0.198]

78 4 how the differences in policy space translate to behavioral differences in terms of accuracy, search time, number of switches, and total behavioral cost (eq. [sent-167, score-0.601]

79 Note that, as expected, the performance of Infomax and Myopic C-DAC are closely matched on all measures for the case cs = 0. [sent-170, score-0.161]

80 The accuracy of C-DAC is poorer as compared to the other two, because the threshold used for the other policies is more conservative (thus stopping and declaration happens at higher confidence, leading to higher accuracy), but C-DAC takes less time to reach the decision. [sent-171, score-0.245]

81 Looking at the overall behavioral costs, we can see that although C-DAC loses in accuracy, it makes up at other measures, leading to a comparable net cost. [sent-172, score-0.166]

82 This case exemplifies the context-sensitivity of C-DAC and Myopic C-DAC, as they both reduce number of switches when switching becomes costly. [sent-176, score-0.16]

83 When all these costs are combined we see that C-DAC incurs the minimum overall cost, followed by Myopic C-DAC, and Infomax incurs the highest cost due to its lack of flexibility for a changed context. [sent-177, score-0.179]

84 Figure 4: Comparison between C-DAC, Infomax and Myopic C-DAC (MC-DAC) for two environments (c, cs , β) = (0. [sent-179, score-0.161]

85 For cs > 0, the performance of C-DAC is better than MC-DAC which in turn is better than Infomax. [sent-185, score-0.161]

86 5 Discussion In this paper, we presented a novel visual search experiment that involves finding a target amongst a set of distractors differentiated only by the stimulus characteristics. [sent-186, score-0.377]

87 We found that the fixation and choice behavior of subjects is modulated by top-down factors, specifically the likelihood of a particular location containing the target. [sent-187, score-0.332]

88 This suggests that any purely bottom-up, saliency based model would be unable to fully explain human behavior. [sent-188, score-0.166]

89 Subjects were found to exhibit a certain confirmation bias – the tendency to systematically favor a location that is a priori judged more likely to contain the target, compared to another location less likely to contain the target, even in the face of identical sensory input and motor state. [sent-189, score-0.623]

90 In contrast, a policy that aims to optimize statistical objectives of task demands and ignores behavioral constraints (e. [sent-191, score-0.371]

91 We proposed a bound on the stopping threshold that allows us to set the decision boundary for Infomax, by taking into account the time or sampling cost c, but that still does not sufficiently alleviate the context-insensitivity of Infomax. [sent-194, score-0.383]

92 This is most likely due to both a sub-optimal incorporation of sampling cost and an intrinsic lack of sensitivity toward switching cost, because there is no natural way to compare a unit of switching cost with a unit of information gain. [sent-195, score-0.326]

93 While C-DAC does a good job of matching human behavior, at least based on the behavioral metrics considered here, we note that this does not necessarily imply that the brain implements C-DAC exactly. [sent-197, score-0.325]

94 the number of possible target locations), thus making it an impractical solution for more natural and complex problems faced daily by humans and animals. [sent-200, score-0.193]

95 For this reason, we proposed a myopic approximation to C-DAC that scales linearly with search dimensionality, by eschewing a globally optimal solution that must be computed and maintained offline, in favor of an online, approximately and locally optimal solution. [sent-201, score-0.458]

96 This myopic C-DAC algorithm, by retaining context-sensitivity, was found to nevertheless reproduce critical fixation choice and duration patterns, such as the confirmation bias, seen in human behavior. [sent-202, score-0.528]

97 However, exact C-DAC was still better than myopic C-DAC at reproducing human data, leaving room for finding other approximations that explain brain computations even better. [sent-203, score-0.525]

98 We proposed one such bound on the stopping boundary here, and other approximate bounds have been proposed for similar problems (Naghshvar & Javidi, 2012). [sent-205, score-0.232]

99 Further investigations are needed to find more inexpensive, yet context-sensitive active sensing policies, that would not only provide a better explanation for brain computations, but yield better practical algorithms for active sensing in engineering applications. [sent-206, score-0.778]

100 Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. [sent-225, score-0.255]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('infomax', 0.457), ('myopic', 0.366), ('sensing', 0.249), ('location', 0.225), ('xation', 0.211), ('stopping', 0.174), ('behavioral', 0.166), ('cs', 0.161), ('rmation', 0.147), ('target', 0.139), ('policy', 0.123), ('human', 0.123), ('active', 0.122), ('itti', 0.111), ('belief', 0.11), ('patch', 0.102), ('butko', 0.092), ('xate', 0.092), ('ahmad', 0.092), ('movellan', 0.085), ('pt', 0.083), ('observer', 0.083), ('visual', 0.083), ('stop', 0.082), ('switching', 0.082), ('cost', 0.081), ('action', 0.08), ('switches', 0.078), ('subjects', 0.073), ('locations', 0.066), ('search', 0.065), ('yu', 0.063), ('gilman', 0.062), ('qim', 0.062), ('declare', 0.06), ('fs', 0.06), ('koch', 0.058), ('boundary', 0.058), ('eye', 0.057), ('saccadic', 0.055), ('psychologically', 0.055), ('humans', 0.054), ('objectives', 0.051), ('baldi', 0.051), ('sensory', 0.05), ('tendency', 0.049), ('jolla', 0.048), ('bias', 0.047), ('incurred', 0.046), ('xt', 0.046), ('saliency', 0.043), ('dbm', 0.043), ('trial', 0.043), ('controller', 0.042), ('cdac', 0.042), ('javidi', 0.042), ('naghshvar', 0.042), ('nickerson', 0.042), ('yarbus', 0.042), ('lj', 0.039), ('duration', 0.039), ('switch', 0.039), ('policies', 0.037), ('pi', 0.037), ('najemnik', 0.037), ('beliefs', 0.036), ('drive', 0.036), ('costs', 0.036), ('decision', 0.036), ('brain', 0.036), ('diego', 0.035), ('actions', 0.035), ('behavior', 0.034), ('gaze', 0.034), ('distractors', 0.034), ('geisler', 0.034), ('advancing', 0.034), ('contingent', 0.034), ('distractor', 0.034), ('got', 0.034), ('ullman', 0.034), ('threshold', 0.034), ('bellman', 0.033), ('dynamic', 0.032), ('looking', 0.032), ('con', 0.031), ('task', 0.031), ('ine', 0.031), ('incurs', 0.031), ('trials', 0.03), ('san', 0.03), ('la', 0.029), ('patches', 0.029), ('state', 0.028), ('stimulus', 0.028), ('inexpensive', 0.028), ('amongst', 0.028), ('retains', 0.028), ('pk', 0.027), ('favor', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 69 nips-2013-Context-sensitive active sensing in humans

Author: Sheeraz Ahmad, He Huang, Angela J. Yu

2 0.17486063 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

Author: Stefan Mathe, Cristian Sminchisescu

Abstract: Human eye movements provide a rich source of information into the human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among the eyetracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOI) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that given unconstrained real-world stimuli, task instructions have significant influence on the human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data within models that allow to effectively predict the human visual search patterns based on inverse optimal control. The methodology achieves state of the art scanpath modeling results.

3 0.16732731 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In many neural systems, information about stimulus variables is often represented in a distributed manner by means of a population code. It is generally assumed that the responses of the neural population are tuned to the stimulus statistics, and most prior work has investigated the optimal tuning characteristics of one or a small number of stimulus variables. In this work, we investigate the optimal tuning for diffeomorphic representations of high-dimensional stimuli. We analytically derive the solution that minimizes the L2 reconstruction loss. We compared our solution with other well-known criteria such as maximal mutual information. Our solution suggests that the optimal weights do not necessarily decorrelate the inputs, and the optimal nonlinearity differs from the conventional equalization solution. Results illustrating these optimal representations are shown for some input distributions that may be relevant for understanding the coding of perceptual pathways.

4 0.1464891 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

Author: Shunan Zhang, Angela J. Yu

Abstract: How humans achieve long-term goals in an uncertain environment, via repeated trials and noisy observations, is an important problem in cognitive science. We investigate this behavior in the context of a multi-armed bandit task. We compare human behavior to a variety of models that vary in their representational and computational complexity. Our result shows that subjects’ choices, on a trial-to-trial basis, are best captured by a “forgetful” Bayesian iterative learning model [21] in combination with a partially myopic decision policy known as Knowledge Gradient [7]. This model accounts for subjects’ trial-by-trial choice better than a number of other previously proposed models, including optimal Bayesian learning and risk minimization, ε-greedy and win-stay-lose-shift. It has the added benefit of being closer in performance to the optimal Bayesian model than all the other heuristic models that have the same computational complexity (all are significantly less complex than the optimal model). These results constitute an advancement in the theoretical understanding of how humans negotiate the tension between exploration and exploitation in a noisy, imperfectly known environment.

5 0.13738705 241 nips-2013-Optimizing Instructional Policies

Author: Robert Lindsey, Michael Mozer, William J. Huggins, Harold Pashler

Abstract: Psychologists are interested in developing instructional policies that boost student learning. An instructional policy specifies the manner and content of instruction. For example, in the domain of concept learning, a policy might specify the nature of exemplars chosen over a training sequence. Traditional psychological studies compare several hand-selected policies, e.g., contrasting a policy that selects only difficult-to-classify exemplars with a policy that gradually progresses over the training sequence from easy exemplars to more difficult (known as fading). We propose an alternative to the traditional methodology in which we define a parameterized space of policies and search this space to identify the optimal policy. For example, in concept learning, policies might be described by a fading function that specifies exemplar difficulty over time. We propose an experimental technique for searching policy spaces using Gaussian process surrogate-based optimization and a generative model of student performance. Instead of evaluating a few experimental conditions each with many human subjects, as the traditional methodology does, our technique evaluates many experimental conditions each with a few subjects. Even though individual subjects provide only a noisy estimate of the population mean, the optimization method allows us to determine the shape of the policy space and to identify the global optimum, and is as efficient in its subject budget as a traditional A-B comparison. We evaluate the method via two behavioral studies, and suggest that the method has broad applicability to optimization problems involving humans outside the educational arena.

6 0.12376348 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

7 0.12135666 54 nips-2013-Bayesian optimization explains human active search

8 0.11669081 79 nips-2013-DESPOT: Online POMDP Planning with Regularization

9 0.11090176 348 nips-2013-Variational Policy Search via Trajectory Optimization

10 0.10918546 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods

11 0.10026415 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs

12 0.099165514 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

13 0.094985589 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

14 0.091409422 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

15 0.091159403 149 nips-2013-Latent Structured Active Learning

16 0.08912915 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

17 0.087407731 309 nips-2013-Statistical Active Learning Algorithms

18 0.085904174 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

19 0.082322352 257 nips-2013-Projected Natural Actor-Critic

20 0.081003428 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.195), (1, -0.118), (2, -0.136), (3, 0.002), (4, -0.02), (5, -0.033), (6, -0.032), (7, -0.002), (8, -0.042), (9, 0.072), (10, -0.077), (11, -0.021), (12, -0.052), (13, 0.0), (14, -0.094), (15, -0.041), (16, -0.114), (17, -0.049), (18, -0.016), (19, -0.07), (20, -0.066), (21, 0.021), (22, -0.074), (23, -0.027), (24, 0.002), (25, -0.048), (26, -0.105), (27, 0.1), (28, -0.084), (29, 0.019), (30, -0.054), (31, -0.028), (32, -0.012), (33, -0.042), (34, 0.163), (35, -0.148), (36, 0.002), (37, 0.084), (38, -0.038), (39, 0.05), (40, 0.069), (41, 0.026), (42, -0.098), (43, -0.061), (44, -0.028), (45, -0.053), (46, -0.017), (47, -0.069), (48, 0.039), (49, -0.056)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95348537 69 nips-2013-Context-sensitive active sensing in humans

Author: Sheeraz Ahmad, He Huang, Angela J. Yu

2 0.72676587 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

Author: Stefan Mathe, Cristian Sminchisescu

3 0.61359835 54 nips-2013-Bayesian optimization explains human active search

Author: Ali Borji, Laurent Itti

Abstract: Many real-world problems have complicated objective functions. To optimize such functions, humans utilize sophisticated sequential decision-making strategies. Many optimization algorithms have also been developed for this same purpose, but how do they compare to humans in terms of both performance and behavior? We try to unravel the general underlying algorithm people may be using while searching for the maximum of an invisible 1D function. Subjects click on a blank screen and are shown the ordinate of the function at each clicked abscissa location. Their task is to find the function’s maximum in as few clicks as possible. Subjects win if they get close enough to the maximum location. Analysis over 23 non-maths undergraduates, optimizing 25 functions from different families, shows that humans outperform 24 well-known optimization algorithms. Bayesian Optimization based on Gaussian Processes, which exploits all the x values tried and all the f(x) values obtained so far to pick the next x, predicts human performance and searched locations better. In 6 follow-up controlled experiments over 76 subjects, covering interpolation, extrapolation, and optimization tasks, we further confirm that Gaussian Processes provide a general and unified theoretical account to explain passive and active function learning and search in humans.

4 0.57198215 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

Author: Shunan Zhang, Angela J. Yu

5 0.56942409 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

Author: Matjaz Jogan, Alan Stocker

Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure.

6 0.53981495 241 nips-2013-Optimizing Instructional Policies

7 0.51976246 183 nips-2013-Mapping paradigm ontologies to and from the brain

8 0.5110032 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

9 0.46593171 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?

10 0.45831928 322 nips-2013-Symbolic Opportunistic Policy Iteration for Factored-Action MDPs

11 0.45473015 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

12 0.44271213 79 nips-2013-DESPOT: Online POMDP Planning with Regularization

13 0.43840644 136 nips-2013-Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream

14 0.43724096 149 nips-2013-Latent Structured Active Learning

15 0.43523061 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

16 0.43491828 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs

17 0.42601073 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

18 0.4078618 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models

19 0.40427223 38 nips-2013-Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

20 0.39719433 309 nips-2013-Statistical Active Learning Algorithms

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.013), (16, 0.017), (27, 0.223), (33, 0.108), (34, 0.122), (41, 0.043), (49, 0.046), (56, 0.112), (70, 0.035), (75, 0.019), (85, 0.034), (89, 0.041), (93, 0.079), (95, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94655162 85 nips-2013-Deep content-based music recommendation

Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen

Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach.

same-paper 2 0.83790082 69 nips-2013-Context-sensitive active sensing in humans

Author: Sheeraz Ahmad, He Huang, Angela J. Yu

3 0.77381194 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding

Author: Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang

Abstract: Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.

4 0.69182795 99 nips-2013-Dropout Training as Adaptive Regularization

Author: Stefan Wager, Sida Wang, Percy Liang

Abstract: Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.

5 0.68828845 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents

Author: Xiaoxiao Guo, Satinder Singh, Richard L. Lewis

Abstract: We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent’s behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent’s performance relative to other approaches, including an approach that transfers policies.

6 0.68641263 5 nips-2013-A Deep Architecture for Matching Short Texts

7 0.68551689 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

8 0.68101096 215 nips-2013-On Decomposing the Proximal Map

9 0.680143 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies

10 0.67605215 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

11 0.67560554 249 nips-2013-Polar Operators for Structured Sparse Estimation

12 0.6750648 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

13 0.67506069 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

14 0.67423749 280 nips-2013-Robust Data-Driven Dynamic Programming

15 0.67391193 318 nips-2013-Structured Learning via Logistic Regression

16 0.67385316 201 nips-2013-Multi-Task Bayesian Optimization

17 0.67291105 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms

18 0.67250997 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

19 0.67202121 347 nips-2013-Variational Planning for Graph-based MDPs

20 0.6718632 79 nips-2013-DESPOT: Online POMDP Planning with Regularization