nips nips2005 nips2005-4 knowledge-graph by maker-knowledge-mining

4 nips-2005-A Bayesian Spatial Scan Statistic

Source: pdf

Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper

Abstract: We propose a new Bayesian method for spatial cluster detection, the “Bayesian spatial scan statistic,” and compare this method to the standard (frequentist) scan statistic approach. We demonstrate that the Bayesian statistic has several advantages over the frequentist approach, including increased power to detect clusters and (since randomization testing is unnecessary) much faster runtime. We evaluate the Bayesian and frequentist methods on the task of prospective disease surveillance: detecting spatial clusters of disease cases resulting from emerging disease outbreaks. We demonstrate that our Bayesian methods are successful in rapidly detecting outbreaks while keeping number of false positives low. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We propose a new Bayesian method for spatial cluster detection, the “Bayesian spatial scan statistic,” and compare this method to the standard (frequentist) scan statistic approach. [sent-8, score-0.945]

2 We demonstrate that the Bayesian statistic has several advantages over the frequentist approach, including increased power to detect clusters and (since randomization testing is unnecessary) much faster runtime. [sent-9, score-0.699]

3 We evaluate the Bayesian and frequentist methods on the task of prospective disease surveillance: detecting spatial clusters of disease cases resulting from emerging disease outbreaks. [sent-10, score-1.361]

4 We demonstrate that our Bayesian methods are successful in rapidly detecting outbreaks while keeping number of false positives low. [sent-11, score-0.298]

5 1 Introduction Here we focus on the task of spatial cluster detection: ﬁnding spatial regions where some quantity is signiﬁcantly higher than expected. [sent-12, score-0.425]

6 For example, our goal may be to detect clusters of disease cases, which may be indicative of a naturally occurring epidemic (e. [sent-13, score-0.329]

7 Thus we compare the null hypothesis H0 of no clusters against some set of alternative hypotheses H1 (S), each representing a cluster in some region or regions S. [sent-22, score-0.43]

8 In the standard frequentist setting, we do this by signiﬁcance testing, computing the p-values of potential clusters by randomization; here we propose a Bayesian framework, in which we compute posterior probabilities of each potential cluster. [sent-23, score-0.569]

9 Our primary motivating application is prospective disease surveillance: detecting spatial clusters of disease cases resulting from a disease outbreak. [sent-24, score-0.949]

10 We must then detect those increases which are indicative of emerging outbreaks, as close to the start of the outbreak as possible, while keeping the number of false positives low. [sent-32, score-0.636]

11 In this spatial surveillance setting, each day we have data collected for a set of discrete spatial locations si . [sent-34, score-0.505]

12 number of disease cases), and an underlying baseline bi . [sent-37, score-0.359]

13 Our goal, then, is to ﬁnd if there is any spatial region S (set of locations si ) for which the counts are signiﬁcantly higher than expected, given the baselines. [sent-41, score-0.353]

14 For simplicity, we assume here (as in [2]) that the locations si are aggregated to a uniform, two-dimensional, N × N grid G, and we search over the set of rectangular regions S ⊆ G. [sent-42, score-0.291]

15 This allows us to search both compact and elongated regions, allowing detection of elongated disease clusters resulting from dispersal of pathogens by wind or water. [sent-43, score-0.397]

16 1 The frequentist scan statistic One of the most important statistical tools for cluster detection is Kulldorff’s spatial scan statistic [3-4]. [sent-45, score-1.376]

17 This method searches over a given set of spatial regions, ﬁnding those regions which maximize a likelihood ratio statistic and thus are most likely to be generated under the alternative hypothesis of clustering rather than the null hypothesis of no clustering. [sent-46, score-0.587]

18 Kulldorff’s framework assumes that counts ci are Poisson distributed with ci ∼ Po(qbi ), where bi represents the (known) census population of cell si and q is the (unknown) underlying disease rate. [sent-48, score-0.618]

19 Then the goal of the scan statistic is to ﬁnd regions where the disease rate is higher inside the region than 1 outside. [sent-49, score-0.752]

20 The statistic used for this is the likelihood ratio F(S) = P(Data | HH(S)) , where the P(Data | 0 ) null hypothesis H0 assumes a uniform disease rate q = qall . [sent-50, score-0.66]

21 Under H1 (S), we assume that q = qin for all si ∈ S, and q = qout for all si ∈ G − S, for some constants qin > qout . [sent-51, score-0.622]

22 From this, we can derive an expression for F(S) using maximum likelihood estimates of qin , qout , and qall : F(S) = ( Cin )Cin ( Cout )Cout ( Call )−Call , if Cin > Cout , and F(S) = 1 otherwise. [sent-52, score-0.395]

23 Bin Bout Ball Bin Bout In this expression, we have Cin = ∑S ci , Cout = ∑G−S ci , Call = ∑G ci , and similarly for the baselines Bin = ∑S bi , Bout = ∑G−S bi , and Ball = ∑G bi . [sent-53, score-0.613]

24 Once we have found the highest scoring region S∗ = arg maxS F(S) of grid G, and its score F ∗ = F(S∗ ), we must still determine the statistical signiﬁcance of this region by randomization testing. [sent-54, score-0.376]

25 To do so, we randomly create a large number R of replica grids by sampling under the null hypothesis ci ∼ Po(qall bi ), and ﬁnd the highest scoring region and its score for each replica grid. [sent-55, score-0.51]

26 05), we can conclude that the discovered region is unlikely to have occurred by chance, and is thus a signiﬁcant spatial cluster; otherwise, no signiﬁcant clusters exist. [sent-60, score-0.329]

27 The frequentist scan statistic is a useful tool for cluster detection, and is commonly used in the public health community for detection of disease outbreaks. [sent-61, score-1.122]

28 First, it is difﬁcult to make use of any prior information that we may have, for example, our prior beliefs about the size of a potential outbreak and its impact on disease rate. [sent-63, score-0.784]

29 Finally, the frequentist scan statistic is very time consuming, and may be computationally infeasible for large datasets. [sent-66, score-0.731]

30 A naive approach requires searching over all rectangular regions, both for the original grid and for each replica grid. [sent-67, score-0.234]

31 In past work [5, 2, 6], we have shown how to reduce this computation time by a factor of 20-2000x through use of the “fast spatial scan” algorithm; nevertheless, we must still perform this faster search both for the original grid and for each replica. [sent-69, score-0.299]

32 We propose to remedy these problems through the use of a Bayesian spatial scan statistic. [sent-70, score-0.374]

33 If these priors are chosen well, we should achieve better detection power than the frequentist approach. [sent-72, score-0.49]

34 Second, the Bayesian method uses a marginal likelihood approach, averaging over possible values of the model parameters qin , qout , and qall , rather than relying on maximum likelihood estimates of these parameters. [sent-73, score-0.424]

35 We now present the Bayesian spatial scan statistic, and then compare it to the frequentist approach on the task of detecting simulated disease epidemics. [sent-76, score-1.025]

36 2 The Bayesian scan statistic Here we consider the natural Bayesian extension of Kulldorff’s scan statistic, moving from a Poisson to a conjugate Gamma-Poisson model. [sent-77, score-0.588]

37 Bayesian Gamma-Poisson models are a common representation for count data in epidemiology, and have been used in disease mapping by Clayton and Kaldor [7], Molli´ [8], and others. [sent-78, score-0.291]

38 For the Bayesian spatial scan, as in the frequentist approach, we wish to compare the null hypothesis H0 of no clusters to the set of alternative hypotheses H1 (S), each representing a cluster in some region S. [sent-80, score-0.879]

39 The difference is that we assume a hierarchical Bayesian model where the disease rates qin , qout , and qall are themselves drawn from Gamma distributions. [sent-82, score-0.596]

40 Thus, under the null hypothesis H0 , we have q = qall for all si ∈ G, where qall ∼ Ga(αall , βall ). [sent-83, score-0.446]

41 Under the alternative hypothesis H1 (S), we have q = qin for all si ∈ S and q = qout for all si ∈ G − S, where we independently draw qin ∼ Ga(αin , βin ) and qout ∼ Ga(αout , βout ). [sent-84, score-0.681]

42 To compute the marginal likelihood of the data given each hypothesis, we must integrate over all possible values of the parameters (qin , qout , qall ) weighted by their respective probabilities. [sent-88, score-0.265]

43 αall +Call out +Cout Γ(α in +C +B (βall +Ball ) Γ(αall ) in in in out out out The Bayesian spatial scan statistic can be computed simply by ﬁrst calculating the score P(D | H1 (S))P(H1 (S)) for each spatial region S, maintaining a list of regions ordered by score. [sent-91, score-0.846]

44 For any region S that we examine, we must have values of the parameter priors αin (S), βin (S), αout (S), and βout (S), as well as the region prior probability P(H1 (S)). [sent-100, score-0.26]

45 Here we consider the simple case of a uniform region prior, with a known prior probability of an outbreak P1 . [sent-102, score-0.593]

46 For the parameter priors, we assume that we have access to a large number of days of past data, during which no outbreaks are known to have occurred. [sent-107, score-0.305]

47 Varsample Call Ball The calculation of priors αin (S), βin (S), αout (S), and βout (S) is identical except for two differences: ﬁrst, we must condition on the region S, and second, we must assume the alternative hypothesis H1 (S) rather than the null hypothesis H0 . [sent-111, score-0.345]

48 Note that an outbreak in some region S does not affect the disease rate outside region S. [sent-113, score-0.886]

49 We assume that the outbreak will increase qin by a multiplicative factor m, thus multiplying the mean and variance of Cin by m. [sent-116, score-0.602]

50 In this “blind Bayesian” (BBayes) case, we assume that counts are randomly generated under the null hypothesis ci ∼ Po(q0 bi ), where q0 is the expected ratio of count to baseline under the null (for example, q0 = 1 if baselines are obtained by estimating the expected value of the count). [sent-126, score-0.671]

51 3 Results: detection power We evaluated the Bayesian and frequentist methods on two types of simulated respiratory outbreaks, injected into real Emergency Department and over-the-counter drug sales data for Allegheny County, Pennsylvania. [sent-132, score-0.583]

52 All data were aggregated to the zip code level to ensure anonymity, giving the daily counts of respiratory ED cases and sales of OTC cough and cold medication in each of 88 zip codes for one year. [sent-133, score-0.307]

53 The BARD simulator uses a Bayesian network model to determine the number of spores inhaled by individuals in affected areas, the resulting number and severity of anthrax cases, and the resulting number of respiratory ED cases on each day of the outbreak in each affected zip code. [sent-139, score-0.755]

54 Our second type of outbreak was a simulated “Fictional Linear Onset Outbreak” (or “FLOO”), as in [10]. [sent-140, score-0.499]

55 A FLOO(∆, T ) outbreak is a simple simulated outbreak with duration T , which generates t∆ cases in each affected zip code on day t of the outbreak (0 < t ≤ T /2), then generates T ∆/2 cases per day for the remainder of the outbreak. [sent-141, score-1.719]

56 Thus we have an outbreak where the number of cases ramps up linearly and then levels off. [sent-142, score-0.472]

57 While this is clearly a less realistic outbreak than the BARD-simulated anthrax attack, it does have several advantages: most importantly, it allows us to precisely control the slope of the outbreak curve and examine how this affects our methods’ detection ability. [sent-143, score-1.089]

58 To test detection power, a semi-synthetic testing framework similar to [10] was used: we ﬁrst run our spatial scan statistic for each day of the last nine months of the year (the ﬁrst three months are used only to estimate baselines and priors), and obtain the score F ∗ for each day. [sent-144, score-0.836]

59 Then for each outbreak we wish to test, we inject that outbreak into the data, and obtain the score F ∗ (t) for each day t of the outbreak. [sent-145, score-1.077]

60 By ﬁnding the proportion of baseline days with scores higher than F ∗ (t), we can determine the proportion of false positives we would have to accept to detect the outbreak on day t. [sent-146, score-0.908]

61 This allows us to compute, for any given level of false positives, what proportion of outbreaks can be detected, and the mean number of days to detection. [sent-147, score-0.353]

62 In Table 1, we compare these methods with respect to proportion of outbreaks detected and Table 1: Days to detect and proportion of outbreaks detected, 1 false positive/month method frequentist Bayes max BBayes max Bayes tot BBayes tot FLOO ED (4,14) 1. [sent-152, score-1.126]

63 Methods were evaluated on seven types of simulated outbreaks: three FLOO outbreaks on ED data, two FLOO outbreaks on OTC data, and two BARD outbreaks (with different amounts of anthrax release) on ED data. [sent-190, score-0.665]

64 For each outbreak type, each method’s performance was averaged over 100 or 250 simulated outbreaks for BARD or FLOO respectively. [sent-191, score-0.688]

65 For the ﬁve runs on ED data, all four Bayesian methods consistently detected outbreaks faster than the frequentist method. [sent-193, score-0.583]

66 43 days (BBayes max) as compared to the frequentist approach; “max” methods performed substantially better than “tot” methods, and “BBayes” methods performed slightly better than “Bayes” methods. [sent-197, score-0.459]

67 For the two runs on OTC data, on the other hand, most of the Bayesian methods performed much worse (over 1 day slower) than the frequentist method. [sent-198, score-0.474]

68 The exception was the Bayes tot method, which again outperformed the frequentist method by an average of 0. [sent-199, score-0.475]

69 On the other hand, the Bayes methods (which instead learn the priors from previous counts and baselines) can adjust for consistent misestimation of baselines and thus more accurately account for these seasonal trends. [sent-204, score-0.226]

70 The total posterior probability of an outbreak, on the other hand, will still be higher for outbreak than non-outbreak days, so the “tot” methods can perform well on OTC as well as ED data. [sent-206, score-0.535]

71 Thus, our main result is that the Bayes tot method, which infers baselines from past counts and uses total posterior probability of an outbreak to decide when to sound the alarm, consistently outperforms the frequentist method for both ED and OTC datasets. [sent-207, score-1.18]

72 Thus, as long as the search time per region is comparable for the Bayesian and frequentist methods, we expect the Bayesian approach to be approximately 1000x faster. [sent-209, score-0.492]

73 In Table 2, we compare the run times of the Bayes, BBayes, and frequen- Table 2: Comparison of run times for varying grid size N method Bayes (naive) BBayes (naive) frequentist (naive) frequentist (fast) N = 16 0. [sent-210, score-0.832]

74 7 min N = 128 44 min 37 min ∼31 days 77 min N = 256 12 hrs 10 hrs ∼500 days 10 hrs tist methods for searching a single grid and calculating signiﬁcance (p-values or posterior probabilities for the frequentist and Bayesian methods respectively), as a function of the grid size N. [sent-219, score-1.014]

75 All rectangles up to size N/2 were searched, and for the frequentist method R = 1000 replications were performed. [sent-220, score-0.404]

76 The results conﬁrm our intuition: the Bayesian methods are 900-1200x faster than the frequentist approach, for all values of N tested. [sent-221, score-0.369]

77 However, the frequentist approach can be accelerated dramatically using our “fast spatial scan” algorithm [2], a multiresolution search method which can ﬁnd the highest scoring region of a grid while searching only a small subset of regions. [sent-222, score-0.768]

78 Comparing the fast spatial scan to the Bayesian approach, we see that the fast spatial scan is slower than the Bayesian method for grid sizes up to N = 128, but slightly faster for N = 256. [sent-223, score-0.894]

79 Thus we now have two options for making the spatial scan statistic computationally feasible for large grid sizes: to use the fast spatial scan to speed up the frequentist scan statistic, or to use the Bayesian scan statistics framework (in which case the naive algorithm is typically fast enough). [sent-224, score-1.887]

80 For even larger grid sizes, it may be possible to extend the fast spatial scan to the Bayesian approach: this would give us the best of both worlds, searching only one grid, and using a fast algorithm to do so. [sent-225, score-0.554]

81 5 Discussion We have presented a Bayesian spatial scan statistic, and demonstrated several ways in which this method is preferable to the standard (frequentist) scan statistics approach. [sent-227, score-0.6]

82 In Section 3, we demonstrated that the Bayesian method, with a relatively non-informative prior distribution, consistently outperforms the frequentist method with respect to detection power. [sent-228, score-0.472]

83 In Section 4, we demonstrated that the Bayesian spatial scan can be computed in much less time than the frequentist method, since randomization testing is unnecessary. [sent-232, score-0.838]

84 This allows us to search large grid sizes using a naive search algorithm, and even larger grids might be searched by extending the fast spatial scan to the Bayesian framework. [sent-233, score-0.618]

85 First, the Bayesian method has easily interpretable results: it outputs the posterior probability that an outbreak has occurred, and the distribution of this probability over possible outbreak regions. [sent-235, score-1.007]

86 public health ofﬁcial) to decide whether to investigate each potential outbreak based on the costs of false positives and false negatives; this type of decision analysis cannot be done easily in the frequentist framework. [sent-238, score-1.018]

87 Another useful result of the Bayesian method is that we can compute a “map” of the posterior probabilities of an outbreak in each grid cell, by summing the posterior probabilities P(H1 (S) | D) of all regions containing that cell. [sent-239, score-0.814]

88 We give an example of such a map below: Figure 1: Output of Bayesian spatial scan on baseline OTC data, 1/30/05. [sent-241, score-0.432]

89 Cell shading is based on posterior probability of an outbreak in that cell, ranging from white (0%) to black (100%). [sent-242, score-0.535]

90 Second, calibration of the Bayesian statistic is easier than calibration of the frequentist statistic. [sent-248, score-0.505]

91 As noted above, it is simple to adjust the sensitivity and speciﬁcity of the Bayesian method by setting the prior probability of an outbreak P1 , and then we can “sound the alarm” whenever posterior probability of an outbreak exceeds some threshold. [sent-249, score-1.036]

92 In the frequentist method, on the other hand, many regions in the baseline data have sufﬁciently high likelihood ratios that no replicas beat the original grid; thus we cannot distinguish the p-values of outbreak and non-outbreak days. [sent-250, score-0.996]

93 In conclusion, we note that, though both Bayesian modeling [7-8] and (frequentist) spatial scanning [3-4] are common in the spatial statistics literature, this is (to the best of our knowledge) the ﬁrst model which combines the two techniques into a single framework. [sent-256, score-0.296]

94 In fact, very little work exists on Bayesian methods for spatial cluster detection. [sent-257, score-0.209]

95 One notable exception is the literature on spatial cluster modeling [11-12], which attempts to infer the location of cluster centers by inferring parameters of a Bayesian process model. [sent-258, score-0.27]

96 Thus we believe that this is the ﬁrst Bayesian spatial cluster detection method which is powerful and useful, yet computationally tractable. [sent-260, score-0.283]

97 We are currently running the Bayesian and frequentist scan statistics on daily OTC sales data from over 10000 stores, searching for emerging disease outbreaks on a daily basis nationwide. [sent-261, score-1.212]

98 We believe that the Bayesian approach has the potential to improve both speed and detection power of the spatial scan in this domain as well. [sent-263, score-0.472]

99 A fast multi-resolution method for detection of signiﬁcant spatial disease clusters. [sent-300, score-0.478]

100 Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. [sent-317, score-0.23]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('outbreak', 0.472), ('frequentist', 0.369), ('disease', 0.23), ('scan', 0.226), ('outbreaks', 0.189), ('otc', 0.153), ('spatial', 0.148), ('bayesian', 0.147), ('bbayes', 0.142), ('statistic', 0.136), ('cin', 0.13), ('cout', 0.13), ('floo', 0.13), ('qin', 0.13), ('bout', 0.118), ('qall', 0.118), ('qout', 0.118), ('ga', 0.106), ('tot', 0.106), ('day', 0.105), ('po', 0.103), ('ci', 0.102), ('baselines', 0.094), ('grid', 0.094), ('region', 0.092), ('days', 0.09), ('null', 0.088), ('esample', 0.083), ('varsample', 0.083), ('detection', 0.074), ('ball', 0.073), ('bin', 0.071), ('anthrax', 0.071), ('bard', 0.071), ('bi', 0.071), ('randomization', 0.07), ('regions', 0.068), ('zip', 0.066), ('si', 0.063), ('posterior', 0.063), ('clusters', 0.062), ('count', 0.061), ('cluster', 0.061), ('neill', 0.059), ('qbi', 0.059), ('hypothesis', 0.059), ('baseline', 0.058), ('bayes', 0.057), ('hrs', 0.051), ('counts', 0.05), ('dq', 0.047), ('kulldorff', 0.047), ('sales', 0.047), ('priors', 0.047), ('emerging', 0.043), ('false', 0.043), ('positives', 0.041), ('respiratory', 0.041), ('surveillance', 0.041), ('daily', 0.037), ('detect', 0.037), ('naive', 0.036), ('aerosol', 0.035), ('emergency', 0.035), ('seasonal', 0.035), ('call', 0.035), ('rectangles', 0.035), ('rectangular', 0.035), ('replica', 0.035), ('sec', 0.035), ('searching', 0.034), ('gamma', 0.033), ('proportion', 0.031), ('search', 0.031), ('release', 0.03), ('prior', 0.029), ('likelihood', 0.029), ('score', 0.028), ('probabilities', 0.027), ('cance', 0.027), ('occurred', 0.027), ('simulated', 0.027), ('alarm', 0.026), ('attack', 0.026), ('public', 0.026), ('searched', 0.026), ('past', 0.026), ('fast', 0.026), ('detecting', 0.025), ('detected', 0.025), ('drug', 0.025), ('testing', 0.025), ('potential', 0.024), ('biometrics', 0.024), ('bioterrorist', 0.024), ('lawson', 0.024), ('molli', 0.024), ('nreg', 0.024), ('prospective', 0.024), ('rbeat', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 4 nips-2005-A Bayesian Spatial Scan Statistic

Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper

2 0.058189958 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks

Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein

Abstract: Functional Magnetic Resonance Imaging (fMRI) has enabled scientists to look into the active brain. However, interactivity between functional brain regions, is still little studied. In this paper, we contribute a novel framework for modeling the interactions between multiple active brain regions, using Dynamic Bayesian Networks (DBNs) as generative models for brain activation patterns. This framework is applied to modeling of neuronal circuits associated with reward. The novelty of our framework from a Machine Learning perspective lies in the use of DBNs to reveal the brain connectivity and interactivity. Such interactivity models which are derived from fMRI data are then validated through a group classiﬁcation task. We employ and compare four different types of DBNs: Parallel Hidden Markov Models, Coupled Hidden Markov Models, Fully-linked Hidden Markov Models and Dynamically MultiLinked HMMs (DML-HMM). Moreover, we propose and compare two schemes of learning DML-HMMs. Experimental results show that by using DBNs, group classiﬁcation can be performed even if the DBNs are constructed from as few as 5 brain regions. We also demonstrate that, by using the proposed learning algorithms, different DBN structures characterize drug addicted subjects vs. control subjects. This ﬁnding provides an independent test for the effect of psychopathology on brain function. In general, we demonstrate that incorporation of computer science principles into functional neuroimaging clinical studies provides a novel approach for probing human brain function.

3 0.055552818 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

Author: Kazuho Watanabe, Sumio Watanabe

Abstract: The Variational Bayesian framework has been widely used to approximate the Bayesian learning. In various applications, it has provided computational tractability and good generalization performance. In this paper, we discuss the Variational Bayesian learning of the mixture of exponential families and provide some additional theoretical support by deriving the asymptotic form of the stochastic complexity. The stochastic complexity, which corresponds to the minimum free energy and a lower bound of the marginal likelihood, is a key quantity for model selection. It also enables us to discuss the eﬀect of hyperparameters and the accuracy of the Variational Bayesian approach as an approximation of the true Bayesian learning. 1

4 0.054788925 18 nips-2005-Active Learning For Identifying Function Threshold Boundaries

Author: Brent Bryan, Robert C. Nichol, Christopher R. Genovese, Jeff Schneider, Christopher J. Miller, Larry Wasserman

Abstract: We present an efﬁcient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassiﬁcation rate, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 − α conﬁdence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.

5 0.051769182 140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior

Author: Liam Paninski

Abstract: We discuss a method for obtaining a subject’s a priori beliefs from his/her behavior in a psychophysics context, under the assumption that the behavior is (nearly) optimal from a Bayesian perspective. The method is nonparametric in the sense that we do not assume that the prior belongs to any ﬁxed class of distributions (e.g., Gaussian). Despite this increased generality, the method is relatively simple to implement, being based in the simplest case on a linear programming algorithm, and more generally on a straightforward maximum likelihood or maximum a posteriori formulation, which turns out to be a convex optimization problem (with no non-global local maxima) in many important cases. In addition, we develop methods for analyzing the uncertainty of these estimates. We demonstrate the accuracy of the method in a simple simulated coin-ﬂipping setting; in particular, the method is able to precisely track the evolution of the subject’s posterior distribution as more and more data are observed. We close by brieﬂy discussing an interesting connection to recent models of neural population coding.

6 0.050280854 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

7 0.047301058 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

8 0.046591315 33 nips-2005-Bayesian Sets

9 0.045687124 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine

10 0.043177344 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection

11 0.04305971 103 nips-2005-Kernels for gene regulatory regions

12 0.041103911 131 nips-2005-Multiple Instance Boosting for Object Detection

13 0.039796509 30 nips-2005-Assessing Approximations for Gaussian Process Classification

14 0.039759003 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis

15 0.039391369 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception

16 0.037780117 102 nips-2005-Kernelized Infomax Clustering

17 0.03737637 156 nips-2005-Prediction and Change Detection

18 0.036272388 159 nips-2005-Q-Clustering

19 0.035993692 93 nips-2005-Ideal Observers for Detecting Motion: Correspondence Noise

20 0.034845781 121 nips-2005-Location-based activity recognition

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.123), (1, 0.002), (2, 0.007), (3, 0.062), (4, 0.004), (5, -0.012), (6, -0.023), (7, 0.023), (8, -0.074), (9, 0.027), (10, -0.007), (11, -0.051), (12, 0.112), (13, -0.008), (14, 0.018), (15, -0.027), (16, 0.026), (17, 0.002), (18, 0.023), (19, 0.037), (20, 0.016), (21, 0.051), (22, 0.054), (23, -0.063), (24, -0.02), (25, 0.067), (26, -0.027), (27, 0.015), (28, -0.069), (29, 0.075), (30, -0.015), (31, 0.076), (32, -0.02), (33, -0.016), (34, -0.053), (35, -0.042), (36, 0.07), (37, -0.191), (38, 0.012), (39, -0.009), (40, -0.019), (41, 0.069), (42, 0.032), (43, -0.001), (44, 0.184), (45, 0.065), (46, 0.096), (47, -0.221), (48, 0.041), (49, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94243646 4 nips-2005-A Bayesian Spatial Scan Statistic

Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper

2 0.54607582 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks

Author: Lei Zhang, Dimitris Samaras, Nelly Alia-klein, Nora Volkow, Rita Goldstein

3 0.43592215 35 nips-2005-Bayesian model learning in human visual perception

Author: Gergő Orbán, Jozsef Fiser, Richard N. Aslin, Máté Lengyel

Abstract: Humans make optimal perceptual decisions in noisy and ambiguous conditions. Computations underlying such optimal behavior have been shown to rely on probabilistic inference according to generative models whose structure is usually taken to be known a priori. We argue that Bayesian model selection is ideal for inferring similar and even more complex model structures from experience. We ﬁnd in experiments that humans learn subtle statistical properties of visual scenes in a completely unsupervised manner. We show that these ﬁndings are well captured by Bayesian model learning within a class of models that seek to explain observed variables by independent hidden causes. 1

4 0.43570352 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine

Author: François Laviolette, Mario Marchand, Mohak Shah

Abstract: We design a new learning algorithm for the Set Covering Machine from a PAC-Bayes perspective and propose a PAC-Bayes risk bound which is minimized for classiﬁers achieving a non trivial margin-sparsity trade-oﬀ. 1

5 0.43449336 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models

Author: Kazuho Watanabe, Sumio Watanabe

6 0.42209148 103 nips-2005-Kernels for gene regulatory regions

7 0.4086428 18 nips-2005-Active Learning For Identifying Function Threshold Boundaries

8 0.39233837 127 nips-2005-Mixture Modeling by Affinity Propagation

9 0.38343489 33 nips-2005-Bayesian Sets

10 0.36510298 3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence

11 0.33753687 173 nips-2005-Sensory Adaptation within a Bayesian Framework for Perception

12 0.3323406 156 nips-2005-Prediction and Change Detection

13 0.32907644 93 nips-2005-Ideal Observers for Detecting Motion: Correspondence Noise

14 0.32332721 68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care

15 0.31858429 146 nips-2005-On the Accuracy of Bounded Rationality: How Far from Optimal Is Fast and Frugal?

16 0.30860093 45 nips-2005-Conditional Visual Tracking in Kernel Space

17 0.30240083 140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior

18 0.29034087 34 nips-2005-Bayesian Surprise Attracts Human Attention

19 0.2889711 14 nips-2005-A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

20 0.27819261 205 nips-2005-Worst-Case Bounds for Gaussian Process Models

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.038), (10, 0.022), (25, 0.401), (27, 0.043), (31, 0.036), (34, 0.074), (39, 0.026), (44, 0.017), (55, 0.023), (57, 0.018), (69, 0.061), (73, 0.033), (88, 0.062), (91, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80506819 4 nips-2005-A Bayesian Spatial Scan Statistic

Author: Daniel B. Neill, Andrew W. Moore, Gregory F. Cooper

2 0.68056846 169 nips-2005-Saliency Based on Information Maximization

Author: Neil Bruce, John Tsotsos

Abstract: A model of bottom-up overt attention is proposed based on the principle of maximizing information sampled from a scene. The proposed operation is based on Shannon's self-information measure and is achieved in a neural circuit, which is demonstrated as having close ties with the circuitry existent in the primate visual cortex. It is further shown that the proposed saliency measure may be extended to address issues that currently elude explanation in the domain of saliency based models. Resu lts on natural images are compared with experimental eye tracking data revealing the efficacy of the model in predicting the deployment of overt attention as compared with existing efforts.

3 0.49257332 195 nips-2005-Transfer learning for text classification

Author: Chuong B. Do, Andrew Y. Ng

Abstract: Linear text classiﬁcation algorithms work by computing an inner product between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TFIDF variants, the parameters are determined by some simple, closed-form, function of training set statistics; we call this mapping mapping from statistics to parameters, the parameter function. Much research in text classiﬁcation over the last few decades has consisted of manual efforts to identify better parameter functions. In this paper, we propose an algorithm for automatically learning this function from related classiﬁcation problems. The parameter function found by our algorithm then deﬁnes a new learning algorithm for text classiﬁcation, which we can apply to novel classiﬁcation tasks. We ﬁnd that our learned classiﬁer outperforms existing methods on a variety of multiclass text classiﬁcation tasks. 1

4 0.35373679 193 nips-2005-The Role of Top-down and Bottom-up Processes in Guiding Eye Movements during Visual Search

Author: Gregory Zelinsky, Wei Zhang, Bing Yu, Xin Chen, Dimitris Samaras

Abstract: To investigate how top-down (TD) and bottom-up (BU) information is weighted in the guidance of human search behavior, we manipulated the proportions of BU and TD components in a saliency-based model. The model is biologically plausible and implements an artiﬁcial retina and a neuronal population code. The BU component is based on featurecontrast. The TD component is deﬁned by a feature-template match to a stored target representation. We compared the model’s behavior at different mixtures of TD and BU components to the eye movement behavior of human observers performing the identical search task. We found that a purely TD model provides a much closer match to human behavior than any mixture model using BU information. Only when biological constraints are removed (e.g., eliminating the retina) did a BU/TD mixture model begin to approximate human behavior.

5 0.33124885 34 nips-2005-Bayesian Surprise Attracts Human Attention

Author: Laurent Itti, Pierre F. Baldi

Abstract: The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely-accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus or event, for observers that range from single neurons to complex natural or engineered systems. We describe a formal Bayesian deﬁnition of surprise that is the only consistent formulation under minimal axiomatic assumptions. Surprise quantiﬁes how data affects a natural or artiﬁcial observer, by measuring the difference between posterior and prior beliefs of the observer. Using this framework we measure the extent to which humans direct their gaze towards surprising items while watching television and video games. We ﬁnd that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average, a ﬁgure which rises to 84% when considering only gaze targets simultaneously selected by all subjects. The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction. Life is full of surprises, ranging from a great christmas gift or a new magic trick, to wardrobe malfunctions, reckless drivers, terrorist attacks, and tsunami waves. Key to survival is our ability to rapidly attend to, identify, and learn from surprising events, to decide on present and future courses of action [1]. Yet, little theoretical and computational understanding exists of the very essence of surprise, as evidenced by the absence from our everyday vocabulary of a quantitative unit of surprise: Qualities such as the “wow factor” have remained vague and elusive to mathematical analysis. Informal correlates of surprise exist at nearly all stages of neural processing. In sensory neuroscience, it has been suggested that only the unexpected at one stage is transmitted to the next stage [2]. Hence, sensory cortex may have evolved to adapt to, to predict, and to quiet down the expected statistical regularities of the world [3, 4, 5, 6], focusing instead on events that are unpredictable or surprising. Electrophysiological evidence for this early sensory emphasis onto surprising stimuli exists from studies of adaptation in visual [7, 8, 4, 9], olfactory [10, 11], and auditory cortices [12], subcortical structures like the LGN [13], and even retinal ganglion cells [14, 15] and cochlear hair cells [16]: neural response greatly attenuates with repeated or prolonged exposure to an initially novel stimulus. Surprise and novelty are also central to learning and memory formation [1], to the point that surprise is believed to be a necessary trigger for associative learning [17, 18], as supported by mounting evidence for a role of the hippocampus as a novelty detector [19, 20, 21]. Finally, seeking novelty is a well-identiﬁed human character trait, with possible association with the dopamine D4 receptor gene [22, 23, 24]. In the Bayesian framework, we develop the only consistent theory of surprise, in terms of the difference between the posterior and prior distributions of beliefs of an observer over the available class of models or hypotheses about the world. We show that this deﬁnition derived from ﬁrst principles presents key advantages over more ad-hoc formulations, typically relying on detecting outlier stimuli. Armed with this new framework, we provide direct experimental evidence that surprise best characterizes what attracts human gaze in large amounts of natural video stimuli. We here extend a recent pilot study [25], adding more comprehensive theory, large-scale human data collection, and additional analysis. 1 Theory Bayesian Deﬁnition of Surprise. We propose that surprise is a general concept, which can be derived from ﬁrst principles and formalized across spatio-temporal scales, sensory modalities, and, more generally, data types and data sources. Two elements are essential for a principled deﬁnition of surprise. First, surprise can exist only in the presence of uncertainty, which can arise from intrinsic stochasticity, missing information, or limited computing resources. A world that is purely deterministic and predictable in real-time for a given observer contains no surprises. Second, surprise can only be deﬁned in a relative, subjective, manner and is related to the expectations of the observer, be it a single synapse, neuronal circuit, organism, or computer device. The same data may carry different amount of surprise for different observers, or even for the same observer taken at different times. In probability and decision theory it can be shown that the only consistent and optimal way for modeling and reasoning about uncertainty is provided by the Bayesian theory of probability [26, 27, 28]. Furthermore, in the Bayesian framework, probabilities correspond to subjective degrees of beliefs in hypotheses or models which are updated, as data is acquired, using Bayes’ theorem as the fundamental tool for transforming prior belief distributions into posterior belief distributions. Therefore, within the same optimal framework, the only consistent deﬁnition of surprise must involve: (1) probabilistic concepts to cope with uncertainty; and (2) prior and posterior distributions to capture subjective expectations. Consistently with this Bayesian approach, the background information of an observer is captured by his/her/its prior probability distribution {P (M )}M ∈M over the hypotheses or models M in a model space M. Given this prior distribution of beliefs, the fundamental effect of a new data observation D on the observer is to change the prior distribution {P (M )}M ∈M into the posterior distribution {P (M |D)}M ∈M via Bayes theorem, whereby P (D|M ) ∀M ∈ M, P (M |D) = P (M ). (1) P (D) In this framework, the new data observation D carries no surprise if it leaves the observer beliefs unaffected, that is, if the posterior is identical to the prior; conversely, D is surprising if the posterior distribution resulting from observing D signiﬁcantly differs from the prior distribution. Therefore we formally measure surprise elicited by data as some distance measure between the posterior and prior distributions. This is best done using the relative entropy or Kullback-Leibler (KL) divergence [29]. Thus, surprise is deﬁned by the average of the log-odd ratio: P (M |D) S(D, M) = KL(P (M |D), P (M )) = P (M |D) log dM (2) P (M ) M taken with respect to the posterior distribution over the model class M. Note that KL is not symmetric but has well-known theoretical advantages, including invariance with respect to Figure 1: Computing surprise in early sensory neurons. (a) Prior data observations, tuning preferences, and top-down inﬂuences contribute to shaping a set of “prior beliefs” a neuron may have over a class of internal models or hypotheses about the world. For instance, M may be a set of Poisson processes parameterized by the rate λ, with {P (M )}M ∈M = {P (λ)}λ∈I +∗ the prior distribution R of beliefs about which Poisson models well describe the world as sensed by the neuron. New data D updates the prior into the posterior using Bayes’ theorem. Surprise quantiﬁes the difference between the posterior and prior distributions over the model class M. The remaining panels detail how surprise differs from conventional model ﬁtting and outlier-based novelty. (b) In standard iterative Bayesian model ﬁtting, at every iteration N , incoming data DN is used to update the prior {P (M |D1 , D2 , ..., DN −1 )}M ∈M into the posterior {P (M |D1 , D2 , ..., DN )}M ∈M . Freezing this learning at a given iteration, one then picks the currently best model, usually using either a maximum likelihood criterion, or a maximum a posteriori one (yielding MM AP shown). (c) This best model is used for a number of tasks at the current iteration, including outlier-based novelty detection. New data is then considered novel at that instant if it has low likelihood for the best model b a (e.g., DN is more novel than DN ). This focus onto the single best model presents obvious limitations, especially in situations where other models are nearly as good (e.g., M∗ in panel (b) is entirely ignored during standard novelty computation). One palliative solution is to consider mixture models, or simply P (D), but this just amounts to shifting the problem into a different model class. (d) Surprise directly addresses this problem by simultaneously considering all models and by measuring how data changes the observer’s distribution of beliefs from {P (M |D1 , D2 , ..., DN −1 )}M ∈M to {P (M |D1 , D2 , ..., DN )}M ∈M over the entire model class M (orange shaded area). reparameterizations. A unit of surprise — a “wow” — may then be deﬁned for a single model M as the amount of surprise corresponding to a two-fold variation between P (M |D) and P (M ), i.e., as log P (M |D)/P (M ) (with log taken in base 2), with the total number of wows experienced for all models obtained through the integration in eq. 2. Surprise and outlier detection. Outlier detection based on the likelihood P (D|M best ) of D given a single best model Mbest is at best an approximation to surprise and, in some cases, is misleading. Consider, for instance, a case where D has very small probability both for a model or hypothesis M and for a single alternative hypothesis M. Although D is a strong outlier, it carries very little information regarding whether M or M is the better model, and therefore very little surprise. Thus an outlier detection method would strongly focus attentional resources onto D, although D is a false positive, in the sense that it carries no useful information for discriminating between the two alternative hypotheses M and M. Figure 1 further illustrates this disconnect between outlier detection and surprise. 2 Human experiments To test the surprise hypothesis — that surprise attracts human attention and gaze in natural scenes — we recorded eye movements from eight na¨ve observers (three females and ı ﬁve males, ages 23-32, normal or corrected-to-normal vision). Each watched a subset from 50 videoclips totaling over 25 minutes of playtime (46,489 video frames, 640 × 480, 60.27 Hz, mean screen luminance 30 cd/m2 , room 4 cd/m2 , viewing distance 80cm, ﬁeld of view 28◦ × 21◦ ). Clips comprised outdoors daytime and nighttime scenes of crowded environments, video games, and television broadcast including news, sports, and commercials. Right-eye position was tracked with a 240 Hz video-based device (ISCAN RK-464), with methods as previously [30]. Two hundred calibrated eye movement traces (10,192 saccades) were analyzed, corresponding to four distinct observers for each of the 50 clips. Figure 2 shows sample scanpaths for one videoclip. To characterize image regions selected by participants, we process videoclips through computational metrics that output a topographic dynamic master response map, assigning in real-time a response value to every input location. A good master map would highlight, more than expected by chance, locations gazed to by observers. To score each metric we hence sample, at onset of every human saccade, master map activity around the saccade’s future endpoint, and around a uniformly random endpoint (random sampling was repeated 100 times to evaluate variability). We quantify differences between histograms of master Figure 2: (a) Sample eye movement traces from four observers (squares denote saccade endpoints). (b) Our data exhibits high inter-individual overlap, shown here with the locations where one human saccade endpoint was nearby (≈ 5◦ ) one (white squares), two (cyan squares), or all three (black squares) other humans. (c) A metric where the master map was created from the three eye movement traces other than that being tested yields an upper-bound KL score, computed by comparing the histograms of metric values at human (narrow blue bars) and random (wider green bars) saccade targets. Indeed, this metric’s map was very sparse (many random saccades landing on locations with nearzero response), yet humans preferentially saccaded towards the three active hotspots corresponding to the eye positions of three other humans (many human saccades landing on locations with near-unity responses). map samples collected from human and random saccades using again the Kullback-Leibler (KL) distance: metrics which better predict human scanpaths exhibit higher distances from random as, typically, observers non-uniformly gaze towards a minority of regions with highest metric responses while avoiding a majority of regions with low metric responses. This approach presents several advantages over simpler scoring schemes [31, 32], including agnosticity to putative mechanisms for generating saccades and the fact that applying any continuous nonlinearity to master map values would not affect scoring. Experimental results. We test six computational metrics, encompassing and extending the state-of-the-art found in previous studies. The ﬁrst three quantify static image properties (local intensity variance in 16 × 16 image patches [31]; local oriented edge density as measured with Gabor ﬁlters [33]; and local Shannon entropy in 16 × 16 image patches [34]). The remaining three metrics are more sensitive to dynamic events (local motion [33]; outlier-based saliency [33]; and surprise [25]). For all metrics, we ﬁnd that humans are signiﬁcantly attracted by image regions with higher metric responses. However, the static metrics typically respond vigorously at numerous visual locations (Figure 3), hence they are poorly speciﬁc and yield relatively low KL scores between humans and random. The metrics sensitive to motion, outliers, and surprising events, in comparison, yield sparser maps and higher KL scores. The surprise metric of interest here quantiﬁes low-level surprise in image patches over space and time, and at this point does not account for high-level or cognitive beliefs of our human observers. Rather, it assumes a family of simple models for image patches, each processed through 72 early feature detectors sensitive to color, orientation, motion, etc., and computes surprise from shifts in the distribution of beliefs about which models better describe the patches (see [25] and [35] for details). We ﬁnd that the surprise metric signiﬁcantly outperforms all other computational metrics (p < 10−100 or better on t-tests for equality of KL scores), scoring nearly 20% better than the second-best metric (saliency) and 60% better than the best static metric (entropy). Surprising stimuli often substantially differ from simple feature outliers; for example, a continually blinking light on a static background elicits sustained ﬂicker due to its locally outlier temporal dynamics but is only surprising for a moment. Similarly, a shower of randomly-colored pixels continually excites all low-level feature detectors but rapidly becomes unsurprising. Strongest attractors of human attention. Clearly, in our and previous eye-tracking experiments, in some situations potentially interesting targets were more numerous than in others. With many possible targets, different observers may orient towards different locations, making it more difﬁcult for a single metric to accurately predict all observers. Hence we consider (Figure 4) subsets of human saccades where at least two, three, or all four observers simultaneously agreed on a gaze target. Observers could have agreed based on bottom-up factors (e.g., only one location had interesting visual appearance at that time), top-down factors (e.g., only one object was of current cognitive interest), or both (e.g., a single cognitively interesting object was present which also had distinctive appearance). Irrespectively of the cause for agreement, it indicates consolidated belief that a location was attractive. While the KL scores of all metrics improved when progressively focusing onto only those locations, dynamic metrics improved more steeply, indicating that stimuli which more reliably attracted all observers carried more motion, saliency, and surprise. Surprise remained signiﬁcantly the best metric to characterize these agreed-upon attractors of human gaze (p < 10−100 or better on t-tests for equality of KL scores). Overall, surprise explained the greatest fraction of human saccades, indicating that humans are signiﬁcantly attracted towards surprising locations in video displays. Over 72% of all human saccades were targeted to locations predicted to be more surprising than on average. When only considering saccades where two, three, or four observers agreed on a common gaze target, this ﬁgure rose to 76%, 80%, and 84%, respectively. Figure 3: (a) Sample video frames, with corresponding human saccades and predictions from the entropy, surprise, and human-derived metrics. Entropy maps, like intensity variance and orientation maps, exhibited many locations with high responses, hence had low speciﬁcity and were poorly discriminative. In contrast, motion, saliency, and surprise maps were much sparser and more speciﬁc, with surprise signiﬁcantly more often on target. For three example frames (ﬁrst column), saccades from one subject are shown (arrows) with corresponding apertures over which master map activity at the saccade endpoint was sampled (circles). (b) KL scores for these metrics indicate signiﬁcantly different performance levels, and a strict ranking of variance < orientation < entropy < motion < saliency < surprise < human-derived. KL scores were computed by comparing the number of human saccades landing onto each given range of master map values (narrow blue bars) to the number of random saccades hitting the same range (wider green bars). A score of zero would indicate equality between the human and random histograms, i.e., humans did not tend to hit various master map values any differently from expected by chance, or, the master map could not predict human saccades better than random saccades. Among the six computational metrics tested in total, surprise performed best, in that surprising locations were relatively few yet reliably gazed to by humans. Figure 4: KL scores when considering only saccades where at least one (all 10,192 saccades), two (7,948 saccades), three (5,565 saccades), or all four (2,951 saccades) humans agreed on a common gaze location, for the static (a) and dynamic metrics (b). Static metrics improved substantially when progressively focusing onto saccades with stronger inter-observer agreement (average slope 0.56 ± 0.37 percent KL score units per 1,000 pruned saccades). Hence, when humans agreed on a location, they also tended to be more reliably predicted by the metrics. Furthermore, dynamic metrics improved 4.5 times more steeply (slope 2.44 ± 0.37), suggesting a stronger role of dynamic events in attracting human attention. Surprising events were signiﬁcantly the strongest (t-tests for equality of KL scores between surprise and other metrics, p < 10−100 ). 3 Discussion While previous research has shown with either static scenes or dynamic synthetic stimuli that humans preferentially ﬁxate regions of high entropy [34], contrast [31], saliency [32], ﬂicker [36], or motion [37], our data provides direct experimental evidence that humans ﬁxate surprising locations even more reliably. These conclusions were made possible by developing new tools to quantify what attracts human gaze over space and time in dynamic natural scenes. Surprise explained best where humans look when considering all saccades, and even more so when restricting the analysis to only those saccades for which human observers tended to agree. Surprise hence represents an inexpensive, easily computable approximation to human attentional allocation. In the absence of quantitative tools to measure surprise, most experimental and modeling work to date has adopted the approximation that novel events are surprising, and has focused on experimental scenarios which are simple enough to ensure an overlap between informal notions of novelty and surprise: for example, a stimulus is novel during testing if it has not been seen during training [9]. Our deﬁnition opens new avenues for more sophisticated experiments, where surprise elicited by different stimuli can be precisely compared and calibrated, yielding predictions at the single-unit as well as behavioral levels. The deﬁnition of surprise — as the distance between the posterior and prior distributions of beliefs over models — is entirely general and readily applicable to the analysis of auditory, olfactory, gustatory, or somatosensory data. While here we have focused on behavior rather than detailed biophysical implementation, it is worth noting that detecting surprise in neural spike trains does not require semantic understanding of the data carried by the spike trains, and thus could provide guiding signals during self-organization and development of sensory areas. At higher processing levels, top-down cues and task demands are known to combine with stimulus novelty in capturing attention and triggering learning [1, 38], ideas which may now be formalized and quantiﬁed in terms of priors, posteriors, and surprise. Surprise, indeed, inherently depends on uncertainty and on prior beliefs. Hence surprise theory can further be tested and utilized in experiments where the prior is biased, for ex- ample by top-down instructions or prior exposures to stimuli [38]. In addition, simple surprise-based behavioral measures such as the eye-tracking one used here may prove useful for early diagnostic of human conditions including autism and attention-deﬁcit hyperactive disorder, as well as for quantitative comparison between humans and animals which may have lower or different priors, including monkeys, frogs, and ﬂies. Beyond sensory biology, computable surprise could guide the development of data mining and compression systems (giving more bits to surprising regions of interest), to ﬁnd surprising agents in crowds, surprising sentences in books or speeches, surprising sequences in genomes, surprising medical symptoms, surprising odors in airport luggage racks, surprising documents on the world-wide-web, or to design surprising advertisements. Acknowledgments: Supported by HFSP, NSF and NGA (L.I.), NIH and NSF (P.B.). We thank UCI’s Institute for Genomics and Bioinformatics and USC’s Center High Performance Computing and Communications (www.usc.edu/hpcc) for access to their computing clusters. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] Ranganath, C. & Rainer, G. Nat Rev Neurosci 4, 193–202 (2003). Rao, R. P. & Ballard, D. H. Nat Neurosci 2, 79–87 (1999). Olshausen, B. A. & Field, D. J. Nature 381, 607–609 (1996). M¨ ller, J. R., Metha, A. B., Krauskopf, J. & Lennie, P. Science 285, 1405–1408 (1999). u Dragoi, V., Sharma, J., Miller, E. K. & Sur, M. Nat Neurosci 5, 883–891 (2002). David, S. V., Vinje, W. E. & Gallant, J. L. J Neurosci 24, 6991–7006 (2004). Maffei, L., Fiorentini, A. & Bisti, S. Science 182, 1036–1038 (1973). Movshon, J. A. & Lennie, P. Nature 278, 850–852 (1979). Fecteau, J. H. & Munoz, D. P. Nat Rev Neurosci 4, 435–443 (2003). Kurahashi, T. & Menini, A. Nature 385, 725–729 (1997). Bradley, J., Bonigk, W., Yau, K. W. & Frings, S. Nat Neurosci 7, 705–710 (2004). Ulanovsky, N., Las, L. & Nelken, I. Nat Neurosci 6, 391–398 (2003). Solomon, S. G., Peirce, J. W., Dhruv, N. T. & Lennie, P. Neuron 42, 155–162 (2004). Smirnakis, S. M., Berry, M. J. & et al. Nature 386, 69–73 (1997). Brown, S. P. & Masland, R. H. Nat Neurosci 4, 44–51 (2001). Kennedy, H. J., Evans, M. G. & et al. Nat Neurosci 6, 832–836 (2003). Schultz, W. & Dickinson, A. Annu Rev Neurosci 23, 473–500 (2000). Fletcher, P. C., Anderson, J. M., Shanks, D. R. et al. Nat Neurosci 4, 1043–1048 (2001). Knight, R. Nature 383, 256–259 (1996). Stern, C. E., Corkin, S., Gonzalez, R. G. et al. Proc Natl Acad Sci U S A 93, 8660–8665 (1996). Li, S., Cullen, W. K., Anwyl, R. & Rowan, M. J. Nat Neurosci 6, 526–531 (2003). Ebstein, R. P., Novick, O., Umansky, R. et al. Nat Genet 12, 78–80 (1996). Benjamin, J., Li, L. & et al. Nat Genet 12, 81–84 (1996). Lusher, J. M., Chandler, C. & Ball, D. Mol Psychiatry 6, 497–499 (2001). Itti, L. & Baldi, P. In Proc. IEEE CVPR. San Siego, CA (2005 in press). Cox, R. T. Am. J. Phys. 14, 1–13 (1964). Savage, L. J. The foundations of statistics (Dover, New York, 1972). (First Edition in 1954). Jaynes, E. T. Probability Theory. The Logic of Science (Cambridge University Press, 2003). Kullback, S. Information Theory and Statistics (Wiley, New York:New York, 1959). Itti, L. Visual Cognition (2005 in press). Reinagel, P. & Zador, A. M. Network 10, 341–350 (1999). Parkhurst, D., Law, K. & Niebur, E. Vision Res 42, 107–123 (2002). Itti, L. & Koch, C. Nat Rev Neurosci 2, 194–203 (2001). Privitera, C. M. & Stark, L. W. IEEE Trans Patt Anal Mach Intell 22, 970–982 (2000). All source code for all metrics is freely available at http://iLab.usc.edu/toolkit/. Theeuwes, J. Percept Psychophys 57, 637–644 (1995). Abrams, R. A. & Christ, S. E. Psychol Sci 14, 427–432 (2003). Wolfe, J. M. & Horowitz, T. S. Nat Rev Neurosci 5, 495–501 (2004).

6 0.31956562 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

7 0.31791887 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

8 0.31288138 184 nips-2005-Structured Prediction via the Extragradient Method

9 0.31218714 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks

10 0.31181493 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

11 0.31174088 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

12 0.31172797 144 nips-2005-Off-policy Learning with Options and Recognizers

13 0.31047669 45 nips-2005-Conditional Visual Tracking in Kernel Space

14 0.30908945 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification

15 0.30757362 102 nips-2005-Kernelized Infomax Clustering

16 0.30680668 30 nips-2005-Assessing Approximations for Gaussian Process Classification

17 0.30608875 98 nips-2005-Infinite latent feature models and the Indian buffet process

18 0.30554476 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

19 0.30550876 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

20 0.30534607 78 nips-2005-From Weighted Classification to Policy Search