nips nips2013 nips2013-237 knowledge-graph by maker-knowledge-mining

237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

Source: pdf

Author: Matjaz Jogan, Alan Stocker

Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure.
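
The combination rule named in the abstract can be illustrated with a minimal sketch. It assumes Gaussian likelihoods for two channels and a prior whose logarithm is locally linear in speed with slope `prior_slope` (negative for a slow-speed prior). The function names and the numbers in the usage lines are hypothetical, chosen only for illustration; they are not the paper's fitted values.

```python
def combine_channels(mu1, sigma1, mu2, sigma2, prior_slope):
    """Posterior mode and variance for two independent Gaussian likelihoods.

    Assumption: the log-prior is locally linear with slope `prior_slope`,
    so the posterior is Gaussian and its mode has a closed form.
    """
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    # Combined variance is the product over the sum of the two variances.
    combined_var = var1 * var2 / (var1 + var2)
    # Each channel is weighted by the variance of the *other* channel.
    weighted_mean = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    # The prior shifts the estimate in proportion to the remaining uncertainty.
    return weighted_mean + prior_slope * combined_var, combined_var


def single_channel(mu, sigma, prior_slope):
    """Posterior mode and variance when only one channel responds."""
    var = sigma ** 2
    return mu + prior_slope * var, var


# Hypothetical illustration values (not data from the paper).
true_speed = 2.0          # deg/s
slope = -0.5              # negative slope stands in for a slow-speed prior
est_both, var_both = combine_channels(true_speed, 0.5, true_speed, 0.8, slope)
est_1, var_1 = single_channel(true_speed, 0.5, slope)
est_2, var_2 = single_channel(true_speed, 0.8, slope)
print(est_1, est_2, est_both)
print(var_1, var_2, var_both)
```

Under these assumptions the combined estimate has a smaller variance than either single-channel estimate and is less biased toward slow speeds, which is the qualitative pattern the abstract reports for combined gratings.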

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Optimal integration of visual speed across different spatiotemporal frequency channels Matjaž Jogan and Alan A. [sent-1, score-1.038]

2 How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? [sent-4, score-1.362]

3 Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. [sent-5, score-1.343]

4 We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). [sent-6, score-0.797]

5 We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. [sent-7, score-1.604]

6 We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. [sent-8, score-0.966]

7 The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. [sent-9, score-0.745]

8 Fits are improved if we assume that the channel responses are subject to divisive normalization. [sent-11, score-0.534]

9 Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure. [sent-12, score-1.317]

10 1 Introduction Low contrast stimuli are perceived to move slower than high contrast ones [17]. [sent-13, score-0.65]

11 This effect can be explained with a Bayesian observer model that assumes a prior distribution with a peak at slow speeds [18, 8, 15]. [sent-14, score-0.423]

12 Based on a noisy sensory measurement m of the true stimulus speed s the Bayesian observer model computes the posterior probability $p(s \mid m) = \frac{p(m \mid s)\, p(s)}{p(m)}$ (1) by multiplying the likelihood function p(m|s) with the probability p(s) representing the observer’s prior expectation. [sent-16, score-0.944]

13 if stimulus contrast is low), the likelihood function is broad and the posterior probability distribution is shifted toward the peak of the prior, resulting in a perceived speed that is biased toward slow speeds. [sent-19, score-1.124]

14 Figure 1: a) A natural stimulus in motion exhibits a rich spatiotemporal frequency spectrum that determines how humans perceive its speed s. [sent-24, score-1.128]

15 5 c/deg and moves with a speed of 2 deg/s will trigger responses r = {r1 , r2 } in two corresponding channels (red circles). [sent-30, score-0.706]

16 In this paper we make a step toward a more general observer model of visual speed perception that, in the long term, will allow us to predict perceived speed for arbitrarily complex stimuli (Fig. [sent-32, score-1.524]

17 1), which decomposes complex motion stimuli into simpler components processed in separate spatiotemporal frequency channels. [sent-35, score-0.745]

18 Based on the motion energy model [1, 12], we assume that each channel is sensitive to a narrow spatiotemporal frequency band. [sent-36, score-0.899]

19 The observed speed of a stimulus is then a result of combining the sensory evidence provided by these individual channels with a prior expectation for slow speeds. [sent-37, score-1.026]

20 Here we employ an analogous approach by treating the responses of individual spatiotemporal frequency channels as independent cues about a stimulus’ motion. [sent-41, score-0.865]

21 We validated the model against the data of a series of psychophysical experiments in which we measured how humans’ speed percept of coherent motion depends on the stimulus energy in different spatial frequency bands. [sent-42, score-1.187]

22 Stimuli consisted of drifting sinusoidal gratings at two different spatial frequencies and contrasts, and various combinations of these single gratings. [sent-43, score-0.474]

23 For a given stimulus speed s, single gratings target only one channel while the combined stimuli target multiple channels. [sent-44, score-1.425]

24 We consider s to be the speed of locally coherent and translational stimulus motion (Fig. [sent-47, score-0.77]

25 This motion can be represented by its power spectrum in spatiotemporal frequency space. [sent-49, score-0.494]

26 For a given motion direction the energy lies in a two-dimensional plane spanned by a temporal frequency axis ωt and a spatial frequency axis ωs and is constrained to coordinates that satisfy s = ωt /ωs (Fig. [sent-50, score-0.554]

27 According to the motion energy model, we assume that the visual system contains motion units that are tuned to specific locations in this plane [1, 12]. [sent-52, score-0.378]

28 A coherent motion stimulus with speed s and multiple spatial frequencies ωs will therefore drive only those units whose tuning curves are centered at coordinates (ωs , ωs s). [sent-53, score-0.913]

29 We formulate our Bayesian observer model in terms of k spatiotemporal frequency channels, each tuned to a narrow spatiotemporal frequency band (Fig. [sent-54, score-0.941]

30 A moving stimulus will elicit a total response r = [r1 , r2 , . [sent-56, score-0.385]

31 The response of each channel provides a likelihood [Figure 2 panel labels: channels, likelihoods, low speed prior, normalization, posterior, stimulus estimate] Figure 2: Bayesian observer model of speed perception with multiple spatiotemporal channels. [sent-60, score-2.295]

32 A moving stimulus with speed s is decomposed and processed in separate channels that are sensitive to energy in specific spatiotemporal frequency bands. [sent-61, score-1.285]

33 Based on the channel response ri we formulate a likelihood function p(ri |s) for each channel. [sent-62, score-0.587]

34 Here we assume perceived speed ŝ to be the mode of the posterior. [sent-64, score-0.636]

35 We consider a model with and without response normalization across channels (red dashed line). [sent-65, score-0.491]

36 Assuming independent channel noise, we can formulate the posterior probability of a Bayesian observer model that performs optimal integration as $p(s \mid \mathbf{r}) \propto p(s) \prod_i p(r_i \mid s)$. [sent-67, score-0.643]

37 For reasons of simplicity and without loss of generality, we focus on the case where the stimulus activates two channels with responses r = [ri ], i ∈ {1, 2}. [sent-72, score-0.701]

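38 Assuming that $E[\mu(r_i) \mid s]$ approximates the stimulus speed s, the expected value of ŝ is $E[\hat{s} \mid s] = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} E[\mu(r_1) \mid s] + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} E[\mu(r_2) \mid s] + a \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} s + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} s + a \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} = s + a \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}$. [sent-79, score-0.579]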

39 , percepts) for stimuli that activate both channels is always smaller than the variances of estimates that are based on each of the channel responses alone ($\sigma_1^2$ and $\sigma_2^2$). [sent-86, score-0.992]

40 Second, because of the slow speed prior a is negative, and perceived speeds are more biased toward slower speeds the larger the sensory uncertainty. [sent-88, score-1.127]

41 As a result, the perceived speed of combined stimuli that activate both channels is always faster than the percepts based on each of the individual channel responses alone. [sent-89, score-1.78]

42 Finally, the model predicts that the perceived speed of a combined stimulus solely depends on the responses of the channels to its constituent components, and is therefore independent of the relative phase of the components we combined [5]. [sent-90, score-1.583]

43 , their responses are independent of the number of active channels and the overall activity in the system. [sent-94, score-0.414]

44 Here we assume that the response of an individual channel $r_i$ is normalized such that its normalized response $r_i^*$ is given by $r_i^* = r_i \frac{r_i^n}{\sum_j r_j^n}$. [sent-99, score-1.02]

45 , the relative difference) between the individual channel responses for increasing values of the exponent n. [sent-102, score-0.588]

46 Note that normalization affects only the responses ri , thus modulating the width of the individual likelihood functions. [sent-104, score-0.426]

47 By explicitly modeling the encoding of visual motion in spatiotemporal frequency channels, we already extended the Bayesian model of speed perception toward a more physiological interpretation. [sent-107, score-1.021]

48 3 Results In the second part of this paper we test the validity of our model with and without channel normalization against data from a psychophysical two alternative forced choice (2AFC) speed discrimination experiment. [sent-109, score-1.043]

49 1 Speed discrimination experiment Seven subjects performed a 2AFC visual speed discrimination task. [sent-111, score-0.807]

50 In each trial, subjects were presented for 1250 ms with a reference and a test stimulus on either side of a fixation mark (eccentricity 4 peaks-add 3ωs = 1. [sent-112, score-0.402]

51 Figure 3: Single frequency gratings were combined in either a “peaks-add” or a “peaks-subtract” phase configuration (0 deg and 60 deg phase, respectively) [5]. [sent-114, score-0.662]

52 We used these two phase-combinations to test whether the channel hypothesis is valid or not. [sent-116, score-0.386]

53 After stimulus presentation, a brief flash appeared on the left or right side of the fixation mark and subjects had to answer whether the grating that was presented on the indicated side was moving faster or slower than the grating on the other side. [sent-120, score-0.539]

54 Four of these stimuli were simple sinewave gratings of a single spatial frequency, either ωs = 0. [sent-123, score-0.554]

55 The low frequency test stimulus had a contrast of 22. [sent-126, score-0.508]

56 5%, while the three higher frequency stimuli had contrasts 7. [sent-127, score-0.398]

57 The other six stimuli were pair-wise combinations of the single frequency gratings (Fig. [sent-131, score-0.616]

58 All test stimuli were drifting at a speed of 2 deg/s. [sent-135, score-0.606]

59 The reference stimulus was a broadband stimulus whose speed was regulated by an adaptive staircase procedure. [sent-136, score-1.186]

60 The simple stimuli were designed to target individual spatiotemporal frequency channels while the combined stimuli were meant to target two channels simultaneously. [sent-139, score-1.444]

61 The two phase configurations (peaks-add and peaks-subtract) were used to test the multiple channel hypothesis: if combined stimuli are decomposed and processed in separate channels, their perceived speeds should be independent of the phase configuration. [sent-140, score-1.291]

62 In particular, the difference in overall contrast of the two configurations should not affect perceived speed (Fig. 3). [sent-141, score-0.682]

63 Matching speeds (PSEs) and relative discrimination thresholds (Weber-fraction) were extracted from a maximum-likelihood fit of each of the 10 psychometric functions with a cumulative Gaussian. [sent-142, score-0.649]

64 We found no significant difference in perceived speeds and thresholds between the combined grating stimuli in “peaks-add” and “peaks-subtract” configuration (Fig. [sent-146, score-0.972]

65 This suggests that the perceived speed of combined stimuli is independent of the relative phase between the individual stimulus components, and therefore is processed in independent channels. [sent-154, score-1.373]

66 [Figure 4 panel b residue: axis “matching speed (deg/s)”; legend: data, channel model, channel model+norm.] [sent-159, score-1.034]

67 Figure 4: Data and model fits for speed discrimination task: a) relative discrimination thresholds (Weber-fraction) and b) matching speeds (PSEs). [sent-167, score-1.093]

68 For the single frequency gratings, the perceived speed increases with contrast as predicted by the standard Bayesian model. [sent-169, score-0.835]

69 For the combined stimuli, there is no significant difference (based on 95% confidence intervals) in perceived speeds between the combined grating stimuli in “peaks-add” and “peaks-subtract” configuration. [sent-170, score-0.962]

70 The Bayesian model with normalized responses (red line) better accounts for the data than the model without interaction between the channels (blue line). [sent-171, score-0.513]

71 The model without normalization has six parameters: four channel responses ri for each simple stimulus reflecting the individual likelihood widths, the reference response rref and the local slope of the prior a. [sent-174, score-1.317]

72 The model with normalization has two additional parameters n1 and n2, reflecting the exponents of the normalization in each of the two channels (Eq. [sent-175, score-0.495]

73 The model with and without response normalization was simultaneously fit to the psychometric functions of all 10 test conditions using the cumulative probability distribution (Eq. [sent-177, score-0.4]

74 8) and a [footnote 1] Alternatively, channel responses as a function of contrast could be modeled according to a contrast response function $r_i = M + R_{\max} \frac{c^2}{c_{50}^2 + c^2}$, where M is the baseline response, $R_{\max}$ the maximal response, and $c_{50}$ is the semi-saturation contrast level. [sent-178, score-0.816]

75 [Figure legend residue: Gaussian fit, channel model, channel model+norm.] [sent-179, score-0.701]

76 From these fits we extracted the matching speeds (PSEs) and relative discrimination thresholds (Weber-fractions) shown in Fig. [sent-190, score-0.535]

77 In particular, the data reflect the inverse relationship between relative matching speeds and discrimination thresholds predicted by the slow-speed prior of the model. [sent-193, score-0.541]

78 The model with response normalization, however, better captures subjects’ percepts, in particular in conditions where very low contrast stimuli were combined. [sent-194, score-0.381]

79 5) as well as the extracted discrimination thresholds and matching speeds (Fig. [sent-196, score-0.508]

80 Further support of the normalized model comes from the fitted parameter values: for the model with no normalization, the response level of the highest contrast stimulus r4 was not well constrained (r1 =6. [sent-200, score-0.505]

81 The results suggest that the perceived speed of a combined stimulus can be accurately described as an optimal combination of sensory information provided by individual spatiotemporal frequency channels that interact via response normalization. [sent-215, score-1.866]

82 4 Discussion We have shown that human visual speed perception can be accurately described by a Bayesian observer model that optimally combines sensory information from independent channels, each sensitive to motion energies in a specific spatiotemporal frequency band. [sent-216, score-1.204]

83 Our model expands the previously proposed Bayesian model of speed perception [16]. [sent-217, score-0.416]

84 It no longer assumes a single likelihood function affected by stimulus contrast but rather considers the combination of likelihood functions based on the motion energies in different spatiotemporal frequency channels. [sent-218, score-0.895]

85 We tested our model against data from a 2AFC speed discrimination experiment. [sent-221, score-0.511]

86 Stimuli consisted of drifting sinewave gratings at different spatial frequencies and combinations thereof. [sent-222, score-0.495]

87 Subjects’ perceived speeds of the combined stimuli were independent of the phase configuration of the constituent sinewave gratings even though different phases resulted in different overall contrast values. [sent-223, score-1.16]

88 This supports the hypothesis that perceived speed is processed across multiple spatiotemporal frequency channels (Graham and Nachmias used a similar approach to demonstrate the existence of individual spatial frequency channels [5]). [sent-224, score-1.863]

89 The proposed observer model provided a good fit to the data, but the fit was improved when the channel responses were assumed to be subject to normalization by the overall channel response. [sent-225, score-1.13]

90 Considering that divisive normalization is arguably a ubiquitous process in neural representations, we see this result as a consequence of our attempt to formulate Bayesian observer models at a level that is closer to a physiological description. [sent-226, score-0.387]

91 Future experiments that will test more stimulus combinations will help to further improve the characterization of the channel responses and interactions. [sent-228, score-0.823]

92 This decrease is nicely reflected in the measured decrease in discrimination thresholds for the combined stimuli when the thresholds for both individual gratings were approximately the same (Fig. [sent-232, score-0.939]

93 Note that, because of the slow speed prior, a Bayesian model predicts that perceived speeds are inversely proportional to the discrimination threshold, a prediction that is well supported by our data. [sent-234, score-1.172]

94 Here we provide a behavioral account for both discrimination thresholds and matching speeds by directly estimating the parameters of the likelihoods and the speed prior from psychophysical data. [sent-240, score-0.917]

95 In the long term, the goal is to be able to predict the perceived motion for an arbitrarily complex natural stimulus, and we believe the proposed model is a step in this direction. [sent-242, score-0.503]

96 Contrast and stimulus complexity moderate the relationship between spatial frequency and perceived speed: Implications for MT models of speed perception. [sent-256, score-1.145]

97 A logarithmic, scale-invariant representation of speed in macaque middle temporal area accounts for speed discrimination performance. [sent-305, score-0.805]

98 Estimating target speed from the population response in visual area MT. [sent-313, score-0.453]

99 Perceived speed and direction of complex gratings and plaids. [sent-339, score-0.512]

100 Noise characteristics and prior expectations in human visual speed perception. [sent-351, score-0.387]
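
The psychometric analysis described in the sentences above (a maximum-likelihood fit of a cumulative Gaussian to 2AFC responses, from which a matching speed and a relative threshold are read off) can be sketched as follows. This is a minimal illustration, not the authors' fitting code: it uses a plain grid search instead of a numerical optimizer, the trial counts are synthetic and generated from known parameters, and the Weber fraction is taken here as sigma divided by the PSE, which is one common convention and an assumption. All function names are hypothetical.

```python
import math


def cumulative_gaussian(x, pse, sigma):
    """Probability of judging the comparison 'faster' at speed x."""
    return 0.5 * (1.0 + math.erf((x - pse) / (sigma * math.sqrt(2.0))))


def neg_log_likelihood(data, pse, sigma):
    """Binomial negative log-likelihood of (speed, n_faster, n_trials) rows."""
    eps = 1e-9  # keeps log() finite when the model predicts 0 or 1
    total = 0.0
    for speed, n_faster, n_trials in data:
        p = min(max(cumulative_gaussian(speed, pse, sigma), eps), 1.0 - eps)
        total -= n_faster * math.log(p) + (n_trials - n_faster) * math.log(1.0 - p)
    return total


def fit_psychometric(data, pse_grid, sigma_grid):
    """Grid-search maximum-likelihood estimate of (pse, sigma)."""
    _, pse, sigma = min(
        (neg_log_likelihood(data, p, s), p, s)
        for p in pse_grid
        for s in sigma_grid
    )
    return pse, sigma


# Synthetic counts generated from known parameters (illustration only).
TRUE_PSE, TRUE_SIGMA, N_TRIALS = 1.8, 0.3, 40
speeds = [1.0 + 0.2 * i for i in range(9)]
data = [
    (x, round(N_TRIALS * cumulative_gaussian(x, TRUE_PSE, TRUE_SIGMA)), N_TRIALS)
    for x in speeds
]

pse_grid = [1.5 + 0.01 * i for i in range(61)]
sigma_grid = [0.1 + 0.01 * i for i in range(51)]
pse_hat, sigma_hat = fit_psychometric(data, pse_grid, sigma_grid)
weber_fraction = sigma_hat / pse_hat  # assumed definition of relative threshold
print(pse_hat, sigma_hat, weber_fraction)
```

A grid search keeps the sketch dependency-free; with real data a gradient-based or simplex optimizer would usually replace it, and confidence intervals would typically come from bootstrapping the trials.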

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('perceived', 0.344), ('channel', 0.339), ('speed', 0.292), ('stimulus', 0.287), ('channels', 0.268), ('gratings', 0.22), ('stimuli', 0.214), ('spatiotemporal', 0.205), ('discrimination', 0.196), ('observer', 0.181), ('speeds', 0.162), ('frequency', 0.153), ('responses', 0.146), ('motion', 0.136), ('psychometric', 0.134), ('normalization', 0.102), ('response', 0.098), ('grating', 0.096), ('ri', 0.095), ('thresholds', 0.083), ('deg', 0.083), ('perception', 0.078), ('drifting', 0.078), ('sensory', 0.073), ('combined', 0.073), ('spatial', 0.069), ('psychophysical', 0.069), ('visual', 0.063), ('bayesian', 0.06), ('percept', 0.06), ('subjects', 0.06), ('integration', 0.057), ('coherent', 0.055), ('pses', 0.051), ('rref', 0.051), ('sinewave', 0.051), ('phase', 0.05), ('divisive', 0.049), ('individual', 0.049), ('frequencies', 0.048), ('simoncelli', 0.047), ('contrast', 0.046), ('stocker', 0.045), ('cues', 0.044), ('energy', 0.043), ('likelihoods', 0.042), ('matching', 0.041), ('sr', 0.038), ('processed', 0.037), ('toward', 0.037), ('likelihood', 0.034), ('physiological', 0.034), ('cue', 0.034), ('reference', 0.033), ('prior', 0.032), ('contrasts', 0.031), ('surround', 0.03), ('percepts', 0.03), ('sinusoidal', 0.03), ('humans', 0.029), ('var', 0.029), ('combinations', 0.029), ('slope', 0.028), ('normalized', 0.028), ('relative', 0.027), ('exponent', 0.027), ('ts', 0.027), ('gurations', 0.026), ('optical', 0.026), ('curves', 0.026), ('perceive', 0.026), ('aic', 0.026), ('extracted', 0.026), ('guration', 0.026), ('hypothesis', 0.025), ('accounts', 0.025), ('slow', 0.025), ('xation', 0.025), ('activate', 0.025), ('tted', 0.024), ('interact', 0.024), ('circles', 0.024), ('graham', 0.024), ('suppression', 0.024), ('integrated', 0.023), ('con', 0.023), ('neurosci', 0.023), ('rmax', 0.023), ('optics', 0.023), ('model', 0.023), ('vision', 0.022), ('test', 0.022), ('modalities', 0.022), ('posterior', 0.022), ('neuroscience', 0.022), ('red', 0.022), ('cumulative', 0.021), 
('formulate', 0.021), ('nicely', 0.021), ('america', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

Author: Matjaz Jogan, Alan Stocker

2 0.1667778 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In many neural systems, information about stimulus variables is often represented in a distributed manner by means of a population code. It is generally assumed that the responses of the neural population are tuned to the stimulus statistics, and most prior work has investigated the optimal tuning characteristics of one or a small number of stimulus variables. In this work, we investigate the optimal tuning for diffeomorphic representations of high-dimensional stimuli. We analytically derive the solution that minimizes the L2 reconstruction loss. We compared our solution with other well-known criteria such as maximal mutual information. Our solution suggests that the optimal weights do not necessarily decorrelate the inputs, and the optimal nonlinearity differs from the conventional equalization solution. Results illustrating these optimal representations are shown for some input distributions that may be relevant for understanding the coding of perceptual pathways.

3 0.15740241 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow

Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set of linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximum-likelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains.

4 0.11263958 136 nips-2013-Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream

Author: Daniel L. Yamins, Ha Hong, Charles Cadieu, James J. DiCarlo

Abstract: Humans recognize visually-presented objects rapidly and accurately. To understand this ability, we seek to construct models of the ventral stream, the series of cortical areas thought to subserve object recognition. One tool to assess the quality of a model of the ventral stream is the Representational Dissimilarity Matrix (RDM), which uses a set of visual stimuli and measures the distances produced in either the brain (i.e. fMRI voxel responses, neural firing rates) or in models (features). Previous work has shown that all known models of the ventral stream fail to capture the RDM pattern observed in either IT cortex, the highest ventral area, or in the human ventral stream. In this work, we construct models of the ventral stream using a novel optimization procedure for category-level object recognition problems, and produce RDMs resembling both macaque IT and human ventral stream. The model, while novel in the optimization procedure, further develops a long-standing functional hypothesis that the ventral visual stream is a hierarchically arranged series of processing stages optimized for visual object recognition.

5 0.10969258 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

Author: Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski

Abstract: With the advent of modern stimulation techniques in neuroscience, the opportunity arises to map neuron to neuron connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Due to the technical challenges and sparsity of these systems, it is important to focus experimental time stimulating the neurons whose synaptic strength is most ambiguous, therefore we also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial.

6 0.10425606 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

7 0.10017709 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

8 0.097602457 88 nips-2013-Designed Measurements for Vector Count Data

9 0.09567669 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively

10 0.091943078 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?

11 0.083982691 183 nips-2013-Mapping paradigm ontologies to and from the brain

12 0.081288703 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

13 0.080431305 205 nips-2013-Multisensory Encoding, Decoding, and Identification

14 0.077792466 69 nips-2013-Context-sensitive active sensing in humans

15 0.066530898 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

16 0.065679021 173 nips-2013-Least Informative Dimensions

17 0.056560438 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

18 0.052338168 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

19 0.051767021 54 nips-2013-Bayesian optimization explains human active search

20 0.048622191 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.117), (1, 0.051), (2, -0.102), (3, -0.054), (4, -0.147), (5, -0.028), (6, -0.003), (7, -0.043), (8, -0.009), (9, 0.097), (10, -0.051), (11, 0.023), (12, -0.066), (13, -0.04), (14, -0.046), (15, 0.016), (16, -0.042), (17, -0.083), (18, -0.093), (19, -0.029), (20, -0.021), (21, 0.032), (22, -0.083), (23, -0.102), (24, -0.067), (25, -0.068), (26, -0.058), (27, 0.12), (28, -0.012), (29, -0.003), (30, -0.009), (31, -0.055), (32, -0.138), (33, -0.109), (34, 0.006), (35, -0.105), (36, -0.019), (37, -0.049), (38, -0.108), (39, -0.009), (40, -0.013), (41, -0.0), (42, -0.12), (43, 0.012), (44, 0.019), (45, 0.001), (46, 0.023), (47, 0.053), (48, -0.083), (49, -0.087)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96817964 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

Author: Matjaz Jogan, Alan Stocker

2 0.74775892 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow

3 0.7393254 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

Author: Zhuo Wang, Alan Stocker, Daniel Lee

4 0.63416988 205 nips-2013-Multisensory Encoding, Decoding, and Identification

Author: Aurel A. Lazar, Yevgeniy Slutskiy

Abstract: We investigate a spiking neuron model of multisensory integration. Multiple stimuli from different sensory modalities are encoded by a single neural circuit comprised of a multisensory bank of receptive fields in cascade with a population of biophysical spike generators. We demonstrate that stimuli of different dimensions can be faithfully multiplexed and encoded in the spike domain and derive tractable algorithms for decoding each stimulus from the common pool of spikes. We also show that the identification of multisensory processing in a single neuron is dual to the recovery of stimuli encoded with a population of multisensory neurons, and prove that only a projection of the circuit onto input stimuli can be identified. We provide an example of multisensory integration using natural audio and video and discuss the performance of the proposed decoding and identification algorithms.

5 0.59102631 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?

Author: Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio

Abstract: The macaque Superior Temporal Sulcus (STS) is a brain area that receives and integrates inputs from both the ventral and dorsal visual processing streams (thought to specialize in form and motion processing respectively). For the processing of articulated actions, prior work has shown that even a small population of STS neurons contains sufficient information for the decoding of actor invariant to action, action invariant to actor, as well as the specific conjunction of actor and action. This paper addresses two questions. First, what are the invariance properties of individual neural representations (rather than the population representation) in STS? Second, what are the neural encoding mechanisms that can produce such individual neural representations from streams of pixel images? We find that a simple model, one that simply computes a linear weighted sum of ventral and dorsal responses to short action “snippets”, produces surprisingly good fits to the neural data. Interestingly, even using inputs from a single stream, both actor-invariance and action-invariance can be accounted for, by having different linear weights.

6 0.57700986 136 nips-2013-Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream

7 0.57086754 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively

8 0.51630312 183 nips-2013-Mapping paradigm ontologies to and from the brain

9 0.48459333 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

10 0.46678862 69 nips-2013-Context-sensitive active sensing in humans

11 0.42783779 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

12 0.42412698 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data

13 0.41934851 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

14 0.4134976 121 nips-2013-Firing rate predictions in optimal balanced networks

15 0.37886074 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

16 0.37878802 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

17 0.37105101 284 nips-2013-Robust Spatial Filtering with Beta Divergence

18 0.35357371 88 nips-2013-Designed Measurements for Vector Count Data

19 0.34722134 173 nips-2013-Least Informative Dimensions

20 0.32959419 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.035), (33, 0.171), (34, 0.095), (41, 0.015), (48, 0.224), (49, 0.066), (56, 0.09), (70, 0.048), (85, 0.016), (89, 0.108), (93, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85132748 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

Author: Matjaz Jogan, Alan Stocker

2 0.7277211 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.

3 0.72562456 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow

4 0.71797949 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

Author: Ryan D. Turner, Steven Bottone, Clay J. Stanek

Abstract: The Bayesian online change point detection (BOCPD) algorithm provides an efﬁcient way to do exact inference when the parameters of an underlying model may suddenly change over time. BOCPD requires computation of the underlying model’s posterior predictives, which can only be computed online in O(1) time and memory for exponential family models. We develop variational approximations to the posterior on change point times (formulated as run lengths) for efﬁcient inference when the underlying model is not in the exponential family, and does not have tractable posterior predictive distributions. In doing so, we develop improvements to online variational inference. We apply our methodology to a tracking problem using radar data with a signal-to-noise feature that is Rice distributed. We also develop a variational method for inferring the parameters of the (non-exponential family) Rice distribution. Change point detection has been applied to many applications [5; 7]. In recent years there have been great improvements to the Bayesian approaches via the Bayesian online change point detection algorithm (BOCPD) [1; 23; 27]. Likewise, the radar tracking community has been improving in its use of feature-aided tracking [10]: methods that use auxiliary information from radar returns such as signal-to-noise ratio (SNR), which depend on radar cross sections (RCS) [21]. Older systems would often ﬁlter only noisy position (and perhaps Doppler) measurements while newer systems use more information to improve performance. We use BOCPD for modeling the RCS feature. Whereas BOCPD inference could be done exactly when ﬁnding change points in conjugate exponential family models the physics of RCS measurements often causes them to be distributed in non-exponential family ways, often following a Rice distribution. To do inference efﬁciently we call upon variational Bayes (VB) to ﬁnd approximate posterior (predictive) distributions. 
Furthermore, the nature of both BOCPD and tracking requires the use of online updating. We improve upon the existing and limited approaches to online VB [24; 13]. This paper produces contributions to, and builds upon background from, three independent areas: change point detection, variational Bayes, and radar tracking. Although the emphasis in machine learning is on filtering, a substantial part of tracking with radar data involves data association, illustrated in Figure 1. Observations of radar returns contain measurements from multiple objects (targets) in the sky. If we knew which radar return corresponded to which target we would be presented with $N_T \in \mathbb{N}_0$ independent filtering problems; Kalman filters [14] (or their nonlinear extensions) are applied to “average out” the kinematic errors in the measurements (typically positions) using the measurements associated with each target. The data association problem is to determine which measurement goes to which track. In the classical setup, once a particular measurement is associated with a certain target, that measurement is plugged into the filter for that target as if we knew with certainty it was the correct assignment. The association algorithms, in effect, find the maximum a posteriori (MAP) estimate on the measurement-to-track association. However, approaches such as the joint probabilistic data association (JPDA) filter [2] and the probability hypothesis density (PHD) filter [16] have deviated from this. To find the MAP estimate, a log likelihood of the data under each possible assignment vector $a$ must be computed. These are then used to construct cost matrices that reduce the assignment problem to a particular kind of optimization problem (the details of which are beyond the scope of this paper). The motivation behind feature-aided tracking is that additional features increase the probability that the MAP measurement-to-track assignment is correct.
Based on physical arguments the RCS feature (SNR) is often Rice distributed [21, Ch. 3]; although, in certain situations RCS is exponential or gamma distributed [26]. The parameters of the RCS distribution are determined by factors such as the shape of the aircraft facing the radar sensor. Given that different aircraft have different RCS characteristics, if one attempts to create a continuous track estimating the path of an aircraft, RCS features may help distinguish one aircraft from another if they cross paths or come near one another, for example. RCS also helps distinguish genuine aircraft returns from clutter: a ﬂock of birds or random electrical noise, for example. However, the parameters of the RCS distributions may also change for the same aircraft due to a change in angle or ground conditions. These must be taken into account for accurate association. Providing good predictions in light of a possible sudden change in the parameters of a time series is “right up the alley” of BOCPD and change point methods. The original BOCPD papers [1; 11] studied sudden changes in the parameters of exponential family models for time series. In this paper, we expand the set of applications of BOCPD to radar SNR data which often has the same change point structure found in other applications, and requires online predictions. The BOCPD model is highly modular in that it looks for changes in the parameters of any underlying process model (UPM). The UPM merely needs to provide posterior predictive probabilities, the UPM can otherwise be a “black box.” The BOCPD queries the UPM for a prediction of the next data point under each possible run length, the number of points since the last change point. If (and only if by Hipp [12]) the UPM is exponential family (with a conjugate prior) the posterior is computed by accumulating the sufﬁcient statistics since the last potential change point. This allows for O(1) UPM updates in both computation and memory as the run length increases. 
We motivate the use of VB for implementing UPMs when the data within a regime is believed to follow a distribution that is not exponential family. The methods presented in this paper can be used to find variational run length posteriors for general non-exponential family UPMs in addition to the Rice distribution. Additionally, the methods for improving online updating in VB (Section 2.2) are applicable in areas outside of change point detection.
[Figure 1 plot omitted; the right-hand panel shows likelihood versus SNR for clutter (birds), track 1 (747), and track 2 (EMB 110).]
Figure 1: Illustrative example of a tracking scenario: The black lines (−) show the true tracks while the red stars (∗) show the state estimates over time for track 2 and the blue stars for track 1. The 95% credible regions on the states are shown as blue ellipses. The current (+) and previous (×) measurements are connected to their associated tracks via red lines. The clutter measurements (birds in this case) are shown with black dots (·). The distributions on the SNR (RCS) for each track (blue and red) and the clutter (black) are shown on the right.
To our knowledge this paper is the first to demonstrate how to compute Bayesian posterior distributions on the parameters of a Rice distribution; the closest work would be Lauwers et al. [15], which computes a MAP estimate. Other novel factors of this paper include: demonstrating the usefulness (and advantages over existing techniques) of change point detection for RCS estimation and tracking; and applying variational inference for UPMs where analytic posterior predictives are not possible. This paper provides four main technical contributions: 1) VB inference for inferring the parameters of a Rice distribution. 2) General improvements to online VB (which is then applied to updating the UPM in BOCPD). 3) Derive a VB approximation to the run length posterior when the UPM posterior predictive is intractable. 4) Handle censored measurements (particularly for a Rice distribution) in VB.
This is key for processing missed detections in data association.
1 Background
In this section we briefly review the three areas of background: BOCPD, VB, and tracking.
1.1 Bayesian Online Change Point Detection
We briefly summarize the model setup and notation for the BOCPD algorithm; see [27, Ch. 5] for a detailed description. We assume we have a time series with $n$ observations so far, $y_1, \ldots, y_n \in \mathcal{Y}$. In effect, BOCPD performs message passing to do online inference on the run length $r_n \in 0{:}n-1$, the number of observations since the last change point. Given an underlying predictive model (UPM) and a hazard function $h$, we can compute an exact posterior over the run length $r_n$. Conditional on a run length, the UPM produces a sequential prediction on the next data point using all the data since the last change point: $p(y_n \mid y_{(r)}, \Theta_m)$, where $(r) := (n-r){:}(n-1)$. The UPM is a simpler model where the parameters $\theta$ change at every change point and are modeled as being sampled from a prior with hyper-parameters $\Theta_m$. The canonical example of a UPM would be a Gaussian whose mean and variance change at every change point. The online updates are summarized as:
$\mathrm{msg}_n := p(r_n, y_{1:n}) = \sum_{r_{n-1}} \underbrace{P(r_n \mid r_{n-1})}_{\text{hazard}} \, \underbrace{p(y_n \mid r_{n-1}, y_{(r)})}_{\text{UPM}} \, \underbrace{p(r_{n-1}, y_{1:n-1})}_{\mathrm{msg}_{n-1}} .$ (1)
Unless $r_n = 0$, the sum in (1) only contains one term since the only possibility is that $r_{n-1} = r_n - 1$. The indexing convention is such that if $r_n = 0$ then $y_{n+1}$ is the first observation sampled from the new parameters $\theta$. The marginal posterior predictive on the next data point is easily calculated as:
$p(y_{n+1} \mid y_{1:n}) = \sum_{r_n} p(y_{n+1} \mid y_{(r)}) \, P(r_n \mid y_{1:n}) .$ (2)
Thus, the predictions from BOCPD fully integrate out any uncertainty in $\theta$. The message updates (1) perform exact inference under a model where the number of change points is not known a priori.
BOCPD RCS Model
We show the Rice UPM as an example as it is required for our application.
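Before turning to the Rice model, the recursion in (1) can be sketched in a few lines. The sketch below is only illustrative: it swaps in a Gaussian UPM with known noise variance and a conjugate normal prior on the mean (not the Rice UPM of this paper), uses a constant hazard, and assumes a change point immediately before the first observation, so after n points the run length ranges over 0..n. All function names and default parameters are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gauss_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def bocpd_step(y, log_msg, stats, hazard=0.05, mu0=0.0, var0=10.0, noise_var=1.0):
    """One message update as in (1), in log space.

    log_msg[r] = log p(r_{n-1} = r, y_{1:n-1}); stats[r] = (count, sum) of the
    data in the current run. The Gaussian UPM is an illustrative stand-in.
    """
    log_joint = []
    for (count, total), lm in zip(stats, log_msg):
        # Conjugate posterior on the mean, then the posterior predictive of y.
        post_var = 1.0 / (1.0 / var0 + count / noise_var)
        post_mean = post_var * (mu0 / var0 + total / noise_var)
        log_joint.append(lm + gauss_logpdf(y, post_mean, post_var + noise_var))
    # r_n = 0 sums over all previous run lengths; otherwise r_n = r_{n-1} + 1.
    log_change = logsumexp(log_joint) + math.log(hazard)
    log_growth = [lj + math.log1p(-hazard) for lj in log_joint]
    # A new run starts empty, so y is only added to the growing runs.
    new_stats = [(0, 0.0)] + [(c + 1, t + y) for c, t in stats]
    return [log_change] + log_growth, new_stats


def run_length_posterior(log_msg):
    norm = logsumexp(log_msg)
    return [math.exp(v - norm) for v in log_msg]


def run_bocpd(data):
    log_msg, stats = [0.0], [(0, 0.0)]
    for y in data:
        log_msg, stats = bocpd_step(y, log_msg, stats)
    return run_length_posterior(log_msg)
```

Calling `run_bocpd` on a series whose level jumps partway through should concentrate the run-length posterior on the number of points seen since the jump. Each step costs time linear in the number of run lengths retained, which is why the paper stresses O(1) UPM updates per run length.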
The data within a regime are assumed to be iid Rice observations, with a normal-gamma prior:
$y_n \sim \mathrm{Rice}(\nu, \sigma) , \quad \nu \sim \mathcal{N}(\mu_0, \sigma^2/\lambda_0) , \quad \sigma^{-2} =: \tau \sim \mathrm{Gamma}(\alpha_0, \beta_0)$ (3)
$\implies p(y_n \mid \nu, \sigma) = y_n \tau \exp(-\tau (y_n^2 + \nu^2)/2) \, I_0(y_n \nu \tau) \, \mathbb{I}\{y_n \ge 0\}$ (4)
where $I_0(\cdot)$ is a modified Bessel function of order zero, which is what excludes the Rice distribution from the exponential family. Although the normal-gamma is not conjugate to a Rice it will enable us to use the VB-EM algorithm. The UPM parameters are the Rice shape$^1$ $\nu \in \mathbb{R}$ and scale $\sigma \in \mathbb{R}^+$, $\theta := \{\nu, \sigma\}$, and the hyper-parameters are the normal-gamma parameters $\Theta_m := \{\mu_0, \lambda_0, \alpha_0, \beta_0\}$. Every change point results in a new value for $\nu$ and $\sigma$ being sampled. A posterior on $\theta$ is maintained for each run length, i.e. every possible starting point for the current regime, and is updated at each new data point. Therefore, BOCPD maintains $n$ distinct posteriors on $\theta$, and although this can be reduced with pruning, it necessitates posterior updates on $\theta$ that are computationally efficient. Note that the run length updates in (1) require the UPM to provide predictive log likelihoods at all sample sizes $r_n$ (including zero). Therefore, UPM implementations using such approximations as plug-in MLE predictions will not work very well. The MLE may not even be defined for run lengths smaller than the number of UPM parameters $|\theta|$. For a Rice UPM, the efficient O(1) updating in exponential family models by using a conjugate prior and accumulating sufficient statistics is not possible. This motivates the use of VB methods for approximating the UPM predictions.
1.2 Variational Bayes
We follow the VB framework: when computation of the exact posterior distribution $p(\theta \mid y_{1:n})$ is intractable, it is often possible to create a variational approximation $q(\theta)$ that is locally optimal in terms of the Kullback-Leibler (KL) divergence $\mathrm{KL}(q \,\|\, p)$ while constraining $q$ to be in a certain family of distributions $\mathcal{Q}$.
In general this is done by optimizing a lower bound $\mathcal{L}(q)$ on the evidence $\log p(y_{1:n})$, using either gradient based methods or standard fixed point equations.
[Footnote 1: The shape $\nu$ is usually assumed to be positive ($\in \mathbb{R}^+$); however, there is nothing wrong with using a negative $\nu$ as $\mathrm{Rice}(x \mid \nu, \sigma) = \mathrm{Rice}(x \mid -\nu, \sigma)$. It also allows for use of a normal-gamma prior.]
The VB-EM Algorithm
In many cases, such as the Rice UPM, the derivation of the VB fixed point equations can be simplified by applying the VB-EM algorithm [3]. VB-EM is applicable to models that are conjugate-exponential (CE) after being augmented with latent variables $x_{1:n}$. A model is CE if: 1) the complete data likelihood $p(x_{1:n}, y_{1:n} \mid \theta)$ is an exponential family distribution; and 2) the prior $p(\theta)$ is a conjugate prior for the complete data likelihood $p(x_{1:n}, y_{1:n} \mid \theta)$. We only have to constrain the posterior $q(\theta, x_{1:n}) = q(\theta) q(x_{1:n})$ to factorize between the latent variables and the parameters; we do not constrain the posterior to be of any particular parametric form. Requiring the complete likelihood to be CE is a much weaker condition than requiring the marginal on the observed data $p(y_{1:n} \mid \theta)$ to be CE. Consider a mixture of Gaussians: the model becomes CE when augmented with latent variables (class labels). This is also the case for the Rice distribution (Section 2.1). Like the ordinary EM algorithm [9] the VB-EM algorithm alternates between two steps: 1) Find the posterior of the latent variables treating the expected natural parameters $\bar{\eta} := E_{q(\theta)}[\eta]$ as correct: $q(x_i) \leftarrow p(x_i \mid y_i, \eta = \bar{\eta})$. 2) Find the posterior of the parameters using the expected sufficient statistics $\bar{S} := E_{q(x_{1:n})}[S(x_{1:n}, y_{1:n})]$ as if they were the sufficient statistics for the complete data set: $q(\theta) \leftarrow p(\theta \mid S(x_{1:n}, y_{1:n}) = \bar{S})$. The posterior will be of the same exponential family as the prior.
1.3 Tracking
In this section we review data association, which along with filtering constitutes tracking.
In data association we estimate the association vectors $a$ which map measurements to tracks. At each time step, $n \in \mathbb{N}_1$, we observe $N_Z(n) \in \mathbb{N}_0$ measurements, $Z_n = \{z_{i,n}\}_{i=1}^{N_Z(n)}$, which include returns from both real targets and clutter (spurious measurements). Here, $z_{i,n} \in \mathcal{Z}$ is a vector of kinematic measurements (positions in $\mathbb{R}^3$, or $\mathbb{R}^4$ with a Doppler), augmented with an RCS component $R \in \mathbb{R}^+$ for the measured SNR, at time $t_n \in \mathbb{R}$. The assignment vector at time $t_n$ is such that $a_n(i) = j$ if measurement $i$ is associated with track $j > 0$; $a_n(i) = 0$ if measurement $i$ is clutter. The inverse mapping $a_n^{-1}$ maps tracks to measurements: meaning $a_n^{-1}(a_n(i)) = i$ if $a_n(i) \neq 0$; and $a_n^{-1}(i) = 0 \Leftrightarrow a_n(j) \neq i$ for all $j$. For example, if $N_T = 4$ and $a_n = [2\ 0\ 0\ 1\ 4]$ then $N_Z = 5$, $N_c = 2$, and $a_n^{-1} = [4\ 1\ 0\ 5]$. Each track is associated with at most one measurement, and vice-versa. In $N$-D data association we jointly find the MAP estimate of the association vectors over a sliding window of the last $N - 1$ time steps. We assume we have $N_T(n) \in \mathbb{N}_0$ total tracks as a known parameter: $N_T(n)$ is adjusted over time using various algorithms (see [2, Ch. 3]). In the generative process each track places a probability distribution on the next $N - 1$ measurements, with both kinematic and RCS components. However, if the random RCS $R$ for a measurement is below $R_0$ then it will not be observed. There are $N_c(n) \in \mathbb{N}_0$ clutter measurements from a Poisson process with $\lambda := E[N_c(n)]$ (often with uniform intensity). The ordering of measurements in $Z_n$ is assumed to be uniformly random. For 3D data association the model joint $p(Z_{n-1:n}, a_{n-1}, a_n \mid Z_{1:n-2})$ is:
$\prod_{i=1}^{N_T} p_i\big(z_{a_n^{-1}(i),n}, z_{a_{n-1}^{-1}(i),n-1}\big) \times \prod_{i=n-1}^{n} \frac{\lambda^{N_c(i)} \exp(-\lambda)}{|Z_i|!} \prod_{j=1}^{|Z_i|} p_0(z_{j,i})^{\mathbb{I}\{a_i(j)=0\}} ,$ (5)
where $p_i$ is the probability of the measurement sequence under track $i$; $p_0$ is the clutter distribution.
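The assignment-vector convention above is easy to get wrong by one index, so a small sketch may help. It inverts an assignment vector using the 1-based measurement and track indices of the text, with 0 meaning clutter (in the forward map) or a missed detection (in the inverse map). The function name `invert_assignment` is hypothetical.

```python
def invert_assignment(a, num_tracks):
    """Map tracks to measurements given a measurement-to-track vector.

    a[i-1] = j > 0 means measurement i belongs to track j; 0 means clutter.
    The result inv satisfies inv[j-1] = i, or 0 if track j has no measurement.
    """
    inv = [0] * num_tracks
    for i, track in enumerate(a, start=1):
        if track == 0:
            continue  # clutter is not assigned to any track
        if inv[track - 1] != 0:
            raise ValueError("each track takes at most one measurement")
        inv[track - 1] = i
    return inv


def count_clutter(a):
    return sum(1 for track in a if track == 0)
```

With the example from the text, `invert_assignment([2, 0, 0, 1, 4], 4)` gives `[4, 1, 0, 5]` and `count_clutter([2, 0, 0, 1, 4])` gives 2.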
The probability $p_i$ is the product of the RCS component predictions (BOCPD) and the kinematic components (filter); informally, $p_i(z) = p_i(\text{positions}) \times p_i(\text{RCS})$. If there is a missed detection, i.e. $a_n^{-1}(i) = 0$, we then use $p_i(z_{a_n^{-1}(i),n}) = P(R < R_0)$ under the RCS model for track $i$ with no contribution from the positional (kinematic) component. Just as BOCPD allows any black box probabilistic predictor to be used as a UPM, any black box model of measurement sequences can be used in (5). The estimation of association vectors for the 3D case becomes an optimization problem of the form:
$(\hat{a}_{n-1}, \hat{a}_n) = \operatorname{argmax}_{(a_{n-1}, a_n)} \log P(a_{n-1}, a_n \mid Z_{1:n}) = \operatorname{argmax}_{(a_{n-1}, a_n)} \log p(Z_{n-1:n}, a_{n-1}, a_n \mid Z_{1:n-2}) ,$ (6)
which is effectively optimizing (5) with respect to the assignment vectors. The optimization given in (6) can be cast as a multidimensional assignment (MDA) problem [2], which can be solved efficiently in the 2D case. Higher dimensional assignment problems, however, are NP-hard; approximate, yet typically very accurate, solvers must be used for real-time operation, which is usually required for tracking systems [20]. If a radar scan occurs at each time step and a target is not detected, we assume the SNR has not exceeded the threshold, implying $0 \le R < R_0$. This is a (left) censored measurement and is treated differently than a missing data point. Censoring is accounted for in Section 2.3.
2 Online Variational UPMs
We cover the four technical challenges for implementing non-exponential family UPMs in an efficient and online manner. We drop the index of the data point $i$ when it is clear from context.
2.1 Variational Posterior for a Rice Distribution
The Rice distribution has the property that
$x \sim \mathcal{N}(\nu, \sigma^2) , \quad y \sim \mathcal{N}(0, \sigma^2) \implies R = \sqrt{x^2 + y^2} \sim \mathrm{Rice}(\nu, \sigma) .$ (7)
For simplicity we perform inference using $R^2$, as opposed to $R$, and transform accordingly:
$x \sim \mathcal{N}(\nu, \sigma^2) , \quad R^2 - x^2 \sim \mathrm{Gamma}(\tfrac{1}{2}, \tfrac{\tau}{2}) , \quad \tau := 1/\sigma^2 \in \mathbb{R}^+ \implies p(R^2, x) = p(R^2 \mid x) \, p(x) = \mathrm{Gamma}(R^2 - x^2 \mid \tfrac{1}{2}, \tfrac{\tau}{2}) \, \mathcal{N}(x \mid \nu, \sigma^2) .$ (8)
The complete likelihood (8) is the product of two exponential family models and is exponential family itself, parameterized with base measure $h$ and partition factor $g$:
$\eta = [\nu\tau, -\tau/2] , \quad S = [x, R^2] , \quad h(R^2, x) = (2\pi \sqrt{R^2 - x^2})^{-1} , \quad g(\nu, \tau) = \tau \exp(-\nu^2 \tau / 2) .$
By inspection we see that the natural parameters $\eta$ and sufficient statistics $S$ are the same as a Gaussian with unknown mean and variance. Therefore, we apply the normal-gamma prior on $(\nu, \tau)$ as it is the conjugate prior for the complete data likelihood. This allows us to apply the VB-EM algorithm. We use $y_i := R_i^2$ as the VB observation, not $R_i$ as in (3). In (5), $z_{\cdot,\cdot}(\mathrm{end})$, the last component of the measurement vector, is the RCS $R$.
VB M-Step
We derive the posterior updates to the parameters given expected sufficient statistics:
$\bar{x} := \sum_{i=1}^{n} E[x_i]/n , \quad \mu_n = \frac{\lambda_0 \mu_0 + \sum_i E[x_i]}{\lambda_0 + n} , \quad \lambda_n = \lambda_0 + n , \quad \alpha_n = \alpha_0 + n ,$ (9)
$\beta_n = \beta_0 + \frac{1}{2} \sum_{i=1}^{n} (E[x_i] - \bar{x})^2 + \frac{1}{2} \frac{n \lambda_0}{\lambda_0 + n} (\bar{x} - \mu_0)^2 + \frac{1}{2} \sum_{i=1}^{n} \big(R_i^2 - E[x_i]^2\big) .$ (10)
This is the same as an observation from a Gaussian and a gamma that share an (inverse) scale $\tau$.
VB E-Step
We then must find both expected sufficient statistics $\bar{S}$. The expectation $E[R_i^2 \mid R_i^2] = R_i^2$ trivially; leaving $E[x_i \mid R_i^2]$. Recall that the joint on $(x, y)$ is a bivariate normal; if we constrain the radius to $R$, the angle $\omega$ will be distributed by a von Mises (VM) distribution. Therefore,
$\omega := \arccos(x/R) \sim \mathrm{VM}(0, \kappa) , \quad \kappa = R \, E[\nu\tau] \implies E[x] = R \, E[\cos \omega] = R \, I_1(\kappa)/I_0(\kappa) ,$ (11)
where computing $\kappa$ constitutes the VB E-step and we have used the trigonometric moment on $\omega$ [18]. This completes the computations required to do the VB updates on the Rice posterior.
Variational Lower Bound
For completeness, and to assess convergence, we derive the VB lower bound $\mathcal{L}(q)$.
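Before the bound itself, the E-step (11) and M-step (9)-(10) can be sketched as a batch loop in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Bessel ratio is computed by midpoint quadrature of the standard integral representation of $I_n$, the gamma prior is taken in the rate parameterization so that $E[\nu\tau] = \mu_n \alpha_n / \beta_n$ under the normal-gamma posterior, the weak prior defaults are arbitrary, and the loop is initialized with $E[x_i] = R_i$ to move away from the symmetric point $\nu = 0$. The names `bessel_ratio`, `sample_rice`, and `vb_em_rice` are hypothetical.

```python
import math
import random


def bessel_ratio(kappa, n_grid=64):
    """I_1(kappa) / I_0(kappa) via midpoint quadrature on [0, pi].

    Uses I_n(k) = (1/pi) * integral of exp(k cos t) cos(n t) dt; the common
    factor exp(-|k|) is divided out so large |k| does not overflow.
    """
    sign = -1.0 if kappa < 0 else 1.0
    k = abs(kappa)
    step = math.pi / n_grid
    num = den = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * step
        w = math.exp(k * (math.cos(t) - 1.0))
        num += w * math.cos(t)
        den += w
    return sign * num / den


def sample_rice(nu, sigma, n, rng):
    """Draw Rice samples through the Gaussian construction in (7)."""
    return [math.hypot(rng.gauss(nu, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def vb_em_rice(r, mu0=0.0, lam0=1e-3, alpha0=1e-3, beta0=1e-3, iters=30):
    """Batch VB-EM for Rice data r; returns (mu_n, lam_n, alpha_n, beta_n)."""
    n = len(r)
    sum_r2 = sum(v * v for v in r)
    ex = list(r)  # assumed initialization: E[x_i] = R_i
    for _ in range(iters):
        # M-step, equations (9) and (10).
        xbar = sum(ex) / n
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = lam0 + n
        alpha_n = alpha0 + n
        beta_n = (beta0
                  + 0.5 * sum((e - xbar) ** 2 for e in ex)
                  + 0.5 * n * lam0 / (lam0 + n) * (xbar - mu0) ** 2
                  + 0.5 * (sum_r2 - sum(e * e for e in ex)))
        # E-step, equation (11), with E[nu * tau] = mu_n * alpha_n / beta_n.
        e_nu_tau = mu_n * alpha_n / beta_n
        ex = [ri * bessel_ratio(ri * e_nu_tau) for ri in r]
    return mu_n, lam_n, alpha_n, beta_n
```

On simulated data the posterior mean `mu_n` should land near the generating shape and `alpha_n / beta_n` near the generating precision, up to sampling noise. Because each pass touches every data point, this batch form costs O(n) per update.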
Using the standard formula [4] for $\mathcal{L}(q) = E_q[\log p(y_{1:n}, x_{1:n}, \theta)] + H[q]$ we get:
$\sum_{i=1}^{n} \Big( E[\log \tau/2] - \tfrac{1}{2} E[\tau] R_i^2 + (E[\nu\tau] - \kappa_i / R_i) E[x_i] - \tfrac{1}{2} E[\nu^2 \tau] + \log I_0(\kappa_i) \Big) - \mathrm{KL}(q \,\|\, p) ,$ (12)
where $p$ in the KL is the prior on $(\nu, \tau)$ which is easy to compute as $q$ and $p$ are both normal-gamma. Equivalently, (12) can be optimized directly instead of using the VB-EM updates.
2.2 Online Variational Inference
In Section 2.1 we derived an efficient way to compute the variational posterior for a Rice distribution for a fixed data set. However, as is apparent from (1) we need online predictions from the UPM; we must be able to update the posterior one data point at a time. When the UPM is exponential family and we can compute the posterior exactly, we merely use the posterior from the previous step as the prior. However, since we are only computing a variational approximation to the posterior, using the previous posterior as the prior does not give the exact same answer as re-computing the posterior from batch. This gives two obvious options: 1) recompute the posterior from batch every update at O(n) cost or 2) use the previous posterior as the prior at O(1) cost and reduced accuracy. The difference between the options is encapsulated by looking at the expected sufficient statistics: $\bar{S} = \sum_{i=1}^{n} E_{q(x_i \mid y_{1:n})}[S(x_i, y_i)]$. Naive online updating uses old expected sufficient statistics whose posterior effectively uses $\bar{S} = \sum_{i=1}^{n} E_{q(x_i \mid y_{1:i})}[S(x_i, y_i)]$. We get the best of both worlds if we adjust those estimates over time. We in fact can do this if we project the expected sufficient statistics into a “feature space” in terms of the expected natural parameters. For some function $f$,
$q(x_i) = p(x_i \mid y_i, \eta = \bar{\eta}) \implies E_{q(x_i \mid y_{1:n})}[S(x_i, y_i)] = f(y_i, \bar{\eta}) .$ (13)
If $f$ is piecewise continuous then we can represent it with an inner product [8, Sec. 2.1.6]
$f(y_i, \bar{\eta}) = \phi(\bar{\eta})^\top \psi(y_i) \implies \bar{S} = \sum_{i=1}^{n} \phi(\bar{\eta})^\top \psi(y_i) = \phi(\bar{\eta})^\top \sum_{i=1}^{n} \psi(y_i) ,$ (14)
where an infinite dimensional $\phi$ and $\psi$ may be required for exact representation, but can be approximated by a finite inner product. In the Rice distribution case we use (11)
$f(y_i, \bar{\eta}) = E[x_i] = R_i \, \mathcal{I}(R_i E[\nu\tau]) = R_i \, \mathcal{I}\big((R_i/\mu_0) \, \mu_0 E[\nu\tau]\big) , \quad \mathcal{I}(\cdot) := I_1(\cdot)/I_0(\cdot) ,$ (15)
where recall that $y_i = R_i^2$ and $\bar{\eta}_1 = E[\nu\tau]$. We can easily represent $f$ with an inner product if we can represent $\mathcal{I}$ as an inner product: $\mathcal{I}(uv) = \phi(u)^\top \psi(v)$. We use unitless $\phi_i(u) = \mathcal{I}(c_i u)$ with $c_{1:G}$ as a log-linear grid from $10^{-2}$ to $10^{3}$ and $G = 50$. We use a lookup table for $\psi(v)$ that was trained to match $\mathcal{I}$ using non-negative least squares, which left us with a sparse lookup table. Online updating for VB posteriors was also developed in [24; 13]. These methods involved introducing forgetting factors to forget the contributions from old data points that might be detrimental to accuracy. Since the VB predictions are “embedded” in a change point method, they are automatically phased out if the posterior predictions become inaccurate, making the forgetting factors unnecessary.
2.3 Censored Data
As mentioned in Section 1.3, we must handle censored RCS observations during a missed detection. In the VB-EM framework we merely have to compute the expected sufficient statistics given the censored measurement: $E[S \mid R < R_0]$. The expected sufficient statistic from (11) is now:
$E[x \mid R < R_0] = \int_0^{R_0} E[x \mid R] \, p(R) \, dR \,\big/\, \mathrm{RiceCDF}(R_0 \mid \nu, \tau) = \nu \big(1 - Q_2(\tfrac{\nu}{\sigma}, \tfrac{R_0}{\sigma})\big) \big/ \big(1 - Q_1(\tfrac{\nu}{\sigma}, \tfrac{R_0}{\sigma})\big) ,$
where $Q_M$ is the Marcum Q function [17] of order $M$. Similar updates for $E[S \mid R < R_0]$ are possible for exponential or gamma UPMs, but are not shown as they are relatively easy to derive.
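The censored expectation can be checked numerically without the Marcum Q function. The sketch below evaluates the integral form of $E[x \mid R < R_0]$ by midpoint quadrature, using the Rice density from (4) and $E[x \mid R] = R \, I_1(\kappa)/I_0(\kappa)$ with $\kappa = R\nu\tau$, and compares it with a Monte Carlo estimate built from the Gaussian construction in (7). It assumes $\nu \ge 0$ and plugs in fixed values of $\nu$ and $\tau$ rather than VB expectations; all function names are hypothetical.

```python
import math
import random


def scaled_bessel(kappa, order, n_grid=128):
    """exp(-kappa) * I_order(kappa) for kappa >= 0, by midpoint quadrature."""
    step = math.pi / n_grid
    total = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * step
        total += math.exp(kappa * (math.cos(t) - 1.0)) * math.cos(order * t)
    return total / n_grid


def censored_mean_x(nu, tau, r0, n_grid=400):
    """E[x | R < r0] as a ratio of two integrals over R in (0, r0)."""
    step = r0 / n_grid
    num = den = 0.0
    for j in range(n_grid):
        r = (j + 0.5) * step
        kappa = r * nu * tau
        # Rice density (4) rewritten as envelope * exp(-kappa) * I_0(kappa).
        envelope = r * tau * math.exp(-0.5 * tau * (r - nu) ** 2)
        den += envelope * scaled_bessel(kappa, 0)
        # E[x | R] p(R) = R * envelope * exp(-kappa) * I_1(kappa).
        num += r * envelope * scaled_bessel(kappa, 1)
    return num / den


def monte_carlo_censored_mean_x(nu, tau, r0, n_samples, rng):
    """Average x over Gaussian draws whose radius falls below r0."""
    sigma = 1.0 / math.sqrt(tau)
    kept = []
    for _ in range(n_samples):
        x = rng.gauss(nu, sigma)
        y = rng.gauss(0.0, sigma)
        if math.hypot(x, y) < r0:
            kept.append(x)
    return sum(kept) / len(kept)
```

The two estimates should agree to within Monte Carlo error, and the result should fall strictly between 0 and $\nu$, as the closed form implies since $Q_2 \ge Q_1$.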
2.4 Variational Run Length Posteriors: Predictive Log Likelihoods
Both updating the BOCPD run length posterior (1) and finding the marginal predictive log likelihood of the next point (2) require calculating the UPM’s posterior predictive log likelihood $\log p(y_{n+1} \mid r_n, y_{(r)})$. The marginal posterior predictive from (2) is used in data association (6) and benchmarking BOCPD against other methods. However, the exact posterior predictive distribution obtained by integrating the Rice likelihood against the VB posterior is difficult to compute. We can break the BOCPD update (1) into a time and measurement update. The measurement update corresponds to a Bayesian model comparison (BMC) calculation with prior $p(r_n \mid y_{1:n})$:
$p(r_n \mid y_{1:n+1}) \propto p(y_{n+1} \mid r_n, y_{(r)}) \, p(r_n \mid y_{1:n}) .$ (16)
Using the BMC results in Bishop [4, Sec. 10.1.4] we find a variational posterior on the run length by using the variational lower bound for each run length $\mathcal{L}_i(q) \le \log p(y_{n+1} \mid r_n = i, y_{(r)})$, calculated using (12), as a proxy for the exact UPM posterior predictive in (16). This gives the exact VB posterior if the approximating family $\mathcal{Q}$ is of the form:
$q(r_n, \theta, x) = q_{\mathrm{UPM}}(\theta, x \mid r_n) \, q(r_n) \implies q(r_n = i) = \exp(\mathcal{L}_i(q)) \, p(r_n = i \mid y_{1:n}) / \exp(\mathcal{L}(q)) ,$ (17)
where $q_{\mathrm{UPM}}$ contains whatever constraints we used to compute $\mathcal{L}_i(q)$. The normalizer on $q(r_n)$ serves as a joint VB lower bound: $\mathcal{L}(q) = \log \sum_i \exp(\mathcal{L}_i(q)) \, p(r_n = i \mid y_{1:n}) \le \log p(y_{n+1} \mid y_{1:n})$. Note that the conditional factorization is different than the typical independence constraint on $q$. Furthermore, we derive the estimation of the assignment vectors $a$ in (6) as a VB routine. We use a similar conditional constraint on the latent BOCPD variables given the assignment and constrain the assignment posterior to be a point mass.
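The run-length update in (17) amounts to a softmax over the per-run-length lower bounds, weighted by the run-length prior, with the log-normalizer doubling as the joint bound. A minimal sketch in log space, assuming the bounds $\mathcal{L}_i(q)$ and the log prior $\log p(r_n = i \mid y_{1:n})$ are already available as lists (the function names are hypothetical):

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def variational_run_length_update(lower_bounds, log_prior):
    """Return (q(r_n), joint lower bound) following (17).

    lower_bounds[i] is L_i(q) for run length i; log_prior[i] is
    log p(r_n = i | y_{1:n}).
    """
    log_weights = [lb + lp for lb, lp in zip(lower_bounds, log_prior)]
    joint_bound = logsumexp(log_weights)  # L(q) = log sum_i exp(L_i) p(r_n = i | y_{1:n})
    q_r = [math.exp(w - joint_bound) for w in log_weights]
    return q_r, joint_bound
```

If every run length has the same bound, the update returns the prior unchanged and the joint bound equals that common value, which is a convenient sanity check.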
In the 2D assignment case, for example,
$q(a_n, X_{1:N_T}) = q(X_{1:N_T} \mid a_n) \, q(a_n) = q(X_{1:N_T} \mid a_n) \, \mathbb{I}\{a_n = \hat{a}_n\} ,$ (18)
where each track’s $X_i$ represents all the latent variables used to compute the variational lower bound on $\log p(z_{j,n} \mid a_n(j) = i)$. In the BOCPD case, $X_i := \{r_n, x, \theta\}$. The resulting VB fixed point equations find the posterior on the latent variables $X_i$ by taking $\hat{a}_n$ as the true assignment and solving the VB problem of (17); the assignment $\hat{a}_n$ is found by using (6) and taking the joint BOCPD lower bound $\mathcal{L}(q)$ as a proxy for the BOCPD predictive log likelihood component of $\log p_i$ in (5).
[Figure 2 plots omitted. (a) Online Updating: KL (nats) versus sample size. (b) Exponential RCS: RCS RMSE (dBsm) versus time. (c) Rice RCS: RCS RMSE (dBsm) versus time.]
Figure 2: Left: KL from naive updating, Sato’s method [24], and improved online VB (◦) to the batch VB posterior vs. sample size $n$; using a standard normal-gamma prior. Each curve represents a true $\nu$ in the generating Rice distribution: $\nu = 3.16$ (red), $\nu = 10.0$ (green), $\nu = 31.6$ (blue) and $\tau = 1$. Middle: The RMSE (dB scale) of the estimate on the mean RCS distribution $E[R_n]$ is plotted for an exponential RCS model. The curves are BOCPD (blue), IMM (black), identity (magenta), α-filter (green), and median filter (red). Right: Same as the middle but for the Rice RCS case. The dashed lines are 95% confidence intervals.
3 Results
3.1 Improved Online Solution
We first demonstrate the accuracy of the online VB approximation (Section 2.2) on a Rice estimation example; here, we only test the VB posterior as no change point detection is applied. Figure 2(a) compares naive online updating, Sato’s method [24], and our improved online updating in $\mathrm{KL}(\text{online} \,\|\, \text{batch})$ of the posteriors for three different true parameters $\nu$ as sample size $n$ increases.
The performance curves are the KL divergence between these online approximations to the posterior and the batch VB solution (i.e. restarting VB from “scratch” every new data point) vs. sample size. The error for our method stays around a modest $10^{-2}$ nats while naive updating incurs large errors of 1 to 50 nats [19, Ch. 4]. Sato’s method tends to settle in around a 1 nat approximation error. The recommended annealing schedule, i.e. forgetting factors, in [24] performed worse than naive updating. We did a grid search over annealing exponents and show the results for the best performing schedule of $n^{-0.52}$. By contrast, our method does not require the tuning of an annealing schedule.
3.2 RCS Estimation Benchmarking
We now compare BOCPD with other methods for RCS estimation. We use the same experimental example as Slocumb and Klusman III [25], which uses an augmented interacting multiple model (IMM) based method for estimating the RCS; we also compare against the same α-filter and median filter used in [25]. As a reference point, we also consider the “identity filter” which is merely an unbiased filter that uses only $y_n$ to estimate the mean RCS $E[R_n]$ at time step $n$. We extend this example to look at Rice RCS in addition to the exponential RCS case. The bias correction constants in the IMM were adjusted for the Rice distribution case as per [25, Sec. 3.4]. The results on exponential distributions used in [25] and the Rice distribution case are shown in Figures 2(b) and 2(c). The IMM used in [25] was hard-coded to expect jumps in the SNR of multiples of ±10 dB, which is exactly what is presented in the example (a sequence of 20, 10, 30, and 10 dB). In [25] the authors mention that the IMM reaches an RMSE “floor” at 2 dB, yet BOCPD continues to drop as low as 0.56 dB. The RMSE from BOCPD does not spike nearly as high as the other methods upon a change in $E[R_n]$. The α-filter and median filter appear worse than both the IMM and BOCPD.
The RMSE and confidence intervals are calculated from 5000 runs of the experiment.

[Figure 3 panels: (a) SIAP Metrics, Improvement (%) vs. Difficulty; (b) Heathrow (LHR), Northing (km) vs. Easting (km)]

Figure 3: Left: Average relative improvements (%) for SIAP metrics: position accuracy (red), velocity accuracy (green), and spurious tracks (blue ◦) across difficulty levels. Right: LHR: true trajectories shown as black lines (−), estimates using a BOCPD RCS model for association shown as blue stars (∗), and the standard tracker as red circles (◦). The standard tracker has spurious tracks over east London and near Ipswich. Background map data: Google Earth (TerraMetrics, Data SIO, NOAA, U.S. Navy, NGA, GEBCO, Europa Technologies)

3.3 Flightradar24 Tracking Problem

Finally, we used real flight trajectories from flightradar24 and plugged them into our 3D tracking algorithm. We compare tracking performance between using our BOCPD model and the relatively standard constant probability of detection (no RCS) [2, Sec. 3.5] setup. We use the single integrated air picture (SIAP) metrics [6] to demonstrate the improved performance of the tracking. The SIAP metrics are a standard set of metrics used to compare tracking systems. We broke the data into 30 regions during a one hour period (in Sept. 2012) sampled every 5 s, each within a 200 km by 200 km area centered around the world’s 30 busiest airports [22]. Commercial airport traffic is typically very orderly and does not allow aircraft to fly close to one another or cross paths. Feature-aided tracking is most necessary in scenarios with a more chaotic air situation. Therefore, we took random subsets of 10 flight paths and randomly shifted their start time to allow for scenarios of greater interest.
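The scenario construction described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the array layout, the function names, the maximum shift of 360 steps, and the convention of counting each entry below the distance threshold as one close approach are all assumptions. The counting helper reflects the difficulty measure used when the paper reports results, namely how often any two aircraft come within roughly 400 m of each other.

```python
import numpy as np

def make_scenario(flights, n_paths=10, max_shift_steps=360, rng=None):
    """Pick n_paths flights at random and give each a random start offset.

    flights: list of arrays of shape (T_i, 3), positions in metres on a
    regular time grid (5 s spacing in the text). Returns an array of shape
    (n_paths, T, 3), padded with NaN where an aircraft is not present.
    """
    rng = np.random.default_rng() if rng is None else rng
    chosen = rng.choice(len(flights), size=n_paths, replace=False)
    shifts = rng.integers(0, max_shift_steps + 1, size=n_paths)
    total = max(len(flights[i]) + s for i, s in zip(chosen, shifts))
    scenario = np.full((n_paths, total, 3), np.nan)
    for k, (i, s) in enumerate(zip(chosen, shifts)):
        scenario[k, s:s + len(flights[i])] = flights[i]
    return scenario

def count_close_approaches(scenario, threshold_m=400.0):
    """Count how many times any pair of aircraft enters the threshold distance.

    Assumed convention: consecutive time steps below the threshold count as
    a single close approach for that pair.
    """
    count = 0
    n = scenario.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(scenario[i] - scenario[j], axis=1)
            # Treat time steps where either aircraft is absent as "far apart".
            close = np.where(np.isnan(d), np.inf, d) < threshold_m
            entered = close & ~np.concatenate(([False], close[:-1]))
            count += int(entered.sum())
    return count
```

Scenarios generated this way can then be grouped by their close-approach count before comparing trackers.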
The resulting SIAP metric improvements are shown in Figure 3(a) where we look at performance by a difficulty metric: the number of times in a scenario any two aircraft come within ∼400 m of each other. The biggest improvements are seen for difficulties above three where positional accuracy increases by 30%. Significant improvements are also seen for velocity accuracy (11%) and the frequency of spurious tracks (6%). Significant performance gains are seen at all difficulty levels considered. The larger improvements at level three over level five are possibly due to some level five scenarios that are not resolvable simply through more sophisticated models. We demonstrate how our RCS methods prevent the creation of spurious tracks around London Heathrow in Figure 3(b).

4 Conclusions

We have demonstrated that it is possible to use sophisticated and recent developments in machine learning such as BOCPD, and use the modern inference method of VB, to produce demonstrable improvements in the much more mature field of radar tracking. We first closed a “hole” in the literature in Section 2.1 by deriving variational inference on the parameters of a Rice distribution, with its inherent applicability to radar tracking. In Sections 2.2 and 2.4 we showed that it is possible to use these variational UPMs for non-exponential family models in BOCPD without sacrificing its modular or online nature. The improvements in online VB are extendable to UPMs besides a Rice distribution and more generally beyond change point detection. We can use the variational lower bound from the UPM and obtain a principled variational approximation to the run length posterior. Furthermore, we cast the estimation of the assignment vectors themselves as a VB problem, which is in stark contrast to the tracking literature. More algorithms from the tracking literature can possibly be cast in various machine learning frameworks, such as VB, and improved upon from there.

References

[1] Adams, R. P. and MacKay, D. J. (2007). Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK.
[2] Bar-Shalom, Y., Willett, P., and Tian, X. (2011). Tracking and Data Fusion: A Handbook of Algorithms. YBS Publishing.
[3] Beal, M. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. In Bayesian Statistics, volume 7, pages 453–464.
[4] Bishop, C. M. (2007). Pattern Recognition and Machine Learning. Springer.
[5] Braun, J. V., Braun, R., and Müller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika, 87(2):301–314.
[6] Byrd, E. (2003). Single integrated air picture (SIAP) attributes version 2.0. Technical Report 2003-029, DTIC.
[7] Chen, J. and Gupta, A. (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association, 92(438):739–747.
[8] Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics. Interscience.
[9] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
[10] Ehrman, L. M. and Blair, W. D. (2006). Comparison of methods for using target amplitude to improve measurement-to-track association in multi-target tracking. In Information Fusion, 2006 9th International Conference on, pages 1–8. IEEE.
[11] Fearnhead, P. and Liu, Z. (2007). Online inference for multiple changepoint problems. Journal of the Royal Statistical Society, Series B, 69(4):589–605.
[12] Hipp, C. (1974). Sufficient statistics and exponential families. The Annals of Statistics, 2(6):1283–1292.
[13] Honkela, A. and Valpola, H. (2003). On-line variational Bayesian learning. In 4th International Symposium on Independent Component Analysis and Blind Signal Separation, pages 803–808.
[14] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82(Series D):35–45.
[15] Lauwers, L., Barbé, K., Van Moer, W., and Pintelon, R. (2009). Estimating the parameters of a Rice distribution: A Bayesian approach. In Instrumentation and Measurement Technology Conference, 2009. I2MTC’09. IEEE, pages 114–117. IEEE.
[16] Mahler, R. (2003). Multi-target Bayes filtering via first-order multi-target moments. IEEE Trans. AES, 39(4):1152–1178.
[17] Marcum, J. (1950). Table of Q functions. U.S. Air Force RAND Research Memorandum M-339, Rand Corporation, Santa Monica, CA.
[18] Mardia, K. V. and Jupp, P. E. (2000). Directional Statistics. John Wiley & Sons, New York.
[19] Murray, I. (2007). Advances in Markov chain Monte Carlo methods. PhD thesis, Gatsby computational neuroscience unit, University College London, London, UK.
[20] Poore, A. P., Rijavec, N., Barker, T. N., and Munger, M. L. (1993). Data association problems posed as multidimensional assignment problems: algorithm development. In Optical Engineering and Photonics in Aerospace Sensing, pages 172–182. International Society for Optics and Photonics.
[21] Richards, M. A., Scheer, J., and Holm, W. A., editors (2010). Principles of Modern Radar: Basic Principles. SciTech Pub.
[22] Rogers, S. (2012). The world’s top 100 airports: listed, ranked and mapped. The Guardian.
[23] Saatçi, Y., Turner, R., and Rasmussen, C. E. (2010). Gaussian process change point models. In 27th International Conference on Machine Learning, pages 927–934, Haifa, Israel. Omnipress.
[24] Sato, M.-A. (2001). Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681.
[25] Slocumb, B. J. and Klusman III, M. E. (2005). A multiple model SNR/RCS likelihood ratio score for radar-based feature-aided tracking. In Optics & Photonics 2005, pages 59131N–59131N. International Society for Optics and Photonics.
[26] Swerling, P. (1954). Probability of detection for fluctuating targets. Technical Report RM-1217, Rand Corporation.
[27] Turner, R. (2011). Gaussian Processes for State Space Models and Change Point Detection. PhD thesis, University of Cambridge, Cambridge, UK.

5 0.71586734 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%.
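The relationship between weighted majority voting and nearest-neighbor classification mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the vote weights; the form exp(-gamma * squared distance) used here, the ±1 label encoding, the function names, and the omission of any time-shift handling are assumptions made for illustration only.

```python
import numpy as np

def _sq_dists(x, train_series):
    """Squared Euclidean distance from the observed prefix x to each training series."""
    t = len(x)
    return np.sum((train_series[:, :t] - x) ** 2, axis=1)

def weighted_majority_vote(x, train_series, train_labels, gamma=1.0):
    """Each training series votes for its label (+1 or -1) with an assumed
    weight exp(-gamma * squared distance); the sign of the total decides."""
    d2 = _sq_dists(x, train_series)
    # Subtracting the minimum rescales all weights by a common positive
    # factor, which avoids underflow and leaves the sign of the vote unchanged.
    w = np.exp(-gamma * (d2 - d2.min()))
    return 1 if np.sum(w * train_labels) >= 0 else -1

def nearest_neighbor(x, train_series, train_labels):
    """Return the label of the single closest training series."""
    return int(train_labels[np.argmin(_sq_dists(x, train_series))])
```

As `gamma` grows, the closest training series dominates the weighted sum, so the vote reduces to the nearest-neighbor decision. Because `x` may be shorter than the training series, both rules can be applied after observing only a prefix of the time series.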

6 0.71211028 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

7 0.71105313 91 nips-2013-Dirty Statistical Models

8 0.70963353 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

9 0.7093671 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

10 0.70801806 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

11 0.70392865 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

12 0.70215976 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking

13 0.70096785 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

14 0.70035386 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

15 0.69823551 331 nips-2013-Top-Down Regularization of Deep Belief Networks

16 0.69822359 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

17 0.6978488 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

18 0.69659573 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

19 0.6964584 109 nips-2013-Estimating LASSO Risk and Noise Level

20 0.69640827 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit