nips nips2013 nips2013-237 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matjaz Jogan, Alan Stocker
Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure. 1
Reference: text
Optimal integration of visual speed across different spatiotemporal frequency channels

Matjaž Jogan and Alan A. Stocker

Abstract

How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands?
Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects' perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis.
The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure.
1 Introduction

Low contrast stimuli are perceived to move slower than high contrast ones [17]. This effect can be explained with a Bayesian observer model that assumes a prior distribution with a peak at slow speeds [18, 8, 15]. Based on a noisy sensory measurement m of the true stimulus speed s, the Bayesian observer model computes the posterior probability

p(s|m) = p(m|s) p(s) / p(m)    (1)

by multiplying the likelihood function p(m|s) with the probability p(s) representing the observer's prior expectation. If the sensory measurement is unreliable (e.g., if stimulus contrast is low), the likelihood function is broad and the posterior probability distribution is shifted toward the peak of the prior, resulting in a perceived speed that is biased toward slow speeds.
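The low-contrast bias can be illustrated with a small numerical sketch. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian likelihood and an exponential slow-speed prior (the paper characterizes the prior only locally by its slope), and reads off the posterior mode on a grid of candidate speeds.

```python
def perceived_speed(s_true, sigma_like):
    """Posterior mode of p(s|m) ∝ p(m|s) p(s) on a grid of candidate speeds.

    Illustrative assumptions: a Gaussian likelihood centered on the true
    speed, and an exponential slow-speed prior p(s) ∝ exp(-s/2).
    """
    grid = [i * 0.005 for i in range(1, 2001)]  # candidate speeds 0.005..10 deg/s

    def log_posterior(s):
        log_like = -0.5 * ((s - s_true) / sigma_like) ** 2
        log_prior = -0.5 * s
        return log_like + log_prior

    return max(grid, key=log_posterior)

# High contrast -> narrow likelihood -> small bias; low contrast -> broad
# likelihood -> perceived speed pulled further toward slow speeds.
v_high = perceived_speed(2.0, 0.2)
v_low = perceived_speed(2.0, 1.0)
```

With these assumed parameters the posterior mode sits below the true speed of 2 deg/s in both cases, and further below it for the broader likelihood.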
Figure 1: a) A natural stimulus in motion exhibits a rich spatiotemporal frequency spectrum that determines how humans perceive its speed s. b) A stimulus that contains gratings with spatial frequencies of 0.5 and 1.5 c/deg and moves with a speed of 2 deg/s will trigger responses r = {r1, r2} in the two corresponding channels (red circles).
In this paper we make a step toward a more general observer model of visual speed perception that, in the long term, will allow us to predict perceived speed for arbitrarily complex stimuli. The model (Fig. 1) decomposes complex motion stimuli into simpler components processed in separate spatiotemporal frequency channels.
Based on the motion energy model [1, 12], we assume that each channel is sensitive to a narrow spatiotemporal frequency band. The observed speed of a stimulus is then a result of combining the sensory evidence provided by these individual channels with a prior expectation for slow speeds. Here we employ an analogous approach by treating the responses of individual spatiotemporal frequency channels as independent cues about a stimulus' motion. We validated the model against the data of a series of psychophysical experiments in which we measured how humans' speed percept of coherent motion depends on the stimulus energy in different spatial frequency bands. Stimuli consisted of drifting sinusoidal gratings at two different spatial frequencies and contrasts, and various combinations of these single gratings. For a given stimulus speed s, single gratings target only one channel while the combined stimuli target multiple channels.
We consider s to be the speed of locally coherent and translational stimulus motion (Fig. 1a). This motion can be represented by its power spectrum in spatiotemporal frequency space. For a given motion direction the energy lies in a two-dimensional plane spanned by a temporal frequency axis ωt and a spatial frequency axis ωs, and is constrained to coordinates that satisfy s = ωt/ωs (Fig. 1b). According to the motion energy model, we assume that the visual system contains motion units that are tuned to specific locations in this plane [1, 12]. A coherent motion stimulus with speed s and multiple spatial frequencies ωs will therefore drive only those units whose tuning curves are centered at coordinates (ωs, ωs·s).
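This geometry can be sketched in a few lines (the frequency values below are chosen to match the example in Fig. 1 and are otherwise arbitrary): a coherently moving stimulus places all of its energy on the line ωt = ωs·s, so each spatial frequency component lands at one predictable spatiotemporal location.

```python
def channel_coordinates(spatial_freqs, speed):
    """Spatiotemporal coordinates (ws, wt) driven by coherent motion at
    `speed`; all energy satisfies wt = ws * speed, i.e. s = wt / ws."""
    return [(ws, ws * speed) for ws in spatial_freqs]

# a 2 deg/s stimulus containing 0.5 and 1.5 c/deg components
coords = channel_coordinates([0.5, 1.5], 2.0)
print(coords)  # [(0.5, 1.0), (1.5, 3.0)]
```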
We formulate our Bayesian observer model in terms of k spatiotemporal frequency channels, each tuned to a narrow spatiotemporal frequency band (Fig. 2). A moving stimulus will elicit a total response r = [r1, r2, …, rk].
The response of each channel provides a likelihood function.

Figure 2: Bayesian observer model of speed perception with multiple spatiotemporal channels. A moving stimulus with speed s is decomposed and processed in separate channels that are sensitive to energy in specific spatiotemporal frequency bands. Based on the channel response ri we formulate a likelihood function p(ri|s) for each channel. Here we assume the perceived speed ŝ to be the mode of the posterior. We consider a model with and without response normalization across channels (red dashed line).
Assuming independent channel noise, we can formulate the posterior probability of a Bayesian observer model that performs optimal integration as

p(s|r) ∝ p(s) ∏i p(ri|s).
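The product of likelihoods can be sketched on a grid (illustrative assumptions: Gaussian channel likelihoods of different widths and an exponential slow-speed prior; the specific σ values are hypothetical):

```python
def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2

def combined_posterior_mode(s_true, sigma1, sigma2):
    """Mode of p(s|r) ∝ p(s) * p(r1|s) * p(r2|s) on a speed grid."""
    grid = [i * 0.005 for i in range(1, 1401)]  # candidate speeds up to 7 deg/s

    def log_post(s):
        log_prior = -0.5 * s  # assumed exponential slow-speed prior
        return log_prior + log_gauss(s, s_true, sigma1) + log_gauss(s, s_true, sigma2)

    return max(grid, key=log_post)

s_hat = combined_posterior_mode(2.0, 0.4, 0.9)
# the combined estimate is biased slow, but less so than with either cue alone
```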
For reasons of simplicity and without loss of generality, we focus on the case where the stimulus activates two channels with responses r = [ri], i ∈ {1, 2}.
Assuming that E[µ(ri)|s] approximates the stimulus speed s, the expected value of the estimate ŝ is

E[ŝ|s] = (σ2²/(σ1² + σ2²)) E[µ(r1)|s] + (σ1²/(σ1² + σ2²)) E[µ(r2)|s] + a σ1²σ2²/(σ1² + σ2²)
       = (σ2²/(σ1² + σ2²)) s + (σ1²/(σ1² + σ2²)) s + a σ1²σ2²/(σ1² + σ2²)
       = s + a σ1²σ2²/(σ1² + σ2²).
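The expression above can be checked numerically. A minimal sketch of the cue combination rule (the weights and bias term follow the equation; the specific variances and slope a are hypothetical):

```python
def combined_estimate(mu1, mu2, var1, var2, a):
    """Inverse-variance-weighted combination of two channel estimates plus a
    bias a * var1*var2/(var1+var2) contributed by the local prior slope a."""
    w1 = var2 / (var1 + var2)
    w2 = var1 / (var1 + var2)
    combined_var = var1 * var2 / (var1 + var2)
    return w1 * mu1 + w2 * mu2 + a * combined_var, combined_var

s = 2.0
est, var = combined_estimate(s, s, 0.4, 0.9, a=-0.5)
# combined variance is smaller than either single-cue variance, so the
# slow-speed bias (a < 0) is weaker than for either cue alone
```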
First, the variance of the speed estimates (i.e., percepts) for stimuli that activate both channels is always smaller than the variances of estimates that are based on each of the channel responses alone (σ1² and σ2²). Second, because of the slow speed prior, a is negative, and perceived speeds are more biased toward slower speeds the larger the sensory uncertainty. As a result, the perceived speed of combined stimuli that activate both channels is always faster than the percepts based on each of the individual channel responses alone. Finally, the model predicts that the perceived speed of a combined stimulus solely depends on the responses of the channels to its constituent components, and is therefore independent of the relative phase of the components combined [5]. So far we have assumed that the channels do not interact, i.e., their responses are independent of the number of active channels and the overall activity in the system.
Here we assume that the response of an individual channel ri is normalized such that its normalized response ri* is given by

ri* = ri^ni / Σj rj^nj .

Normalization increases the contrast (i.e., the relative difference) between the individual channel responses for increasing values of the exponent n.
Note that normalization affects only the responses ri*, thus modulating the width of the individual likelihood functions.
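A sketch of one common form of divisive normalization consistent with this description (the exact expression in the source is garbled, so both the formula and the example responses should be read as assumptions):

```python
def normalize(responses, n):
    """Divisive normalization r_i* = r_i**n / sum_j(r_j**n); larger exponents
    n amplify the relative difference between channel responses."""
    powered = [r ** n for r in responses]
    total = sum(powered)
    return [p / total for p in powered]

ratios = []
for n in (1, 2, 3):
    r_star = normalize([1.0, 2.0], n)
    ratios.append(r_star[1] / r_star[0])  # relative difference between channels
print(ratios)  # grows with the exponent n
```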
By explicitly modeling the encoding of visual motion in spatiotemporal frequency channels, we have already extended the Bayesian model of speed perception toward a more physiological interpretation.

3 Results

In the second part of this paper we test the validity of our model with and without channel normalization against data from a psychophysical two-alternative forced-choice (2AFC) speed discrimination experiment.

3.1 Speed discrimination experiment

Seven subjects performed a 2AFC visual speed discrimination task. In each trial, subjects were presented for 1250 ms with a reference and a test stimulus on either side of a fixation mark (eccentricity 4 deg).
Figure 3: Single frequency gratings were combined in either a "peaks-add" or a "peaks-subtract" phase configuration (0 deg and 60 deg phase, respectively) [5].

We used these two phase configurations to test whether the channel hypothesis is valid. After stimulus presentation, a brief flash appeared on the left or right side of the fixation mark and subjects had to answer whether the grating that was presented on the indicated side was moving faster or slower than the grating on the other side.
Four of these stimuli were simple sinewave gratings of a single spatial frequency, either ωs = 0.5 or ωs = 1.5 c/deg. The low frequency test stimulus had a contrast of 22.5%, while the three higher frequency stimuli had contrasts of 7.5% and above. The other six stimuli were pair-wise combinations of the single frequency gratings (Fig. 3). All test stimuli were drifting at a speed of 2 deg/s. The reference stimulus was a broadband stimulus whose speed was regulated by an adaptive staircase procedure.
The simple stimuli were designed to target individual spatiotemporal frequency channels while the combined stimuli were meant to target two channels simultaneously. The two phase configurations (peaks-add and peaks-subtract) were used to test the multiple channel hypothesis: if combined stimuli are decomposed and processed in separate channels, their perceived speeds should be independent of the phase configuration. In particular, the difference in overall contrast of the two configurations should not affect perceived speed (Fig. 3). Matching speeds (PSEs) and relative discrimination thresholds (Weber-fractions) were extracted from a maximum-likelihood fit of each of the 10 psychometric functions with a cumulative Gaussian.
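The extraction step can be sketched as follows: a cumulative-Gaussian psychometric function, with the PSE as its 50% point and the Weber fraction as the spread relative to the PSE (the fitted values below are hypothetical):

```python
import math

def psychometric(v, pse, sigma):
    """Cumulative Gaussian: probability of judging a stimulus at speed v
    faster than a matched reference with point of subjective equality pse."""
    return 0.5 * (1.0 + math.erf((v - pse) / (sigma * math.sqrt(2.0))))

pse, sigma = 2.2, 0.3          # hypothetical maximum-likelihood fit
weber_fraction = sigma / pse   # relative discrimination threshold

assert psychometric(pse, pse, sigma) == 0.5  # the PSE is the 50% point
```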
We found no significant difference in perceived speeds and thresholds between the combined grating stimuli in the "peaks-add" and "peaks-subtract" configurations (Fig. 4). This suggests that the perceived speed of combined stimuli is independent of the relative phase between the individual stimulus components, and that combined stimuli are therefore processed in independent channels.
Figure 4: Data and model fits for the speed discrimination task: a) relative discrimination thresholds (Weber-fractions) and b) matching speeds (PSEs).
For the single frequency gratings, the perceived speed increases with contrast as predicted by the standard Bayesian model. For the combined stimuli, there is no significant difference (based on 95% confidence intervals) in perceived speeds between the combined grating stimuli in the "peaks-add" and "peaks-subtract" configurations. The Bayesian model with normalized responses (red line) better accounts for the data than the model without interaction between the channels (blue line).
The model without normalization has six parameters: four channel responses ri (one for each simple stimulus) reflecting the individual likelihood widths, the reference response rref, and the local slope of the prior a. The model with normalization has two additional parameters n1 and n2, reflecting the exponents of the normalization in each of the two channels. The model with and without response normalization was simultaneously fit to the psychometric functions of all 10 test conditions using the cumulative probability distribution (Eq. 8).

Footnote 1: Alternatively, channel responses as a function of contrast could be modeled according to a contrast response function ri = M + Rmax c²/(c² + c50²), where M is the baseline response, Rmax the maximal response, and c50 the semi-saturation contrast level.
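The footnote's contrast response function can be sketched directly (the parameter values below are hypothetical):

```python
def contrast_response(c, M, Rmax, c50):
    """Contrast response r = M + Rmax * c**2 / (c**2 + c50**2): baseline M,
    saturating at M + Rmax, with half of Rmax reached at c = c50."""
    return M + Rmax * c ** 2 / (c ** 2 + c50 ** 2)

M, Rmax, c50 = 0.5, 10.0, 0.225   # hypothetical parameters
# at c = c50 the contrast-dependent part reaches exactly half of Rmax
assert abs(contrast_response(c50, M, Rmax, c50) - (M + Rmax / 2)) < 1e-9
```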
Figure 5: Psychometric functions (gaussian fit, channel model, channel model+norm.).
From these fits we extracted the matching speeds (PSEs) and relative discrimination thresholds (Weber-fractions) shown in Fig. 4. In particular, the data reflect the inverse relationship between relative matching speeds and discrimination thresholds predicted by the slow-speed prior of the model. The model with response normalization, however, better captures subjects' percepts, in particular in conditions where very low contrast stimuli were combined. This is evident in the full psychometric functions (Fig. 5) as well as in the extracted discrimination thresholds and matching speeds (Fig. 4). Further support for the normalized model comes from the fitted parameter values: for the model with no normalization, the response level of the highest contrast stimulus r4 was not well constrained.
The results suggest that the perceived speed of a combined stimulus can be accurately described as an optimal combination of sensory information provided by individual spatiotemporal frequency channels that interact via response normalization.

4 Discussion

We have shown that human visual speed perception can be accurately described by a Bayesian observer model that optimally combines sensory information from independent channels, each sensitive to motion energies in a specific spatiotemporal frequency band. Our model extends the previously proposed Bayesian model of speed perception [16]. It no longer assumes a single likelihood function affected by stimulus contrast but rather considers the combination of likelihood functions based on the motion energies in different spatiotemporal frequency channels.
We tested our model against data from a 2AFC speed discrimination experiment. Stimuli consisted of drifting sinewave gratings at different spatial frequencies and combinations thereof. Subjects' perceived speeds of the combined stimuli were independent of the phase configuration of the constituent sinewave gratings even though different phases resulted in different overall contrast values. This supports the hypothesis that perceived speed is processed across multiple spatiotemporal frequency channels (Graham and Nachmias used a similar approach to demonstrate the existence of individual spatial frequency channels [5]). The proposed observer model provided a good fit to the data, but the fit was improved when the channel responses were assumed to be subject to normalization by the overall channel response. Considering that divisive normalization is arguably a ubiquitous process in neural representations, we see this result as a consequence of our attempt to formulate Bayesian observer models at a level that is closer to a physiological description. Future experiments testing more stimulus combinations will help to further improve the characterization of the channel responses and interactions.
The decrease in estimate variance for combined stimuli is nicely reflected in the measured decrease in discrimination thresholds for the combined stimuli when the thresholds for both individual gratings were approximately the same (Fig. 4). Note that, because of the slow speed prior, a Bayesian model predicts that perceived speeds are inversely proportional to the discrimination thresholds, a prediction that is well supported by our data. Here we provide a behavioral account for both discrimination thresholds and matching speeds by directly estimating the parameters of the likelihoods and the speed prior from psychophysical data. In the long term, the goal is to be able to predict the perceived motion for an arbitrarily complex natural stimulus, and we believe the proposed model is a step in this direction.
References

Contrast and stimulus complexity moderate the relationship between spatial frequency and perceived speed: Implications for MT models of speed perception.
A logarithmic, scale-invariant representation of speed in macaque middle temporal area accounts for speed discrimination performance.
Estimating target speed from the population response in visual area MT.
Perceived speed and direction of complex gratings and plaids.
Noise characteristics and prior expectations in human visual speed perception.
wordName wordTfidf (topN-words)
[('perceived', 0.344), ('channel', 0.339), ('speed', 0.292), ('stimulus', 0.287), ('channels', 0.268), ('gratings', 0.22), ('stimuli', 0.214), ('spatiotemporal', 0.205), ('discrimination', 0.196), ('observer', 0.181), ('speeds', 0.162), ('frequency', 0.153), ('responses', 0.146), ('motion', 0.136), ('psychometric', 0.134), ('normalization', 0.102), ('response', 0.098), ('grating', 0.096), ('ri', 0.095), ('thresholds', 0.083), ('deg', 0.083), ('perception', 0.078), ('drifting', 0.078), ('sensory', 0.073), ('combined', 0.073), ('spatial', 0.069), ('psychophysical', 0.069), ('visual', 0.063), ('bayesian', 0.06), ('percept', 0.06), ('subjects', 0.06), ('integration', 0.057), ('coherent', 0.055), ('pses', 0.051), ('rref', 0.051), ('sinewave', 0.051), ('phase', 0.05), ('divisive', 0.049), ('individual', 0.049), ('frequencies', 0.048), ('simoncelli', 0.047), ('contrast', 0.046), ('stocker', 0.045), ('cues', 0.044), ('energy', 0.043), ('likelihoods', 0.042), ('matching', 0.041), ('sr', 0.038), ('processed', 0.037), ('toward', 0.037), ('likelihood', 0.034), ('physiological', 0.034), ('cue', 0.034), ('reference', 0.033), ('prior', 0.032), ('contrasts', 0.031), ('surround', 0.03), ('percepts', 0.03), ('sinusoidal', 0.03), ('humans', 0.029), ('var', 0.029), ('combinations', 0.029), ('slope', 0.028), ('normalized', 0.028), ('relative', 0.027), ('exponent', 0.027), ('ts', 0.027), ('gurations', 0.026), ('optical', 0.026), ('curves', 0.026), ('perceive', 0.026), ('aic', 0.026), ('extracted', 0.026), ('guration', 0.026), ('hypothesis', 0.025), ('accounts', 0.025), ('slow', 0.025), ('xation', 0.025), ('activate', 0.025), ('tted', 0.024), ('interact', 0.024), ('circles', 0.024), ('graham', 0.024), ('suppression', 0.024), ('integrated', 0.023), ('con', 0.023), ('neurosci', 0.023), ('rmax', 0.023), ('optics', 0.023), ('model', 0.023), ('vision', 0.022), ('test', 0.022), ('modalities', 0.022), ('posterior', 0.022), ('neuroscience', 0.022), ('red', 0.022), ('cumulative', 0.021), 
('formulate', 0.021), ('nicely', 0.021), ('america', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
Author: Matjaz Jogan, Alan Stocker
Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure. 1
2 0.1667778 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
Author: Zhuo Wang, Alan Stocker, Daniel Lee
Abstract: In many neural systems, information about stimulus variables is often represented in a distributed manner by means of a population code. It is generally assumed that the responses of the neural population are tuned to the stimulus statistics, and most prior work has investigated the optimal tuning characteristics of one or a small number of stimulus variables. In this work, we investigate the optimal tuning for diffeomorphic representations of high-dimensional stimuli. We analytically derive the solution that minimizes the L2 reconstruction loss. We compared our solution with other well-known criteria such as maximal mutual information. Our solution suggests that the optimal weights do not necessarily decorrelate the inputs, and the optimal nonlinearity differs from the conventional equalization solution. Results illustrating these optimal representations are shown for some input distributions that may be relevant for understanding the coding of perceptual pathways. 1
3 0.15740241 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow
Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximumlikelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1
Author: Daniel L. Yamins, Ha Hong, Charles Cadieu, James J. DiCarlo
Abstract: Humans recognize visually-presented objects rapidly and accurately. To understand this ability, we seek to construct models of the ventral stream, the series of cortical areas thought to subserve object recognition. One tool to assess the quality of a model of the ventral stream is the Representational Dissimilarity Matrix (RDM), which uses a set of visual stimuli and measures the distances produced in either the brain (i.e. fMRI voxel responses, neural firing rates) or in models (features). Previous work has shown that all known models of the ventral stream fail to capture the RDM pattern observed in either IT cortex, the highest ventral area, or in the human ventral stream. In this work, we construct models of the ventral stream using a novel optimization procedure for category-level object recognition problems, and produce RDMs resembling both macaque IT and human ventral stream. The model, while novel in the optimization procedure, further develops a long-standing functional hypothesis that the ventral visual stream is a hierarchically arranged series of processing stages optimized for visual object recognition. 1
5 0.10969258 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
Author: Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski
Abstract: With the advent of modern stimulation techniques in neuroscience, the opportunity arises to map neuron to neuron connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Due to the technical challenges and sparsity of these systems, it is important to focus experimental time stimulating the neurons whose synaptic strength is most ambiguous, therefore we also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial. 1
6 0.10425606 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
7 0.10017709 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
8 0.097602457 88 nips-2013-Designed Measurements for Vector Count Data
9 0.09567669 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively
10 0.091943078 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?
11 0.083982691 183 nips-2013-Mapping paradigm ontologies to and from the brain
12 0.081288703 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
13 0.080431305 205 nips-2013-Multisensory Encoding, Decoding, and Identification
14 0.077792466 69 nips-2013-Context-sensitive active sensing in humans
15 0.066530898 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
16 0.065679021 173 nips-2013-Least Informative Dimensions
17 0.056560438 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
18 0.052338168 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
19 0.051767021 54 nips-2013-Bayesian optimization explains human active search
20 0.048622191 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
topicId topicWeight
[(0, 0.117), (1, 0.051), (2, -0.102), (3, -0.054), (4, -0.147), (5, -0.028), (6, -0.003), (7, -0.043), (8, -0.009), (9, 0.097), (10, -0.051), (11, 0.023), (12, -0.066), (13, -0.04), (14, -0.046), (15, 0.016), (16, -0.042), (17, -0.083), (18, -0.093), (19, -0.029), (20, -0.021), (21, 0.032), (22, -0.083), (23, -0.102), (24, -0.067), (25, -0.068), (26, -0.058), (27, 0.12), (28, -0.012), (29, -0.003), (30, -0.009), (31, -0.055), (32, -0.138), (33, -0.109), (34, 0.006), (35, -0.105), (36, -0.019), (37, -0.049), (38, -0.108), (39, -0.009), (40, -0.013), (41, -0.0), (42, -0.12), (43, 0.012), (44, 0.019), (45, 0.001), (46, 0.023), (47, 0.053), (48, -0.083), (49, -0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.96817964 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
Author: Matjaz Jogan, Alan Stocker
Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure. 1
2 0.74775892 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow
Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set of linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximum-likelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1
3 0.7393254 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
Author: Zhuo Wang, Alan Stocker, Daniel Lee
Abstract: In many neural systems, information about stimulus variables is often represented in a distributed manner by means of a population code. It is generally assumed that the responses of the neural population are tuned to the stimulus statistics, and most prior work has investigated the optimal tuning characteristics of one or a small number of stimulus variables. In this work, we investigate the optimal tuning for diffeomorphic representations of high-dimensional stimuli. We analytically derive the solution that minimizes the L2 reconstruction loss. We compared our solution with other well-known criteria such as maximal mutual information. Our solution suggests that the optimal weights do not necessarily decorrelate the inputs, and the optimal nonlinearity differs from the conventional equalization solution. Results illustrating these optimal representations are shown for some input distributions that may be relevant for understanding the coding of perceptual pathways. 1
4 0.63416988 205 nips-2013-Multisensory Encoding, Decoding, and Identification
Author: Aurel A. Lazar, Yevgeniy Slutskiy
Abstract: We investigate a spiking neuron model of multisensory integration. Multiple stimuli from different sensory modalities are encoded by a single neural circuit comprised of a multisensory bank of receptive fields in cascade with a population of biophysical spike generators. We demonstrate that stimuli of different dimensions can be faithfully multiplexed and encoded in the spike domain and derive tractable algorithms for decoding each stimulus from the common pool of spikes. We also show that the identification of multisensory processing in a single neuron is dual to the recovery of stimuli encoded with a population of multisensory neurons, and prove that only a projection of the circuit onto input stimuli can be identified. We provide an example of multisensory integration using natural audio and video and discuss the performance of the proposed decoding and identification algorithms. 1
5 0.59102631 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?
Author: Cheston Tan, Jedediah M. Singer, Thomas Serre, David Sheinberg, Tomaso Poggio
Abstract: The macaque Superior Temporal Sulcus (STS) is a brain area that receives and integrates inputs from both the ventral and dorsal visual processing streams (thought to specialize in form and motion processing respectively). For the processing of articulated actions, prior work has shown that even a small population of STS neurons contains sufficient information for the decoding of actor invariant to action, action invariant to actor, as well as the specific conjunction of actor and action. This paper addresses two questions. First, what are the invariance properties of individual neural representations (rather than the population representation) in STS? Second, what are the neural encoding mechanisms that can produce such individual neural representations from streams of pixel images? We find that a simple model, one that simply computes a linear weighted sum of ventral and dorsal responses to short action “snippets”, produces surprisingly good fits to the neural data. Interestingly, even using inputs from a single stream, both actor-invariance and action-invariance can be accounted for, by having different linear weights. 1
7 0.57086754 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively
8 0.51630312 183 nips-2013-Mapping paradigm ontologies to and from the brain
9 0.48459333 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
10 0.46678862 69 nips-2013-Context-sensitive active sensing in humans
11 0.42783779 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
12 0.42412698 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
13 0.41934851 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
14 0.4134976 121 nips-2013-Firing rate predictions in optimal balanced networks
15 0.37886074 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
16 0.37878802 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
17 0.37105101 284 nips-2013-Robust Spatial Filtering with Beta Divergence
18 0.35357371 88 nips-2013-Designed Measurements for Vector Count Data
19 0.34722134 173 nips-2013-Least Informative Dimensions
20 0.32959419 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
topicId topicWeight
[(16, 0.035), (33, 0.171), (34, 0.095), (41, 0.015), (48, 0.224), (49, 0.066), (56, 0.09), (70, 0.048), (85, 0.016), (89, 0.108), (93, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.85132748 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
Author: Matjaz Jogan, Alan Stocker
Abstract: How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects’ perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure. 1
2 0.7277211 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy. 1
3 0.72562456 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow
Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set of linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximum-likelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1
Author: Ryan D. Turner, Steven Bottone, Clay J. Stanek
Abstract: The Bayesian online change point detection (BOCPD) algorithm provides an efficient way to do exact inference when the parameters of an underlying model may suddenly change over time. BOCPD requires computation of the underlying model’s posterior predictives, which can only be computed online in O(1) time and memory for exponential family models. We develop variational approximations to the posterior on change point times (formulated as run lengths) for efficient inference when the underlying model is not in the exponential family, and does not have tractable posterior predictive distributions. In doing so, we develop improvements to online variational inference. We apply our methodology to a tracking problem using radar data with a signal-to-noise feature that is Rice distributed. We also develop a variational method for inferring the parameters of the (non-exponential family) Rice distribution. Change point detection has been applied to many applications [5; 7]. In recent years there have been great improvements to the Bayesian approaches via the Bayesian online change point detection algorithm (BOCPD) [1; 23; 27]. Likewise, the radar tracking community has been improving in its use of feature-aided tracking [10]: methods that use auxiliary information from radar returns such as signal-to-noise ratio (SNR), which depend on radar cross sections (RCS) [21]. Older systems would often filter only noisy position (and perhaps Doppler) measurements while newer systems use more information to improve performance. We use BOCPD for modeling the RCS feature. Whereas BOCPD inference could be done exactly when finding change points in conjugate exponential family models the physics of RCS measurements often causes them to be distributed in non-exponential family ways, often following a Rice distribution. To do inference efficiently we call upon variational Bayes (VB) to find approximate posterior (predictive) distributions. 
Furthermore, the nature of both BOCPD and tracking requires the use of online updating. We improve upon the existing and limited approaches to online VB [24; 13]. This paper produces contributions to, and builds upon background from, three independent areas: change point detection, variational Bayes, and radar tracking. Although the emphasis in machine learning is on filtering, a substantial part of tracking with radar data involves data association, illustrated in Figure 1. Observations of radar returns contain measurements from multiple objects (targets) in the sky. If we knew which radar return corresponded to which target we would be presented with $N_T \in \mathbb{N}_0$ independent filtering problems; Kalman filters [14] (or their nonlinear extensions) are applied to “average out” the kinematic errors in the measurements (typically positions) using the measurements associated with each target. The data association problem is to determine which measurement goes to which track. In the classical setup, once a particular measurement is associated with a certain target, that measurement is plugged into the filter for that target as if we knew with certainty it was the correct assignment. The association algorithms, in effect, find the maximum a posteriori (MAP) estimate on the measurement-to-track association. However, approaches such as the joint probabilistic data association (JPDA) filter [2] and the probability hypothesis density (PHD) filter [16] have deviated from this. To find the MAP estimate a log likelihood of the data under each possible assignment vector $a$ must be computed. These are then used to construct cost matrices that reduce the assignment problem to a particular kind of optimization problem (the details of which are beyond the scope of this paper). The motivation behind feature-aided tracking is that additional features increase the probability that the MAP measurement-to-track assignment is correct. 
Based on physical arguments the RCS feature (SNR) is often Rice distributed [21, Ch. 3]; although, in certain situations RCS is exponential or gamma distributed [26]. The parameters of the RCS distribution are determined by factors such as the shape of the aircraft facing the radar sensor. Given that different aircraft have different RCS characteristics, if one attempts to create a continuous track estimating the path of an aircraft, RCS features may help distinguish one aircraft from another if they cross paths or come near one another, for example. RCS also helps distinguish genuine aircraft returns from clutter: a flock of birds or random electrical noise, for example. However, the parameters of the RCS distributions may also change for the same aircraft due to a change in angle or ground conditions. These must be taken into account for accurate association. Providing good predictions in light of a possible sudden change in the parameters of a time series is “right up the alley” of BOCPD and change point methods. The original BOCPD papers [1; 11] studied sudden changes in the parameters of exponential family models for time series. In this paper, we expand the set of applications of BOCPD to radar SNR data which often has the same change point structure found in other applications, and requires online predictions. The BOCPD model is highly modular in that it looks for changes in the parameters of any underlying process model (UPM). The UPM merely needs to provide posterior predictive probabilities, the UPM can otherwise be a “black box.” The BOCPD queries the UPM for a prediction of the next data point under each possible run length, the number of points since the last change point. If (and only if by Hipp [12]) the UPM is exponential family (with a conjugate prior) the posterior is computed by accumulating the sufficient statistics since the last potential change point. This allows for O(1) UPM updates in both computation and memory as the run length increases. 
We motivate the use of VB for implementing UPMs when the data within a regime is believed to follow a distribution that is not exponential family. The methods presented in this paper can be used to find variational run length posteriors for general non-exponential family UPMs in addition to the Rice distribution. Additionally, the methods for improving online updating in VB (Section 2.2) are applicable in areas outside of change point detection. [Figure 1 panel legend: likelihood vs. SNR (0-20) for clutter (birds), track 1 (747), and track 2 (EMB 110).] Figure 1: Illustrative example of a tracking scenario: The black lines (−) show the true tracks while the red stars (∗) show the state estimates over time for track 2 and the blue stars for track 1. The 95% credible regions on the states are shown as blue ellipses. The current (+) and previous (×) measurements are connected to their associated tracks via red lines. The clutter measurements (birds in this case) are shown with black dots (·). The distributions on the SNR (RCS) for each track (blue and red) and the clutter (black) are shown on the right. To our knowledge this paper is the first to demonstrate how to compute Bayesian posterior distributions on the parameters of a Rice distribution; the closest work would be Lauwers et al. [15], which computes a MAP estimate. Other novel factors of this paper include: demonstrating the usefulness (and advantages over existing techniques) of change point detection for RCS estimation and tracking; and applying variational inference for UPMs where analytic posterior predictives are not possible. This paper provides four main technical contributions: 1) VB inference for inferring the parameters of a Rice distribution. 2) General improvements to online VB (which is then applied to updating the UPM in BOCPD). 3) Derive a VB approximation to the run length posterior when the UPM posterior predictive is intractable. 4) Handle censored measurements (particularly for a Rice distribution) in VB. 
This is key for processing missed detections in data association. 1 Background In this section we briefly review the three areas of background: BOCPD, VB, and tracking. 1.1 Bayesian Online Change Point Detection We briefly summarize the model setup and notation for the BOCPD algorithm; see [27, Ch. 5] for a detailed description. We assume we have a time series with $n$ observations so far, $y_1, \ldots, y_n \in \mathcal{Y}$. In effect, BOCPD performs message passing to do online inference on the run length $r_n \in \{0, \ldots, n-1\}$, the number of observations since the last change point. Given an underlying predictive model (UPM) and a hazard function $h$, we can compute an exact posterior over the run length $r_n$. Conditional on a run length, the UPM produces a sequential prediction on the next data point using all the data since the last change point: $p(y_n|y^{(r)}, \Theta_m)$, where $(r) := (n-r){:}(n-1)$. The UPM is a simpler model where the parameters $\theta$ change at every change point and are modeled as being sampled from a prior with hyper-parameters $\Theta_m$. The canonical example of a UPM would be a Gaussian whose mean and variance change at every change point. The online updates are summarized as: $$\mathrm{msg}_n := p(r_n, y_{1:n}) = \sum_{r_{n-1}} \underbrace{P(r_n \mid r_{n-1})}_{\text{hazard}}\; \underbrace{p(y_n \mid r_{n-1}, y^{(r)})}_{\text{UPM}}\; \mathrm{msg}_{n-1}\,. \quad (1)$$ Unless $r_n = 0$, the sum in (1) only contains one term since the only possibility is that $r_{n-1} = r_n - 1$. The indexing convention is such that if $r_n = 0$ then $y_{n+1}$ is the first observation sampled from the new parameters $\theta$. The marginal posterior predictive on the next data point is easily calculated as: $$p(y_{n+1}|y_{1:n}) = \sum_{r_n} p(y_{n+1}|y^{(r)})\, P(r_n|y_{1:n})\,. \quad (2)$$ Thus, the predictions from BOCPD fully integrate out any uncertainty in $\theta$. The message updates (1) perform exact inference under a model where the number of change points is not known a priori. BOCPD RCS Model We show the Rice UPM as an example as it is required for our application. 
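The run-length recursion (1)-(2) can be sketched in a few lines. The sketch below uses a conjugate Gaussian UPM with known observation variance and a constant hazard as a stand-in for the paper's Rice UPM; the prior, hazard value, and data are illustrative choices, not the paper's.

```python
# Minimal BOCPD run-length recursion (equations (1)-(2)), sketched with a
# Gaussian UPM of known variance 1 and a N(0, 1) prior on the mean; the
# hazard is constant. All numbers here are illustrative assumptions.
import math

def norm_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def bocpd(data, hazard=0.1):
    msg = [1.0]                      # p(r_0 = 0) = 1
    mu, tau = [0.0], [1.0]           # posterior mean / precision per run length
    for y in data:
        # UPM posterior predictive for each run length (conjugate Gaussian).
        pred = [norm_pdf(y, m, 1.0 + 1.0 / t) for m, t in zip(mu, tau)]
        growth = [(1 - hazard) * p * m for p, m in zip(pred, msg)]
        cp = hazard * sum(p * m for p, m in zip(pred, msg))
        msg = [cp] + growth
        z = sum(msg)
        msg = [m / z for m in msg]   # normalize to get p(r_n | y_1:n)
        # O(1) conjugate updates of the per-run-length posteriors.
        mu = [0.0] + [(t * m + y) / (t + 1) for m, t in zip(mu, tau)]
        tau = [1.0] + [t + 1 for t in tau]
    return msg

post = bocpd([0.1, -0.2, 0.0, 5.1, 4.9, 5.0])
# After a jump in the mean, short run lengths dominate the posterior.
assert max(range(len(post)), key=post.__getitem__) < 4
```

The exponential-family conjugacy is what makes each update O(1) per run length; the paper's point is precisely that this fails for a Rice UPM, motivating the VB machinery below.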
The data within a regime are assumed to be iid Rice observations, with a normal-gamma prior: $$y_n \sim \mathrm{Rice}(\nu, \sigma)\,, \quad \nu \sim \mathcal{N}(\mu_0, \sigma^2/\lambda_0)\,, \quad \sigma^{-2} =: \tau \sim \mathrm{Gamma}(\alpha_0, \beta_0) \quad (3)$$ $$\Longrightarrow\; p(y_n|\nu, \sigma) = y_n \tau \exp(-\tau(y_n^2 + \nu^2)/2)\, I_0(y_n \nu \tau)\, \mathbb{I}\{y_n \geq 0\}\,, \quad (4)$$ where $I_0(\cdot)$ is a modified Bessel function of order zero, which is what excludes the Rice distribution from the exponential family. Although the normal-gamma is not conjugate to a Rice it will enable us to use the VB-EM algorithm. The UPM parameters are the Rice shape¹ $\nu \in \mathbb{R}$ and scale $\sigma \in \mathbb{R}^+$, $\theta := \{\nu, \sigma\}$, and the hyper-parameters are the normal-gamma parameters $\Theta_m := \{\mu_0, \lambda_0, \alpha_0, \beta_0\}$. Every change point results in a new value for ν and σ being sampled. A posterior on θ is maintained for each run length, i.e. every possible starting point for the current regime, and is updated at each new data point. Therefore, BOCPD maintains n distinct posteriors on θ, and although this can be reduced with pruning, it necessitates posterior updates on θ that are computationally efficient. Note that the run length updates in (1) require the UPM to provide predictive log likelihoods at all sample sizes $r_n$ (including zero). Therefore, UPM implementations using such approximations as plug-in MLE predictions will not work very well. The MLE may not even be defined for run lengths smaller than the number of UPM parameters |θ|. For a Rice UPM, the efficient O(1) updating in exponential family models by using a conjugate prior and accumulating sufficient statistics is not possible. This motivates the use of VB methods for approximating the UPM predictions. 1.2 Variational Bayes We follow the framework of VB: when computation of the exact posterior distribution $p(\theta|y_{1:n})$ is intractable, it is often possible to create a variational approximation $q(\theta)$ that is locally optimal in terms of the Kullback-Leibler (KL) divergence $\mathrm{KL}(q\,\|\,p)$ while constraining $q$ to be in a certain family of distributions $\mathcal{Q}$. 
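A quick numerical sanity check of the Rice density in (4): with a stdlib-only truncated series for $I_0$, the density should integrate to one. The grid and truncation settings below are arbitrary sketch choices, not library code.

```python
# Numerical check of the Rice density in (4), using a truncated series for
# the modified Bessel function I0 (stdlib only; a sketch, not library code).
import math

def bessel_i0(x, terms=60):
    return sum((x * x / 4.0) ** k / math.factorial(k) ** 2 for k in range(terms))

def rice_pdf(y, nu, sigma):
    tau = 1.0 / sigma**2
    if y < 0:
        return 0.0           # the indicator I{y >= 0} in (4)
    return y * tau * math.exp(-tau * (y * y + nu * nu) / 2.0) * bessel_i0(y * nu * tau)

# The density should integrate to ~1 over a generous grid; since the
# integrand vanishes at both ends, this Riemann sum is trapezoid-accurate.
nu, sigma, h = 2.0, 1.0, 0.01
mass = sum(rice_pdf(i * h, nu, sigma) for i in range(1500)) * h
assert abs(mass - 1.0) < 5e-3
```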
In general this is done by optimizing a lower bound $L(q)$ on the evidence $\log p(y_{1:n})$, using either gradient-based methods or standard fixed-point equations. ¹The shape ν is usually assumed to be positive ($\nu \in \mathbb{R}^+$); however, there is nothing wrong with using a negative ν, as $\mathrm{Rice}(x|\nu, \sigma) = \mathrm{Rice}(x|{-\nu}, \sigma)$. It also allows for use of a normal-gamma prior. The VB-EM Algorithm In many cases, such as the Rice UPM, the derivation of the VB fixed-point equations can be simplified by applying the VB-EM algorithm [3]. VB-EM is applicable to models that are conjugate-exponential (CE) after being augmented with latent variables $x_{1:n}$. A model is CE if: 1) the complete data likelihood $p(x_{1:n}, y_{1:n}|\theta)$ is an exponential family distribution; and 2) the prior $p(\theta)$ is a conjugate prior for the complete data likelihood $p(x_{1:n}, y_{1:n}|\theta)$. We only have to constrain the posterior $q(\theta, x_{1:n}) = q(\theta)q(x_{1:n})$ to factorize between the latent variables and the parameters; we do not constrain the posterior to be of any particular parametric form. Requiring the complete likelihood to be CE is a much weaker condition than requiring the marginal on the observed data $p(y_{1:n}|\theta)$ to be CE. Consider a mixture of Gaussians: the model becomes CE when augmented with latent variables (class labels). This is also the case for the Rice distribution (Section 2.1). Like the ordinary EM algorithm [9] the VB-EM algorithm alternates between two steps: 1) Find the posterior of the latent variables treating the expected natural parameters $\bar\eta := \mathrm{E}_{q(\theta)}[\eta]$ as correct: $q(x_i) \leftarrow p(x_i|y_i, \eta = \bar\eta)$. 2) Find the posterior of the parameters using the expected sufficient statistics $\bar S := \mathrm{E}_{q(x_{1:n})}[S(x_{1:n}, y_{1:n})]$ as if they were the sufficient statistics for the complete data set: $q(\theta) \leftarrow p(\theta|S(x_{1:n}, y_{1:n}) = \bar S)$. The posterior will be of the same exponential family as the prior. 1.3 Tracking In this section we review data association, which along with filtering constitutes tracking. 
In data association we estimate the association vectors $a$, which map measurements to tracks. At each time step $n \in \mathbb{N}_1$ we observe $N_Z(n) \in \mathbb{N}_0$ measurements, $Z_n = \{z_{i,n}\}_{i=1}^{N_Z(n)}$, which include returns from both real targets and clutter (spurious measurements). Here, $z_{i,n} \in \mathcal{Z}$ is a vector of kinematic measurements (positions in $\mathbb{R}^3$, or $\mathbb{R}^4$ with a Doppler), augmented with an RCS component $R \in \mathbb{R}^+$ for the measured SNR, at time $t_n \in \mathbb{R}$. The assignment vector at time $t_n$ is such that $a_n(i) = j$ if measurement $i$ is associated with track $j > 0$; $a_n(i) = 0$ if measurement $i$ is clutter. The inverse mapping $a_n^{-1}$ maps tracks to measurements: $a_n^{-1}(a_n(i)) = i$ if $a_n(i) \neq 0$; and $a_n^{-1}(i) = 0 \Leftrightarrow a_n(j) \neq i$ for all $j$. For example, if $N_T = 4$ and $a = [2\;0\;0\;1\;4]$ then $N_Z = 5$, $N_c = 2$, and $a^{-1} = [4\;1\;0\;5]$. Each track is associated with at most one measurement, and vice-versa. In $N$D data association we jointly find the MAP estimate of the association vectors over a sliding window of the last $N-1$ time steps. We assume we have $N_T(n) \in \mathbb{N}_0$ total tracks as a known parameter: $N_T(n)$ is adjusted over time using various algorithms (see [2, Ch. 3]). In the generative process each track places a probability distribution on the next $N-1$ measurements, with both kinematic and RCS components. However, if the random RCS $R$ for a measurement is below $R_0$ then it will not be observed. There are $N_c(n) \in \mathbb{N}_0$ clutter measurements from a Poisson process with $\lambda := \mathrm{E}[N_c(n)]$ (often with uniform intensity). The ordering of measurements in $Z_n$ is assumed to be uniformly random. For 3D data association the model joint $p(Z_{n-1:n}, a_{n-1}, a_n|Z_{1:n-2})$ is: $$\prod_{i=1}^{N_T} p_i(z_{a_n^{-1}(i),n},\, z_{a_{n-1}^{-1}(i),n-1}) \times \prod_{i=n-1}^{n} \frac{\lambda^{N_c(i)} \exp(-\lambda)}{|Z_i|!} \prod_{j=1}^{|Z_i|} p_0(z_{j,i})^{\mathbb{I}\{a_i(j)=0\}}\,, \quad (5)$$ where $p_i$ is the probability of the measurement sequence under track $i$; $p_0$ is the clutter distribution. 
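The MAP search over assignment vectors can be illustrated with a brute-force 2D toy: enumerate every valid measurement-to-track mapping and keep the highest-scoring one. The log-likelihoods, clutter term, and missed-detection penalty below are illustrative stand-ins (in the paper the missed-detection term is $P(R < R_0)$ under the track's RCS model), and real systems use MDA solvers rather than enumeration.

```python
# A toy 2D data-association step in the spirit of (5)-(6): enumerate all
# assignment vectors a (measurement -> track or clutter) and pick the MAP
# one. The scores and penalty terms here are illustrative stand-ins.
import itertools, math

def map_assignment(log_lik, log_clutter, log_missed):
    """log_lik[i][j]: log-likelihood of measurement i under track j+1.
    Each track takes at most one measurement; unassigned measurements
    are clutter; unassigned tracks incur a missed-detection penalty."""
    n_meas, n_tracks = len(log_lik), len(log_lik[0])
    best, best_a = -math.inf, None
    # a[i] in {0 (clutter), 1..n_tracks}; enforce one measurement per track.
    for a in itertools.product(range(n_tracks + 1), repeat=n_meas):
        used = [j for j in a if j > 0]
        if len(used) != len(set(used)):
            continue
        score = sum(log_lik[i][j - 1] if j > 0 else log_clutter
                    for i, j in enumerate(a))
        score += log_missed * (n_tracks - len(used))
        if score > best:
            best, best_a = score, a
    return best_a

# Two measurements, two tracks: measurement 0 clearly fits track 2 and
# measurement 1 fits track 1, so the MAP assignment swaps them.
a = map_assignment([[-9.0, -0.5], [-0.4, -8.0]],
                   log_clutter=-5.0, log_missed=-3.0)
assert a == (2, 1)
```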
The probability $p_i$ is the product of the RCS component predictions (BOCPD) and the kinematic components (filter); informally, $p_i(z) = p_i(\text{positions}) \times p_i(\text{RCS})$. If there is a missed detection, i.e. $a_n^{-1}(i) = 0$, we then use $p_i(z_{a_n^{-1}(i),n}) = P(R < R_0)$ under the RCS model for track $i$ with no contribution from the positional (kinematic) component. Just as BOCPD allows any black box probabilistic predictor to be used as a UPM, any black box model of measurement sequences can be used in (5). The estimation of association vectors for the 3D case becomes an optimization problem of the form: $$(\hat{a}_{n-1}, \hat{a}_n) = \operatorname*{argmax}_{(a_{n-1}, a_n)} \log P(a_{n-1}, a_n|Z_{1:n}) = \operatorname*{argmax}_{(a_{n-1}, a_n)} \log p(Z_{n-1:n}, a_{n-1}, a_n|Z_{1:n-2})\,, \quad (6)$$ which is effectively optimizing (5) with respect to the assignment vectors. The optimization given in (6) can be cast as a multidimensional assignment (MDA) problem [2], which can be solved efficiently in the 2D case. Higher dimensional assignment problems, however, are NP-hard; approximate, yet typically very accurate, solvers must be used for real-time operation, which is usually required for tracking systems [20]. If a radar scan occurs at each time step and a target is not detected, we assume the SNR has not exceeded the threshold, implying $0 \leq R < R_0$. This is a (left) censored measurement and is treated differently than a missing data point. Censoring is accounted for in Section 2.3. 2 Online Variational UPMs We cover the four technical challenges for implementing non-exponential family UPMs in an efficient and online manner. We drop the index of the data point $i$ when it is clear from context. 2.1 Variational Posterior for a Rice Distribution The Rice distribution has the property that $$x \sim \mathcal{N}(\nu, \sigma^2)\,,\; y \sim \mathcal{N}(0, \sigma^2) \;\Longrightarrow\; R = \sqrt{x^2 + y^2} \sim \mathrm{Rice}(\nu, \sigma)\,. \quad (7)$$ For simplicity we perform inference using $R^2$, as opposed to $R$, and transform accordingly: $$x \sim \mathcal{N}(\nu, \sigma^2)\,, \quad R^2 - x^2 \sim \mathrm{Gamma}(\tfrac{1}{2}, \tfrac{\tau}{2})\,, \quad \tau := 1/\sigma^2 \in \mathbb{R}^+$$ $$\Longrightarrow\; p(R^2, x) = p(R^2|x)\,p(x) = \mathrm{Gamma}(R^2 - x^2 \mid \tfrac{1}{2}, \tfrac{\tau}{2})\,\mathcal{N}(x|\nu, \sigma^2)\,. \quad (8)$$ The complete likelihood (8) is the product of two exponential family models and is exponential family itself, parameterized with base measure $h$ and partition factor $g$: $$\eta = [\nu\tau,\, -\tau/2]\,, \quad S = [x,\, R^2]\,, \quad h(R^2, x) = \left(2\pi\sqrt{R^2 - x^2}\right)^{-1}\,, \quad g(\nu, \tau) = \tau\exp(-\nu^2\tau/2)\,.$$ By inspection we see that the natural parameters η and sufficient statistics S are the same as a Gaussian with unknown mean and variance. Therefore, we apply the normal-gamma prior on (ν, τ) as it is the conjugate prior for the complete data likelihood. This allows us to apply the VB-EM algorithm. We use $y_i := R_i^2$ as the VB observation, not $R_i$ as in (3). In (5), $z_{\cdot,\cdot}(\mathrm{end})$ is the RCS $R$. VB M-Step We derive the posterior updates to the parameters given expected sufficient statistics: $$\bar{x} := \sum_{i=1}^n \mathrm{E}[x_i]/n\,, \quad \mu_n = \frac{\lambda_0\mu_0 + \sum_{i=1}^n \mathrm{E}[x_i]}{\lambda_0 + n}\,, \quad \lambda_n = \lambda_0 + n\,, \quad \alpha_n = \alpha_0 + n\,, \quad (9)$$ $$\beta_n = \beta_0 + \frac{1}{2}\sum_{i=1}^n \left(\mathrm{E}[x_i] - \bar{x}\right)^2 + \frac{1}{2}\frac{n\lambda_0}{\lambda_0 + n}\left(\bar{x} - \mu_0\right)^2 + \frac{1}{2}\sum_{i=1}^n \left(R_i^2 - \mathrm{E}[x_i]^2\right)\,. \quad (10)$$ This is the same as an observation from a Gaussian and a gamma that share an (inverse) scale τ. VB E-Step We then must find both expected sufficient statistics $\bar S$. The expectation $\mathrm{E}[R_i^2|R_i] = R_i^2$ trivially, leaving $\mathrm{E}[x_i|R_i]$. Recall that the joint on $(x, y)$ is a bivariate normal; if we constrain the radius to $R$, the angle ω will be distributed by a von Mises (VM) distribution. Therefore, $$\omega := \arccos(x/R) \sim \mathrm{VM}(0, \kappa)\,, \quad \kappa = R\,\mathrm{E}[\nu\tau] \;\Longrightarrow\; \mathrm{E}[x] = R\,\mathrm{E}[\cos\omega] = R\,I_1(\kappa)/I_0(\kappa)\,, \quad (11)$$ where computing κ constitutes the VB E-step and we have used the trigonometric moment on ω [18]. This completes the computations required to do the VB updates on the Rice posterior. Variational Lower Bound For completeness, and to assess convergence, we derive the VB lower bound $L(q)$.
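The E-step (11) and M-step (9)-(10) can be sketched as a batch fixed-point iteration (the paper's online variant in Section 2.2 is more involved). The hyper-parameters and data below are illustrative; note the prior mean is set away from zero, since $\mu_0 = 0$ would pin the iteration at the degenerate $\nu = 0$ fixed point.

```python
# Sketch of the VB-EM updates for the Rice posterior: the E-step (11) sets
# E[x_i] = R_i * I1(kappa)/I0(kappa) with kappa = R_i * E[nu*tau], and the
# M-step (9)-(10) is a normal-gamma update on the expected statistics.
# Hyper-parameters and data are illustrative assumptions.
import math

def bessel_ratio(x, terms=60):
    """I1(x)/I0(x) via truncated series (adequate for moderate x)."""
    i0 = sum((x * x / 4.0) ** k / math.factorial(k) ** 2 for k in range(terms))
    i1 = sum((x / 2.0) * (x * x / 4.0) ** k
             / (math.factorial(k) * math.factorial(k + 1)) for k in range(terms))
    return i1 / i0

def vb_em_rice(R, mu0=1.0, lam0=1.0, a0=2.0, b0=2.0, iters=50):
    n = len(R)
    mu, lam, a, b = mu0, lam0, a0, b0
    for _ in range(iters):
        e_nu_tau = mu * a / b                                # E[nu*tau] under q
        ex = [r * bessel_ratio(r * e_nu_tau) for r in R]     # E-step (11)
        xbar = sum(ex) / n                                   # M-step (9)-(10)
        mu = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam = lam0 + n
        a = a0 + n
        b = (b0 + 0.5 * sum((e - xbar) ** 2 for e in ex)
             + 0.5 * n * lam0 / (lam0 + n) * (xbar - mu0) ** 2
             + 0.5 * sum(r * r - e * e for r, e in zip(R, ex)))
    return mu, lam, a, b

# Readings loosely consistent with Rice(nu ~ 3): the posterior mean of nu
# should land near 3, with the precision estimate a/b of moderate size.
mu, lam, a, b = vb_em_rice([3.2, 2.7, 3.5, 2.9, 3.1, 3.4, 2.8, 3.0])
assert 2.3 < mu < 3.5
assert 0.3 < a / b < 4.0
```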
Using the standard formula [4] for L(q) = Eq [log p(y1:n , x1:n , θ)] + H[q] we get: n 2 1 E[log τ /2] − 1 E[τ ]Ri + (E[ντ ] − κi /Ri )E[xi ] − 2 E[ν 2 τ ] + log I0 (κi ) − KL(q p) , 2 (12) i=1 where p in the KL is the prior on (ν, τ ) which is easy to compute as q and p are both normal-gamma. Equivalently, (12) can be optimized directly instead of using the VB-EM updates. 2.2 Online Variational Inference In Section 2.1 we derived an efficient way to compute the variational posterior for a Rice distribution for a fixed data set. However, as is apparent from (1) we need online predictions from the UPM; we must be able to update the posterior one data point at a time. When the UPM is exponential family and we can compute the posterior exactly, we merely use the posterior from the previous step as the prior. However, since we are only computing a variational approximation to the posterior, using the previous posterior as the prior does not give the exact same answer as re-computing the posterior from batch. This gives two obvious options: 1) recompute the posterior from batch every update at O(n) cost or 2) use the previous posterior as the prior at O(1) cost and reduced accuracy. 5 The difference between the options is encapsulated by looking at the expected sufficient statistics: n ¯ S = i=1 Eq(xi |y1:n ) [S(xi , yi )]. Naive online updating uses old expected sufficient statistics whose n ¯ posterior effectively uses S = i=1 Eq(xi |y1:i ) [S(xi , yi )]. We get the best of both worlds if we adjust those estimates over time. We in fact can do this if we project the expected sufficient statistics into a “feature space” in terms of the expected natural parameters. For some function f , q(xi ) = p(xi |yi , η = η ) =⇒ Eq(xi |y1:n ) [S(xi , yi )] = f (yi , η ) . ¯ ¯ If f is piecewise continuous then we can represent it with an inner product [8, Sec. 
2.1.6]:

f(yᵢ, η̄) = φ(η̄)ᵀψ(yᵢ)  ⟹  S̄ = Σᵢ₌₁ⁿ φ(η̄)ᵀψ(yᵢ) = φ(η̄)ᵀ Σᵢ₌₁ⁿ ψ(yᵢ) ,  (14)

where an infinite-dimensional φ and ψ may be required for exact representation, but can be approximated by a finite inner product. In the Rice distribution case we use (11):

f(yᵢ, η̄) = E[xᵢ] = Rᵢ Ĩ(Rᵢ E[ντ]) = Rᵢ Ĩ((Rᵢ/μ₀) · μ₀ E[ντ]) ,  Ĩ(·) := I₁(·)/I₀(·) ,  (15)

where recall that yᵢ = Rᵢ² and η̄₁ = E[ντ]. We can easily represent f with an inner product if we can represent Ĩ as an inner product: Ĩ(uv) = φ(u)ᵀψ(v). We use unitless φᵢ(u) = Ĩ(cᵢu), with c₁:G a log-linear grid from 10⁻² to 10³ and G = 50. We use a lookup table for ψ(v) that was trained to match Ĩ using non-negative least squares, which left us with a sparse lookup table.

Online updating for VB posteriors was also developed in [24; 13]. These methods introduce forgetting factors to discount the contributions from old data points that might be detrimental to accuracy. Since the VB predictions are “embedded” in a change point method, they are automatically phased out if the posterior predictions become inaccurate, making forgetting factors unnecessary.

2.3 Censored Data

As mentioned in Section 1.3, we must handle censored RCS observations during a missed detection. In the VB-EM framework we merely have to compute the expected sufficient statistics given the censored measurement: E[S|R < R₀]. The expected sufficient statistic from (11) is now:

E[x|R < R₀] = ∫₀^{R₀} E[x|R] p(R) dR / RiceCDF(R₀|ν, τ) = ν (1 − Q₂(ν/σ, R₀/σ)) / (1 − Q₁(ν/σ, R₀/σ)) ,

where Q_M is the Marcum Q function [17] of order M. Similar updates for E[S|R < R₀] are possible for exponential or gamma UPMs, but are not shown as they are relatively easy to derive.
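As a hedged sketch of this censored-data statistic: Q_M can be evaluated through the noncentral chi-square survival function, via the standard identity Q_M(a, b) = P(X > b²) for X ∼ χ′²(2M, a²). The function names below are ours, not the paper's.

```python
import numpy as np
from scipy.stats import ncx2

def marcum_q(M, a, b):
    """Marcum Q-function Q_M(a, b) via the noncentral chi-square tail:
    Q_M(a, b) = P(X > b^2) with X ~ chi2(df=2M, noncentrality=a^2)."""
    return ncx2.sf(b ** 2, 2 * M, a ** 2)

def censored_Ex(nu, sigma, R0):
    """E[x | R < R0]: expected latent component of a Rice(nu, sigma)
    observation known only to fall below the detection threshold R0."""
    a, b = nu / sigma, R0 / sigma
    return nu * (1.0 - marcum_q(2, a, b)) / (1.0 - marcum_q(1, a, b))
```

A quick Monte Carlo check — sample (x, y), keep draws with √(x² + y²) < R₀, and average x — agrees with the closed form.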
2.4 Variational Run Length Posteriors: Predictive Log Likelihoods

Both updating the BOCPD run length posterior (1) and finding the marginal predictive log likelihood of the next point (2) require calculating the UPM’s posterior predictive log likelihood log p(yₙ₊₁|rₙ, y⁽ʳ⁾). The marginal posterior predictive from (2) is used in data association (6) and in benchmarking BOCPD against other methods. However, the exact posterior predictive distribution obtained by integrating the Rice likelihood against the VB posterior is difficult to compute. We can break the BOCPD update (1) into a time update and a measurement update. The measurement update corresponds to a Bayesian model comparison (BMC) calculation with prior p(rₙ|y₁:ₙ):

p(rₙ|y₁:ₙ₊₁) ∝ p(yₙ₊₁|rₙ, y⁽ʳ⁾) p(rₙ|y₁:ₙ) .  (16)

Using the BMC results in Bishop [4, Sec. 10.1.4], we find a variational posterior on the run length by using the variational lower bound for each run length, Lᵢ(q) ≤ log p(yₙ₊₁|rₙ = i, y⁽ʳ⁾), calculated using (12), as a proxy for the exact UPM posterior predictive in (16). This gives the exact VB posterior if the approximating family Q is of the form:

q(rₙ, θ, x) = q_UPM(θ, x|rₙ) q(rₙ)  ⟹  q(rₙ = i) = exp(Lᵢ(q)) p(rₙ = i|y₁:ₙ) / exp(L(q)) ,  (17)

where q_UPM contains whatever constraints we used to compute Lᵢ(q). The normalizer on q(rₙ) serves as a joint VB lower bound: L(q) = log Σᵢ exp(Lᵢ(q)) p(rₙ = i|y₁:ₙ) ≤ log p(yₙ₊₁|y₁:ₙ). Note that the conditional factorization is different from the typical independence constraint on q. Furthermore, we cast the estimation of the assignment vectors a in (6) as a VB routine. We use a similar conditional constraint on the latent BOCPD variables given the assignment, and constrain the assignment posterior to be a point mass.
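In log space, the measurement update (16)–(17) amounts to a softmax of the per-run-length lower bounds shifted by the log prior. A minimal sketch (our own helper, with illustrative inputs):

```python
import numpy as np
from scipy.special import logsumexp

def vb_runlength_update(log_prior, lower_bounds):
    """Variational measurement update, i.e. q(r_n = i) from (17).
    `log_prior` holds log p(r_n = i | y_1:n); `lower_bounds` holds the
    per-run-length bounds L_i(q) from (12).  Returns the run-length
    posterior and the joint bound L(q) <= log p(y_{n+1} | y_1:n)."""
    logw = np.asarray(lower_bounds) + np.asarray(log_prior)
    L_joint = logsumexp(logw)            # normalizer doubles as the joint VB bound
    return np.exp(logw - L_joint), L_joint
```

With a uniform run-length prior, the posterior is simply a softmax of the lower bounds.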
In the 2D assignment case, for example,

q(aₙ, X_{1:N_T}) = q(X_{1:N_T}|aₙ) q(aₙ) = q(X_{1:N_T}|aₙ) I{aₙ = âₙ} ,  (18)

where each track’s Xᵢ represents all the latent variables used to compute the variational lower bound on log p(z_{j,n}|aₙ(j) = i). In the BOCPD case, Xᵢ := {rₙ, x, θ}. The resulting VB fixed-point equations find the posterior on the latent variables Xᵢ by taking âₙ as the true assignment and solving the VB problem of (17); the assignment âₙ is found by using (6) and taking the joint BOCPD lower bound L(q) as a proxy for the BOCPD predictive log likelihood component of log pᵢ in (5).

Figure 2: Left: KL from naive updating, Sato’s method [24], and improved online VB (◦) to the batch VB posterior vs. sample size n, using a standard normal-gamma prior. Each curve represents a true ν in the generating Rice distribution: ν = 3.16 (red), ν = 10.0 (green), ν = 31.6 (blue), with τ = 1. Middle: The RMSE (dB scale) of the estimate of the mean RCS E[Rₙ] for an exponential RCS model. The curves are BOCPD (blue), IMM (black), identity (magenta), α-filter (green), and median filter (red). Right: Same as the middle but for the Rice RCS case. The dashed lines are 95% confidence intervals. (Panels: (a) Online Updating, (b) Exponential RCS, (c) Rice RCS.)

3 Results

3.1 Improved Online Solution

We first demonstrate the accuracy of the online VB approximation (Section 2.2) on a Rice estimation example; here, we only test the VB posterior, as no change point detection is applied. Figure 2(a) compares naive online updating, Sato’s method [24], and our improved online updating in KL(online‖batch) of the posteriors for three different true parameters ν as sample size n increases.
The performance curves are the KL divergence between these online approximations to the posterior and the batch VB solution (i.e., restarting VB from “scratch” at every new data point) vs. sample size. The error for our method stays around a modest 10⁻² nats, while naive updating incurs large errors of 1 to 50 nats [19, Ch. 4]. Sato’s method tends to settle at around 1 nat of approximation error. The recommended annealing schedule, i.e. forgetting factors, in [24] performed worse than naive updating. We did a grid search over annealing exponents and show the results for the best-performing schedule of n⁻⁰·⁵². By contrast, our method does not require the tuning of an annealing schedule.

3.2 RCS Estimation Benchmarking

We now compare BOCPD with other methods for RCS estimation. We use the same experimental example as Slocumb and Klusman III [25], which uses an augmented interacting multiple model (IMM) based method for estimating the RCS; we also compare against the same α-filter and median filter used in [25]. As a reference point, we also consider the “identity filter,” which is merely an unbiased filter that uses only yₙ to estimate the mean RCS E[Rₙ] at time step n. We extend this example to look at Rice RCS in addition to the exponential RCS case. The bias correction constants in the IMM were adjusted for the Rice distribution case as per [25, Sec. 3.4]. The results for the exponential distributions used in [25] and for the Rice distribution case are shown in Figures 2(b) and 2(c). The IMM used in [25] was hard-coded to expect jumps in the SNR in multiples of ±10 dB, which is exactly what is presented in the example (a sequence of 20, 10, 30, and 10 dB). In [25] the authors mention that the IMM reaches an RMSE “floor” at 2 dB, yet BOCPD continues to drop as low as 0.56 dB. The RMSE from BOCPD does not spike nearly as high as the other methods upon a change in E[Rₙ]. The α-filter and median filter perform worse than both the IMM and BOCPD.
The RMSE and confidence intervals are calculated from 5000 runs of the experiment.

Figure 3: Left: Average relative improvements (%) for SIAP metrics — position accuracy (red), velocity accuracy (green), and spurious tracks (blue ◦) — across difficulty levels. Right: LHR: true trajectories shown as black lines (−), estimates using a BOCPD RCS model for association shown as blue stars (∗), and the standard tracker as red circles (◦). The standard tracker has spurious tracks over east London and near Ipswich. Background map data: Google Earth (TerraMetrics, Data SIO, NOAA, U.S. Navy, NGA, GEBCO, Europa Technologies). (Panels: (a) SIAP Metrics, (b) Heathrow (LHR).)

3.3 Flightradar24 Tracking Problem

Finally, we used real flight trajectories from flightradar24 and plugged them into our 3D tracking algorithm. We compare tracking performance between using our BOCPD model and the relatively standard constant-probability-of-detection (no RCS) setup [2, Sec. 3.5]. We use the single integrated air picture (SIAP) metrics [6] to demonstrate the improved performance of the tracking. The SIAP metrics are a standard set of metrics used to compare tracking systems. We broke the data into 30 regions during a one-hour period (in Sept. 2012) sampled every 5 s, each within a 200 km by 200 km area centered around one of the world’s 30 busiest airports [22]. Commercial airport traffic is typically very orderly and does not allow aircraft to fly close to one another or cross paths. Feature-aided tracking is most necessary in scenarios with a more chaotic air situation. Therefore, we took random subsets of 10 flight paths and randomly shifted their start times to allow for scenarios of greater interest.
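The measurement-to-track association step (6) that both trackers rely on reduces, per scan, to a 2D assignment over per-pair association scores. As a toy stand-in — not the paper's solver, and with gating, missed detections, and track initiation all omitted — one can run the Hungarian algorithm on a matrix of predictive log likelihoods:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(log_p):
    """Pick the hard assignment a_hat maximizing the summed score, where
    log_p[j, i] is the predictive log likelihood of measurement j under
    track i (e.g., the BOCPD lower bound used as a proxy in the text)."""
    meas, tracks = linear_sum_assignment(-log_p)  # Hungarian: minimize -score
    return dict(zip(meas.tolist(), tracks.tolist()))
```

For a two-measurement, two-track scan, the diagonal-dominant score matrix below yields the identity assignment.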
The resulting SIAP metric improvements are shown in Figure 3(a), where we look at performance by a difficulty metric: the number of times in a scenario any two aircraft come within ∼400 m of each other. The biggest improvements are seen for difficulties above three, where positional accuracy increases by 30%. Significant improvements are also seen for velocity accuracy (11%) and the frequency of spurious tracks (6%). Significant performance gains are seen at all difficulty levels considered. The larger improvements at level three than at level five are possibly due to some level-five scenarios that are not resolvable simply through more sophisticated models. We demonstrate how our RCS methods prevent the creation of spurious tracks around London Heathrow in Figure 3(b).

4 Conclusions

We have demonstrated that it is possible to use sophisticated and recent developments in machine learning, such as BOCPD and the modern inference method of VB, to produce demonstrable improvements in the much more mature field of radar tracking. We first closed a “hole” in the literature in Section 2.1 by deriving variational inference on the parameters of a Rice distribution, with its inherent applicability to radar tracking. In Sections 2.2 and 2.4 we showed that it is possible to use these variational UPMs for non-exponential family models in BOCPD without sacrificing its modular or online nature. The improvements in online VB are extendable to UPMs besides the Rice distribution, and more generally beyond change point detection. We can use the variational lower bound from the UPM to obtain a principled variational approximation to the run length posterior. Furthermore, we cast the estimation of the assignment vectors themselves as a VB problem, which is in large contrast to the tracking literature. More algorithms from the tracking literature can likely be cast in various machine learning frameworks, such as VB, and improved upon from there.

References

[1] Adams, R. P.
and MacKay, D. J. (2007). Bayesian online changepoint detection. Technical report, University of Cambridge, Cambridge, UK.
[2] Bar-Shalom, Y., Willett, P., and Tian, X. (2011). Tracking and Data Fusion: A Handbook of Algorithms. YBS Publishing.
[3] Beal, M. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. In Bayesian Statistics, volume 7, pages 453–464.
[4] Bishop, C. M. (2007). Pattern Recognition and Machine Learning. Springer.
[5] Braun, J. V., Braun, R., and Müller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika, 87(2):301–314.
[6] Byrd, E. (2003). Single integrated air picture (SIAP) attributes version 2.0. Technical Report 2003-029, DTIC.
[7] Chen, J. and Gupta, A. (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association, 92(438):739–747.
[8] Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics. Interscience.
[9] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
[10] Ehrman, L. M. and Blair, W. D. (2006). Comparison of methods for using target amplitude to improve measurement-to-track association in multi-target tracking. In Information Fusion, 2006 9th International Conference on, pages 1–8. IEEE.
[11] Fearnhead, P. and Liu, Z. (2007). Online inference for multiple changepoint problems. Journal of the Royal Statistical Society, Series B, 69(4):589–605.
[12] Hipp, C. (1974). Sufficient statistics and exponential families. The Annals of Statistics, 2(6):1283–1292.
[13] Honkela, A. and Valpola, H. (2003). On-line variational Bayesian learning. In 4th International Symposium on Independent Component Analysis and Blind Signal Separation, pages 803–808.
[14] Kalman, R. E.
(1960). A new approach to linear filtering and prediction problems. Transactions of the ASME — Journal of Basic Engineering, 82(Series D):35–45.
[15] Lauwers, L., Barbé, K., Van Moer, W., and Pintelon, R. (2009). Estimating the parameters of a Rice distribution: A Bayesian approach. In Instrumentation and Measurement Technology Conference, 2009. I2MTC’09. IEEE, pages 114–117. IEEE.
[16] Mahler, R. (2003). Multi-target Bayes filtering via first-order multi-target moments. IEEE Trans. AES, 39(4):1152–1178.
[17] Marcum, J. (1950). Table of Q functions. U.S. Air Force RAND Research Memorandum M-339, Rand Corporation, Santa Monica, CA.
[18] Mardia, K. V. and Jupp, P. E. (2000). Directional Statistics. John Wiley & Sons, New York.
[19] Murray, I. (2007). Advances in Markov chain Monte Carlo methods. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, London, UK.
[20] Poore, A. P., Rijavec, N., Barker, T. N., and Munger, M. L. (1993). Data association problems posed as multidimensional assignment problems: algorithm development. In Optical Engineering and Photonics in Aerospace Sensing, pages 172–182. International Society for Optics and Photonics.
[21] Richards, M. A., Scheer, J., and Holm, W. A., editors (2010). Principles of Modern Radar: Basic Principles. SciTech Pub.
[22] Rogers, S. (2012). The world’s top 100 airports: listed, ranked and mapped. The Guardian.
[23] Saatçi, Y., Turner, R., and Rasmussen, C. E. (2010). Gaussian process change point models. In 27th International Conference on Machine Learning, pages 927–934, Haifa, Israel. Omnipress.
[24] Sato, M.-A. (2001). Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681.
[25] Slocumb, B. J. and Klusman III, M. E. (2005). A multiple model SNR/RCS likelihood ratio score for radar-based feature-aided tracking. In Optics & Photonics 2005, pages 59131N–59131N. International Society for Optics and Photonics.
[26] Swerling, P. (1954).
Probability of detection for fluctuating targets. Technical Report RM-1217, Rand Corporation.
[27] Turner, R. (2011). Gaussian Processes for State Space Models and Change Point Detection. PhD thesis, University of Cambridge, Cambridge, UK.