nips nips2012 nips2012-118 knowledge-graph by maker-knowledge-mining

118 nips-2012-Entangled Monte Carlo


Source: pdf

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). [sent-5, score-0.179]

2 EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. [sent-6, score-0.966]

3 In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. [sent-7, score-0.486]

4 We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. [sent-8, score-1.038]

5 We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. [sent-9, score-0.595]

6 The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. [sent-10, score-0.744]

7 We are specifically interested in parallel Monte Carlo algorithms that scale not only in scientific-computing clusters, where node communication is fast and cheap, but also in situations where communication between nodes is limited by a combination of latency, throughput, and cost. [sent-13, score-0.182]

8 Our main contribution is a method, Entangled Monte Carlo simulation (EMC), for carrying out SMC simulation in a cluster with a communication cost per particle independent of the problem size. [sent-17, score-0.547]

9 These desirable characteristics are achieved by limiting the contents of inter-node transmission to summary statistics on the particle weights. [sent-19, score-0.444]

10 We show that our summary statistics are sufficient, in the sense that they can be used in combination with the particle genealogy to quickly reconstruct any particle in any node of the cluster. [sent-21, score-0.899]

11 We will illustrate the advantage of particle reconstruction versus network transmission in the context of phylogenetic inference, a well-known example where Monte Carlo simulation is both important and challenging. [sent-22, score-0.737]

12 In the case of the SMC sampler from [2], the cost of transmitting one particle is proportional to the product of the number of species under study, times the number of sites in the sequences, times the number of characters possible at each site. [sent-23, score-0.455]

13 We also introduce the algorithms needed to do these reconstructions efficiently while maintaining a distributed representation of the particle genealogies. [sent-24, score-0.369]

14 For SMC, most of the work has been on parallelization of the proposal steps [6], which is sufficient in setups such as GPU parallelization where communication between computing units is fast and cheap. [sent-30, score-0.365]

15 However in generic clusters or peer-to-peer architectures, we argue that our more efficient parallelization of the resampling step is advantageous. [sent-31, score-0.233]

16 Another popular MCMC parallelization method is parallel tempering [10], where auxiliary chains are added to enable faster exploration of the space by swapping states in different chains. [sent-36, score-0.245]

17 While parallel tempering has a low communication cost independent of the inference problem size, the additional gain of parallelism can quickly decrease as more chains are added because many swaps are needed to get from the most heated chain to the main chain. [sent-37, score-0.212]

18 We start by reviewing stochastic maps in the context of a Markov chain, where they were first introduced to design perfect simulation algorithms. [sent-46, score-0.166]
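
As a toy illustration of the idea (not the paper's implementation), a stochastic map can be written as a deterministic function of the current state and a random seed; replaying the map with the same seed reproduces the same transition, which is what later makes genealogy-based reconstruction possible. The kernel below is a made-up random walk and all names are illustrative.

```python
import random

def stochastic_map(state, seed):
    """One application of a Markov transition kernel, written as a
    deterministic function of (state, seed). The kernel here is a toy
    random walk on the integers."""
    rng = random.Random(seed)            # all randomness comes from the seed
    return state + rng.choice([-1, 1])   # deterministic given (state, seed)

# Replaying the map with the same seed reproduces the same transition.
assert stochastic_map(0, seed=42) == stochastic_map(0, seed=42)
```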

19 Sr ∩ Ss = ∅ for all r ≠ s (in phylogenetics, this holds since a forest with r trees cannot be a forest with s trees when r ≠ s). [sent-75, score-0.212]

20 The set of partial states of rank R should coincide with the target space, SR = X (in phylogenetics, at rank R = |X| − 1, forests have a single tree and are members of the target space X ). [sent-85, score-0.253]

21 In order to grow particles from one rank to the next, the user needs to specify a proposal probability kernel ν+. [sent-88, score-0.645]

22 Given an initial partial state s and a set of destination partial states A, we denote the probability of proposing an element in A from s by νs+(A). [sent-89, score-0.223]

23 The weighted particles induce a distribution on S defined by: πr,K(A) ∝ ∑_{k=1}^{K} w_{r,k} δ_{s_{r,k}}(A), (1) where A ∈ FS is an event, and δx(A) = 1 if x ∈ A and 0 otherwise. [sent-106, score-0.522]

24 Given the list of particles and weights from the previous generation r − 1, a new list is created in three steps. [sent-112, score-0.727]

25 This is done by sampling independently K times from the weighted particles distribution πr,K defined above. [sent-114, score-0.522]

26 The result of this step is that some of the particles (mostly those of low weight) will be pruned. [sent-115, score-0.522]
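
A minimal sketch of this resampling step, assuming simple multinomial resampling from the distribution of Eq. (1) (the summary does not commit to a particular resampling scheme); function and variable names are illustrative.

```python
import random

def resample_indices(weights, rng=None):
    """Draw K ancestor indices i.i.d. from the weighted particle
    distribution pi_{r,K} of Eq. (1); low-weight particles tend to
    receive no offspring and are thereby pruned."""
    rng = rng or random.Random(0)
    K = len(weights)
    total = sum(weights)
    probs = [w / total for w in weights]              # normalize the weights
    return rng.choices(range(K), weights=probs, k=K)  # multinomial draw

# Example: the third particle carries most of the weight and dominates the draw.
ancestors = resample_indices([0.1, 0.2, 5.0])
```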

27 , s̃r,K, by extending the partial states of each of the sampled particles from the previous iteration. [sent-123, score-0.607]

28 , sR = t, where an X-forest sr = {(ti, Xi)} is a collection of rooted Xi-trees ti such that the disjoint union of the leaves of the trees in the forest is equal to the original set of leaves, ∪i Xi = X. [sent-147, score-0.258]

29 Sr ∩ Ss = ∅ for all r ≠ s (in phylogenetics, this holds since a forest with r trees cannot be a forest with s trees when r ≠ s). [sent-152, score-0.212]

30 Algorithm 2: reconstruct(s, ρ, i, F = {Fi : i ∈ I}). The set of partial states of rank R should coincide with the target space, SR = X (in phylogenetics, at rank R = |X| − 1, forests have a single tree and are members of the target space X). [sent-156, score-0.253]

31 Figure 2: Illustration of compact particles (blue), concrete particles (black), and discarded particles (grey). [sent-159, score-1.648]

32 In order to grow particles from one rank to the next, the user needs to specify a proposal probability kernel ν+. [sent-160, score-0.645]

33 Given an initial partial state s and a set of destination partial states A, we denote the probability of proposing an element in A from s by νs+(A). [sent-161, score-0.223]

34 In the discrete case, we abuse the notation for only a subset Ir of the particle indices {1, . [sent-162, score-0.522]

35 If roughly all particles were resampled exactly once, we would be able to assign to each machine the same indices as the previous iteration, avoiding communication. [sent-169, score-0.555]

36 Instead, a small number of particles is often resampled a large number of times while many others have no offspring. [sent-171, score-0.555]

37 This raises an important question: how can a machine compute a proposal if the particle from which to propose was itself computed by a different machine? [sent-173, score-0.414]

38 The naive approach would consist in transmitting the ‘missing’ particles over the network. [sent-174, score-0.559]

39 However, even if basic optimizations are used (for example sending particles with multiplicities only once), we show in Section 4 that this transfer can be slow in practice. [sent-175, score-0.559]

40 Instead, our approach relies on a combination of the stochastic maps with the particle genealogy to reconstruct the particle. [sent-176, score-0.653]

41 First, note that the resampling step in SMC algorithms induces a one-to-many relationship between the particles in generation r − 1 and those in generation r. [sent-178, score-0.682]

42 This relationship is called the particle genealogy, illustrated in Figure 1 (b). [sent-179, score-0.347]

43 Formally, a genealogy is a directed graph where nodes are particles sr,k , r ∈ {1, . [sent-180, score-0.691]

44 , K}, and where node sr−1,k is deemed the parent of node sr,k if the latter was obtained by resampling s̃r−1,k = sr−1,k followed by proposing sr,k from s̃r−1,k. [sent-186, score-0.192]

45 Each machine also maintains a hashtable s : I → S ∪ {nil} holding the particles held in memory on the reference machine (the value nil represents a particle not currently represented explicitly on the reference machine). [sent-188, score-1.049]

46 Algorithm 2 shows that this information, s, ρ, F , is sufficient to instantiate any query particle (indexed by i in the pseudo-code). [sent-189, score-0.347]
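
A sketch in the spirit of Algorithm 2 (the exact pseudo-code is not reproduced in this summary, so the data layout below is an assumption): walk up the genealogy ρ from the query index until a concretely stored ancestor is found, then replay the stochastic maps forward along that lineage.

```python
def reconstruct(i, s, parent, F):
    """Re-create particle i from the genealogy.
    s      : dict index -> concrete state, or None (nil) if only compact
    parent : dict index -> parent index (the genealogy rho)
    F      : dict index -> stochastic map replaying that particle's proposal
    """
    path = []
    j = i
    while s.get(j) is None:      # trace back until a concrete ancestor
        path.append(j)
        j = parent[j]
    state = s[j]
    for k in reversed(path):     # replay proposals forward along the lineage
        state = F[k](state)
    s[i] = state                 # cache it, so future traces stop here
    return state
```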

47 Finally, how can we do resampling and particle allocation in this distributed framework? [sent-194, score-0.516]

48 Allocation and resampling: In SMC algorithms, the weights are periodically used for resampling the particles, a step also known as the bootstrapping stage and denoted by resample in Algorithm 1. [sent-197, score-0.295]

49 Since each machine has the full information about the weights in the current iteration, each can perform a standard, global resampling step without further communication. [sent-199, score-0.146]
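
One simple way to realize this is sketched below: once every machine holds the same weight vector, each draws the resampling outcome from a pseudorandom generator seeded identically per generation, so all machines agree on the survivors without exchanging anything else. The shared-seed trick is an assumption of this sketch, not a detail stated in the summary.

```python
import random

def global_resample(all_weights, generation, shared_seed=2012):
    """Run identically on every machine: same broadcast weights, same seed,
    hence the same list of surviving ancestor indices on every node."""
    rng = random.Random(shared_seed * 1_000_003 + generation)
    K = len(all_weights)
    return rng.choices(range(K), weights=all_weights, k=K)
```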

50 In most cases of interest, each machine can transmit all the individual weights of its particles and communicate them to every other machine (either via a central server, or a decentralized scheme such as [13]) without becoming the bottleneck. [sent-200, score-0.571]

51 Extreme cases, where even the list of weights alone is too large to transmit, can also be handled by transmitting only the sum of the weights of each machine, and using a distributed hashtable [13] to represent the genealogy. [sent-201, score-0.174]

52 Once the resampling step determines which particles survive to the next generation, the next step is to determine allocation of particles to machines. [sent-204, score-1.239]

53 , AM } be a partition of the particles {1, . [sent-209, score-0.522]

54 , K} at generation r, and let cm denote the maximum number of particles that can be processed by machine m. [sent-212, score-0.671]

55 We propose greedy methods where each machine m retains as many particles from Im_{r−1} as possible. [sent-219, score-0.595]

56 Let Ĩm_r ⊆ Im_{r−1} be the set of particles resampled. If |Ĩm_r| − cm > 0, this machine is in surplus of particles. [sent-220, score-0.66]

57 We consider greedy schemes to allocate the surplus of particles over to machines m′ ≠ m, where cm′ − |Ĩm′_r| > 0. [sent-222, score-0.715]

58 The surplus particles are allocated according to this list. [sent-224, score-0.586]

59 MostAvailable: attempts to allocate the surplus particles to machines with the most capacity as defined by cm − |Ĩm_r|. [sent-225, score-0.715]
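
A hedged sketch of a MostAvailable-style allocation (FirstOpen would instead walk a fixed machine ordering); the data structures and names are illustrative, not the authors' code.

```python
def allocate_surplus(surplus_particles, capacity_left):
    """Greedily send each surplus particle index to the machine with the
    most remaining capacity (c_m minus the particles it already keeps)."""
    assignment = {}
    for i in surplus_particles:
        m = max(capacity_left, key=capacity_left.get)   # most available machine
        if capacity_left[m] <= 0:
            raise RuntimeError("total capacity exhausted")
        assignment[i] = m
        capacity_left[m] -= 1
    return assignment

# Example: two surplus particles; machine 'B' has the most free slots.
print(allocate_surplus([7, 9], {"A": 1, "B": 3}))
```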

60 The intention is that the particles are mixed well over different machines so that the reconstruction algorithm rarely traces back the genealogy to the root ancestor. [sent-227, score-0.862]

61 Genealogy: In this section, we argue that for the purpose of reconstruction, only a sparse subset of the genealogy needs to be represented at any given iteration and machine. [sent-229, score-0.191]

62 The key idea is that if a particle has no descendant in the current generation, storing its parent is not necessary. [sent-230, score-0.382]

63 In practice, we observed that the vast majority of the ancestral particles have this property. [sent-231, score-0.522]

64 First, it is useful to draw a distinction between concrete particles, with s(i) ≠ nil, and compact particles, which are particles implicitly represented via an integer (the parent of the particle), and are therefore considerably more space-efficient. [sent-234, score-0.639]

65 For example, in the smallest phylogenetic example considered in Section 4, a compact particle occupies about 50,000 times less memory than a concrete particle. [sent-235, score-0.563]

66 Whereas a concrete particle grows in size as the problem size increases, the size of a compact particle is fixed. [sent-236, score-0.776]
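
The distinction can be pictured with two container types (a sketch; the class names are made up): a concrete particle stores the full partial state, while a compact particle stores only its parent's index and is re-created on demand by replaying stochastic maps from a concrete ancestor.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ConcreteParticle:
    state: Any    # full partial state (e.g. a forest); grows with problem size

@dataclass
class CompactParticle:
    parent: int   # index into the genealogy; a fixed handful of bytes,
                  # independent of the problem size
```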

67 Update after resampling: Any lineage (path in ρ) that did not survive the resampling stage no longer needs to be maintained. [sent-239, score-0.195]

68 The greyed-out particles will never be reconstructed in future generations, so they are no longer maintained. [sent-241, score-0.663]

69 Update after reconstruction: Once a particle is reconstructed, the lineage of the reconstructed particle can be updated. [sent-243, score-0.755]

70 Let j be the particle that is reconstructed at generation r. [sent-244, score-0.488]

71 At any future generation r′ > r, the reconstruction algorithm will only trace up to j (as s(j) ≠ nil), and hence all of its ancestors can be discarded. [sent-245, score-0.237]
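
Both pruning rules amount to keeping only the genealogy entries that some current particle can still reach before hitting a concrete ancestor; a minimal sketch, assuming the dict-based genealogy used in the reconstruction sketch above:

```python
def prune_genealogy(parent, s, current_indices):
    """Keep only parent links still reachable from the current generation;
    a trace stops as soon as it reaches a concretely stored particle."""
    needed = set()
    for i in current_indices:
        j = i
        while j is not None and j not in needed:
            needed.add(j)
            if s.get(j) is not None:   # concrete: reconstruction stops here
                break
            j = parent.get(j)
    return {j: parent[j] for j in needed if j in parent}
```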

72 If we assume the weight function α to be constant, the genealogy induced by resampling can be viewed as a Wright-Fisher model [14, 15], which is well approximated by the coalescent when the number of particles is large. [sent-248, score-0.868]

73 Compact representation of the stochastic maps: The cardinality of the set of stochastic maps F = {Fi : i ∈ I} grows proportionally to the number of particles K times the number of generations R. [sent-253, score-0.763]

74 Random access of random numbers is an unusual requirement imposed by the genealogy reconstruction algorithm. [sent-257, score-0.309]

75 For example by doing so for every particle generation, we get a faster access time of O(K) and a larger space requirement of O(R). [sent-260, score-0.371]
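
One way to meet this requirement, sketched under the assumption that a deterministic seed can be derived per (generation, particle) pair, is to never store random numbers at all and instead regenerate each map's stream on demand:

```python
import random

def map_rng(root_seed, generation, k):
    """Reproducible generator for the stochastic map of particle k at a given
    generation. Nothing is stored, and any (generation, k) pair can be
    re-derived in O(1), which is the random access that genealogy
    reconstruction needs. The mixing below is deterministic, so all machines
    agree on the streams."""
    seed = (root_seed * 1_000_003 + generation) * 1_000_003 + k
    return random.Random(seed)

# The same (generation, particle) pair always yields the same draws.
assert map_rng(7, 3, 12).random() == map_rng(7, 3, 12).random()
```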

76 Then we show results on the task of Bayesian phylogenetic inference, a challenging domain where massively parallel simulation is likely to have an impact for practitioners—running phylogenetic MCMC chains for months is not uncommon. [sent-267, score-0.445]

77 To keep the exposition self-contained, we include a review of the phylogenetic SMC techniques we used. [sent-268, score-0.158]

78 Experimental setup: Given a collection of biological sequences for different species (taxa), Bayesian phylogenetics aims to compute expectations under a posterior distribution over phylogenetic trees, which represent the relationship among the species under study [16]. [sent-270, score-0.435]

79 For intermediate to large numbers of species, Bayesian phylogenetic inference via SMC requires a large number of particles to achieve an accurate estimate. [sent-271, score-0.702]

80 In the following section, we use the phylogenetic SMC algorithm described in [2], where particles are proposed using a proposal with density ν(s → s′). [sent-273, score-0.723]

81 Starting from a fully disconnected forest over the species, ν picks one pair of trees in the forest at random, and forms a new tree by connecting their roots. [sent-274, score-0.234]

82 Under weak conditions described in [11], the following weight update yields a consistent estimator for the posterior over phylogenies: α(s̃r−1 → sr) ← (γ(sr) / γ(s̃r−1)) · (1 / ν(s̃r−1 → sr)), where γ is an unnormalized density over forests. [sent-275, score-0.304]
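
A minimal sketch of this update, computed in log space for numerical stability; log_gamma and log_nu are placeholder callables for the unnormalized forest density γ and the proposal density ν, not functions from the paper.

```python
import math

def smc_weight_update(log_gamma, log_nu, s_prev, s_new):
    """alpha = gamma(s_r)/gamma(s_{r-1}) * 1/nu(s_{r-1} -> s_r),
    evaluated via logarithms to avoid numerical under/overflow."""
    log_alpha = log_gamma(s_new) - log_gamma(s_prev) - log_nu(s_prev, s_new)
    return math.exp(log_alpha)
```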

83 The dataset is synthetic phylogenetics data with 20 taxa and 1000 sites. [sent-285, score-0.271]

84 Speed-up results: In this section, we show experimental results where we measure the speed-up of an EMC algorithm on two sets of phylogenetics data by counting the number of times the maps Fi are applied. [sent-289, score-0.191]

85 The question we explore here is how deep the reconstruction algorithm has to trace back, or more precisely, how many times a parallelized version of our algorithm applies maps Fi compared to the number of times the equivalent operation is performed in the serial version of SMC. [sent-290, score-0.209]

86 We denote N1 to be the number of times the proposal function is applied in serial SMC, and NM to be the number of stochastic maps applied in our algorithm run on M machines. [sent-291, score-0.197]

87 In both subsets, we found a substantial speedup, suggesting that deep reconstruction was rarely needed in practice. [sent-294, score-0.145]

88 We also performed an experiment on synthetic data generated with 20 taxa and 1000 sites to verify that parallelization using EMC is beneficial in the corner case when the weights are all equal. [sent-297, score-0.285]

89 This is an allocation method where particles are allocated at random, which disregards the greedy method suggested in Section 3. [sent-299, score-0.572]

90 4 Timing results An SMC algorithm can easily be distributed over multiple machines by relying naively on particle transmission between machines over the network. [sent-303, score-0.54]

91 In this section, we compared the particle transmission time to reconstruction time of EMC on Amazon EC2 micro instances. [sent-304, score-0.538]

92 (b) Sample generation time including reconstruction time (black), reconstruction time (blue), and particle transfer time (red) by generation. [sent-307, score-0.68]

93 Here, we ran the SMC algorithm for 100 generations and measured the total run time of the EMC algorithm and an SMC algorithm parallelized via explicit particle transfer—see Figure 4 (a). [sent-310, score-0.408]

94 We fixed the number of particles per machine at 100 and produced a sequence of experiments by doubling the number of machines and hence the number of particles at each step. [sent-311, score-1.092]

95 In Figure 4 (b), we show the reconstruction time, the sample generation time (which includes the reconstruction time), and the particle transmission time by generation. [sent-312, score-0.74]

96 As expected, particle transmission is the bottleneck for the SMC algorithm, whereas the reconstruction time is stable, which verifies that the reconstruction algorithm rarely traced far back in the genealogy. [sent-313, score-0.661]

97 The total timing result in Figure 4 (a) shows that the overhead arising from increasing the number of particles (or increasing the number of machines) is much smaller compared to the particle transmission method. [sent-314, score-0.997]

98 The breakdown of time by generation in Figure 4 (b) shows that the particle transmission time is volatile as it depends on the network latency and throughput. [sent-315, score-0.552]

99 The new method requires only a small amount of data communication over the network, with a per-particle size that is independent of the scale of the inference problem. [sent-318, score-0.441]

100 On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. [sent-366, score-0.155]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('particles', 0.522), ('smc', 0.418), ('particle', 0.347), ('emc', 0.24), ('genealogy', 0.169), ('sr', 0.152), ('taxa', 0.144), ('phylogenetic', 0.134), ('phylogenetics', 0.127), ('resampling', 0.119), ('parallelization', 0.114), ('generation', 0.108), ('ir', 0.104), ('transmission', 0.097), ('reconstruction', 0.094), ('nil', 0.08), ('species', 0.071), ('communication', 0.07), ('proposal', 0.067), ('monte', 0.067), ('forest', 0.065), ('carlo', 0.065), ('simulation', 0.065), ('rrna', 0.064), ('surplus', 0.064), ('maps', 0.064), ('fs', 0.062), ('speedup', 0.059), ('coalescent', 0.058), ('mcmc', 0.053), ('entangled', 0.052), ('allocation', 0.05), ('partial', 0.05), ('concrete', 0.049), ('machines', 0.048), ('actinobacteria', 0.048), ('hashtable', 0.048), ('massively', 0.048), ('doucet', 0.047), ('parallel', 0.042), ('trees', 0.041), ('cm', 0.041), ('allocate', 0.04), ('generations', 0.039), ('fi', 0.039), ('proposing', 0.038), ('transfer', 0.037), ('transmitting', 0.037), ('stochastic', 0.037), ('reconstruct', 0.036), ('list', 0.035), ('parent', 0.035), ('states', 0.035), ('target', 0.035), ('rank', 0.034), ('resampled', 0.033), ('disconnected', 0.033), ('compact', 0.033), ('reconstructed', 0.033), ('sequences', 0.032), ('boinc', 0.032), ('firstopen', 0.032), ('mostavailable', 0.032), ('tempering', 0.032), ('timing', 0.031), ('resample', 0.03), ('tree', 0.03), ('rarely', 0.029), ('serial', 0.029), ('datastructure', 0.028), ('lineage', 0.028), ('moral', 0.028), ('poset', 0.028), ('ribosomal', 0.028), ('weights', 0.027), ('state', 0.026), ('bayesian', 0.026), ('reference', 0.026), ('survive', 0.026), ('nm', 0.025), ('proposals', 0.025), ('wr', 0.024), ('destination', 0.024), ('init', 0.024), ('rna', 0.024), ('subsumed', 0.024), ('keep', 0.024), ('inference', 0.024), ('access', 0.024), ('fn', 0.023), ('chains', 0.022), ('del', 0.022), ('transmit', 0.022), ('exchange', 0.022), ('parallelized', 0.022), ('server', 0.022), ('sequential', 0.022), ('numbers', 0.022), ('needs', 0.022), ('needed', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles. 1

2 0.33680218 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

3 0.21439333 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

4 0.11909653 205 nips-2012-MCMC for continuous-time discrete-state systems

Author: Vinayak Rao, Yee W. Teh

Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1

5 0.1018164 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimated performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-art sparse GP methods. 1

6 0.0972303 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

7 0.082708165 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

8 0.082109608 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

9 0.069926232 126 nips-2012-FastEx: Hash Clustering with Exponential Families

10 0.058191728 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

11 0.05809718 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

12 0.056684792 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

13 0.056347027 128 nips-2012-Fast Resampling Weighted v-Statistics

14 0.053017028 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

15 0.05247939 204 nips-2012-MAP Inference in Chains using Column Generation

16 0.05171315 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

17 0.050115455 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

18 0.044631876 182 nips-2012-Learning Networks of Heterogeneous Influence

19 0.043757007 260 nips-2012-Online Sum-Product Computation Over Trees

20 0.043711372 232 nips-2012-Multiplicative Forests for Continuous-Time Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.139), (1, 0.016), (2, 0.004), (3, 0.057), (4, -0.099), (5, -0.041), (6, -0.012), (7, -0.01), (8, -0.008), (9, -0.014), (10, -0.11), (11, -0.075), (12, -0.006), (13, -0.045), (14, -0.091), (15, -0.049), (16, -0.063), (17, -0.074), (18, 0.075), (19, -0.069), (20, 0.072), (21, 0.109), (22, -0.067), (23, -0.056), (24, -0.107), (25, -0.024), (26, -0.21), (27, 0.036), (28, 0.082), (29, -0.201), (30, 0.034), (31, -0.117), (32, 0.063), (33, 0.041), (34, 0.025), (35, 0.091), (36, -0.143), (37, 0.142), (38, 0.001), (39, 0.09), (40, 0.169), (41, 0.017), (42, 0.073), (43, 0.085), (44, -0.039), (45, 0.062), (46, 0.109), (47, 0.004), (48, 0.108), (49, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94380754 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles. 1

2 0.75060892 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

3 0.69463325 205 nips-2012-MCMC for continuous-time discrete-state systems

Author: Vinayak Rao, Yee W. Teh

Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1

4 0.53986192 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxilliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

5 0.48382285 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimated performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-art sparse GP methods. 1

6 0.45244205 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

7 0.44533911 128 nips-2012-Fast Resampling Weighted v-Statistics

8 0.4242782 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

9 0.37270117 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

10 0.37211278 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

11 0.35378867 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

12 0.35137838 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

13 0.34792808 39 nips-2012-Analog readout for optical reservoir computers

14 0.32439384 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

15 0.30935773 182 nips-2012-Learning Networks of Heterogeneous Influence

16 0.30745813 126 nips-2012-FastEx: Hash Clustering with Exponential Families

17 0.30740631 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

18 0.29361463 233 nips-2012-Multiresolution Gaussian Processes

19 0.27833995 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

20 0.27365381 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.046), (17, 0.02), (21, 0.022), (38, 0.085), (42, 0.032), (54, 0.015), (55, 0.025), (60, 0.278), (74, 0.06), (76, 0.18), (80, 0.087), (92, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79633904 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles. 1

2 0.77220517 201 nips-2012-Localizing 3D cuboids in single-view images

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners. 1

3 0.71543479 217 nips-2012-Mixability in Statistical Learning

Author: Tim V. Erven, Peter Grünwald, Mark D. Reid, Robert C. Williamson

Abstract: Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability. 1

4 0.68710297 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

Author: Xue-xin Wei, Alan Stocker

Abstract: A common challenge for Bayesian models of perception is the fact that the two fundamental Bayesian components, the prior distribution and the likelihood function, are formally unconstrained. Here we argue that a neural system that emulates Bayesian inference is naturally constrained by the way it represents sensory information in populations of neurons. More specifically, we show that an efficient coding principle creates a direct link between prior and likelihood based on the underlying stimulus distribution. The resulting Bayesian estimates can show biases away from the peaks of the prior distribution, a behavior seemingly at odds with the traditional view of Bayesian estimation, yet one that has been reported in human perception. We demonstrate that our framework correctly accounts for the repulsive biases previously reported for the perception of visual orientation, and show that the predicted tuning characteristics of the model neurons match the reported orientation tuning properties of neurons in primary visual cortex. Our results suggest that efficient coding is a promising hypothesis in constraining Bayesian models of perceptual inference. 1 Motivation Human perception is not perfect. Biases have been observed in a large number of perceptual tasks and modalities, of which the most salient ones constitute many well-known perceptual illusions. It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer’s attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1]. We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions based on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g motion tuned neurons in visual cor∗ http://www.sas.upenn.edu/ astocker/lab 1 world neural representation efficient encoding percept Bayesian decoding Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1 , r2 , ..., rN } in a population of N neurons. The perceptual task is to generate a ˆ good estimate θ(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function. tex [18]). 
However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have been often selected on the basis of mathematical convenience. Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked. 2 Encoding-decoding framework We consider perception as an inference process that takes place along the simplified neural encodingdecoding cascade illustrated in Fig. 11 . 2.1 Efficient encoding Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of “optimally” are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher Information must be proportional to the prior distribution [12, 21]. In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F (θ) that transforms the ˜ physical space with units θ to a homogeneous space with units θ = F (θ) in which the stimulus distribution becomes uniform. This defines the mapping as θ F (θ) = p(χ)dχ , (1) −∞ which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. ˜ The tuning curves in the physical space are obtained by applying the inverse mapping F −1 (θ). Fig. 2 1 In the context of this paper, we consider ‘inferring’, ‘decoding’, and ‘estimating’ as synonymous. 2 stimulus distribution d samples # a Fisher information discriminability and average firing rates and b firing rate [ Hz] efficient encoding F likelihood function F -1 likelihood c symmetric asymmetric homogeneous space physical space Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics. b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F −1 that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). 
c) As a result, the likelihood shape is constrained by the prior distribution showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space. illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density. 2.2 Bayesian decoding Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1 , r2 , ..., rN ] where ri is the spike-count of the ith neuron over a given time-window τ . Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as N p(R|θ) = (τ fi (θ))ri −τ fi (θ) e , ri ! i=1 (2) ˆ with fi (θ) describing the tuning curve of neuron i. We then define a Bayesian decoder θLSE as the estimator that minimizes the expected squared-error between the estimate and the true stimulus value, thus θp(R|θ)p(θ)dθ ˆ θLSE (R) = , (3) p(R|θ)p(θ)dθ where we use Bayes’ rule to appropriately combine the sensory evidence with the stimulus prior p(θ). 3 Bayesian estimates can be biased away from prior peaks Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the 3 a b prior attraction prior prior attraction likelihood repulsion! likelihood c prior prior repulsive bias likelihood likelihood mean posterior mean posterior mean Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0 ). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p(θ0 ) < 0 and the likelihood is relatively narrow, then (1/p(θ)2 ) > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)). common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e. leading to biases toward the peak of the prior (Fig. 3a). In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). 
The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes’ least-squares estimate away from the peak of the prior while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counter-acting forces - and both are determined by the prior! 3.1 General derivation of the estimation bias In the following, we will formally derive the mean estimation bias b(θ) of the proposed encodingdecoding framework. Specifically, we will study the conditions for which the bias is repulsive i.e. away from the peak of the prior density. ˆ We first re-write the estimator θLSE (3) by replacing θ with the inverse of its mapping to the homo−1 ˜ geneous space, i.e., θ = F (θ). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2). Given a value θ0 and the elicited population response R, we can write the estimator as ˜ ˜ ˜ ˜ θp(R|θ)p(θ)dθ F −1 (θ)p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) ˆ θLSE (R) = = . ˜ ˜ ˜ p(R|θ)p(θ)dθ p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get 1 1 1 ˜ ˜ ˜ ˜ ˜ ˜ dθ = dθ. dF −1 (θ) = (F −1 (θ)) dθ = dθ = −1 (θ)) ˜ F (θ) p(θ) p(F ˆ Hence, we can simplify θLSE (R) as ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)p(R|F −1 (θ))dθ . ˜ ˜ p(R|F −1 (θ))dθ With ˜ K(R, θ) = ˜ p(R|F −1 (θ)) ˜ ˜ p(R|F −1 (θ))dθ 4 we can further simplify the notation and get ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)K(R, θ)dθ . (4) ˆ ˜ In order to get the expected value of the estimate, θLSE (θ), we marginalize (4) over the population response space S, ˆ ˜ ˜ ˜ ˜ θLSE (θ) = p(R)F −1 (θ)K(R, θ)dθdR S = F −1 ˜ (θ)( ˜ ˜ p(R)K(R, θ)dR)dθ = ˜ ˜ ˜ F −1 (θ)L(θ)dθ, S where we define ˜ L(θ) = ˜ p(R)K(R, θ)dR. S ˜ ˜ ˜ It follows that L(θ)dθ = 1. Due to the symmetry in this space, it can be shown that L(θ) is ˜0 . Intuitively, L(θ) can be thought as the normalized ˜ symmetric around the true stimulus value θ average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as b(θ0 ) = ˜ ˜ ˜ ˜ F −1 (θ)L(θ)dθ − F −1 (θ0 ) (5) ˜ This is expression is general where F −1 (θ) is defined as the inverse of the cumulative of an arbitrary ˜ prior density p(θ) (see Eq. (1)) and the dispersion of L(θ) is determined by the internal noise level. ˜ ˜ Assuming the prior density to be smooth, we expand F −1 in a neighborhood (θ0 − h, θ0 + h) that is larger than the support of the likelihood function. Using Taylor’s theorem with mean-value forms of the remainder, we get 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ F −1 (θ) = F −1 (θ0 ) + F −1 (θ0 ) (θ − θ0 ) + F −1 (θx ) (θ − θ0 )2 , 2 ˜ ˜ ˜ with θx lying between θ0 and θ. By applying this expression to (5), we find ˜ θ0 +h b(θ0 ) = = 1 2 ˜ θ0 −h 1 −1 ˜ ˜ ˜ ˜ ˜ 1 F (θx )θ (θ − θ0 )2 L(θ)dθ = ˜ 2 2 ˜ θ0 +h −( ˜ θ0 −h p(θx )θ ˜ ˜ 2 ˜ ˜ 1 )(θ − θ0 ) L(θ)dθ = p(θx )3 4 ˜ θ0 +h 1 ˜ − θ0 )2 L(θ)dθ ˜ ˜ ˜ ( ) ˜(θ ˜ p(F −1 (θx )) θ ( 1 ˜ ˜ ˜ ˜ ) (θ − θ0 )2 L(θ)dθ. p(θx )2 θ ˜ θ0 −h ˜ θ0 +h ˜ θ0 −h In general, there is no simple rule to judge the sign of b(θ0 ). However, if the prior is monotonic ˜ ˜ on the interval F −1 ((θ0 − h, θ0 + h)), then the sign of ( p(θ1 )2 ) is always the same as the sign of x 1 1 ( p(θ0 )2 ) . 
Also, if the likelihood is sufficiently narrow we can approximate ( p(θ1 )2 ) by ( p(θ0 )2 ) , x and therefore approximate the bias as b(θ0 ) ≈ C( 1 ) , p(θ0 )2 (6) where C is a positive constant. The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior! 3.2 Internal (neural) versus external (stimulus) noise The above derivation of estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrastcontrolled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly 5 larger biases up to the point where the likelihood becomes wide enough such that monotonicity of the prior over the support of the likelihood is potentially violated. Stimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion stimuli). Such external noise, however, has a different effect on the shape of the likelihood function as compared to internal noise. It modifies the likelihood function (2) by convolving it with the noise kernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It is straightforward to prove that such symmetric external noise does not lead to a change in the mean of the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by increasing the overall width of the likelihood, the attractive influence of the prior increases, resulting in an estimate that is closer to the prior peak than without external noise2 . 4 Perception of visual orientation We tested our framework by modelling the perception of visual orientation. Our choice was based on the fact that i) we have pretty good estimates of the prior distribution of local orientations in natural images, ii) tuning characteristics of orientation selective neurons in visual cortex are wellstudied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature. 4.1 Efficient neural model population for visual orientation Previous studies measured the statistics of the local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system’s prior belief over orientation p(θ) follows this distribution and approximate it formally as p(θ) ∝ 2 − | sin(θ)| (black line in Fig. 4b) . (7) Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von-Mises distribution in the homogeneous space on top of a constant spontaneous firing rate (5 Hz). 
We then ˜ applied the inverse transformation F −1 (θ) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b - red curves), where F (θ) is the cumulative of the prior (7). The concentration parameter for the von-Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼ 32 deg) of neurons in area V1 of the macaque [9]. 4.2 Predicted tuning characteristics of neurons in primary visual cortex The orientation tuning characteristics of our model population well match neurophysiological data of neurons in primary visual cortex (V1). Efficient encoding predicts that the distribution of neurons’ preferred orientation follows the prior, with more neurons tuned to cardinal than oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25-42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9]. An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. It also predicts that neurons tuned to the cardinal and oblique orientations should show less symmetry than those tuned to orientations in between. Finally, 2 Note, that these predictions are likely to change if the external noise is not symmetric. 6 a b 25 firing rate(Hz) 0 orientation(deg) asymmetry vs. tuning width 1.0 2.0 90 2.0 e asymmetry 1.0 0 asymmetry index 50 30 width (deg) 10 90 preferred tuning(deg) -90 0 d 0 0 90 asymmetry index 0 orientation(deg) tuning width -90 0 0 probability 0 -90 c efficient representation 0.01 0.01 image statistics -90 0 90 preferred tuning(deg) 25 30 35 40 tuning width (deg) Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16]. b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons neurons show less asymmetry than neurons with tuning widths in between. neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet. 4.3 Predicted perceptual biases Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished: First, perceived orientations show an absolute bias away from the cardinal orientations, thus away from the peaks of the orientation prior [2, 3]. 
We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matches the orientation of the test stimulus. Interestingly, these repulsive absolute biases become larger as the external stimulus noise becomes smaller (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations, as shown in Fig. 5c, and thus toward the peak of the prior distribution [3, 16].

The predicted perceptual biases of our model are shown in Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels). The predictions match both the reported absolute biases away from, and the relative biases toward, the cardinal orientations. Note that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also the discussion in Section 3.2).

[Figure 5 (panels a–f; axes: orientation (deg) vs. bias (deg); attraction/repulsion marked): Biases in perceived orientation: human data vs. model prediction. a,d) Low- and high-noise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise levels (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias.]

5 Discussion

We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases follow directly from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support for our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11].
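Returning to the perceptual-bias predictions of Section 4.3: the likelihood model of equation (2) appears earlier in the paper and is not reproduced here. As a rough stand-in (the Gaussian measurement model in the homogeneous space, the noise widths, and the 20-degree test orientation below are our own assumptions, not the paper's), the following Python/NumPy sketch builds an asymmetric likelihood by placing symmetric measurement noise on x = F(θ), widens it with a symmetric Gaussian kernel to mimic external stimulus noise, and computes the squared-error (BLS) estimate under the prior (7).

```python
import numpy as np

def make_prior(theta):
    """Prior of eq. (7), normalized on the grid (theta in degrees)."""
    p = 2.0 - np.abs(np.sin(np.deg2rad(theta)))
    return p / np.trapz(p, theta)

def cumulative(p, theta):
    """F(theta): cumulative of the prior, mapping physical space to [0, 1]."""
    F = np.concatenate(([0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(theta))))
    return F / F[-1]

def mean_bls_estimate(theta_true, sigma_int, sigma_ext, theta, p, F,
                      n_samples=400, seed=0):
    """Average Bayes' least-squares estimate for one true orientation.

    sigma_int : std. of the (assumed) Gaussian measurement noise in the
                homogeneous space x = F(theta)      -> internal noise
    sigma_ext : std. (deg) of a Gaussian kernel convolved with the likelihood
                in physical space                   -> external stimulus noise
    """
    rng = np.random.default_rng(seed)
    x_true = np.interp(theta_true, theta, F)
    if sigma_ext > 0:
        diff = theta[:, None] - theta[None, :]
        kernel = np.exp(-0.5 * (diff / sigma_ext) ** 2)   # symmetric noise kernel
    estimates = []
    for _ in range(n_samples):
        x_m = x_true + sigma_int * rng.standard_normal()  # noisy sensory measurement
        # Likelihood over theta: Gaussian in x evaluated at F(theta); the
        # nonlinear mapping F makes it asymmetric in physical space.
        lik = np.exp(-0.5 * ((x_m - F) / sigma_int) ** 2)
        if sigma_ext > 0:
            lik = kernel @ lik                            # widen by external noise
        post = p * lik
        post /= np.trapz(post, theta)
        estimates.append(np.trapz(theta * post, theta))   # BLS estimate
    return float(np.mean(estimates))

theta = np.linspace(-90.0, 90.0, 721)
p = make_prior(theta)
F = cumulative(p, theta)

theta_true = 20.0                       # between the prior peak (0 deg) and 45 deg
for sigma_ext in (0.0, 6.0):            # low vs. high external noise
    est = mean_bls_estimate(theta_true, sigma_int=0.03, sigma_ext=sigma_ext,
                            theta=theta, p=p, F=F)
    print(f"external noise {sigma_ext:4.1f} deg: "
          f"mean estimate {est:6.2f} deg, bias {est - theta_true:+6.2f} deg")
```

Each printed bias is an analogue of the absolute bias in Fig. 5b,e, and the difference between the high- and low-noise runs plays the role of the relative bias in Fig. 5c,f; the sketch illustrates the mechanics rather than reproducing the paper's quantitative fits.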
We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear "anti-Bayesian" at first sight (see also models of sensory adaptation [23]). Note that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes' least-squares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize total squared error, and that other choices of the loss function (e.g. a MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5].

Acknowledgments

We thank M. Jogan and A. Tank for helpful comments on the manuscript. This work was partially supported by grant ONR N000141110744.

References

[1] M. Jones and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34:169–231, 2011.
[2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218–1220, 1965.
[3] A. Tomassini, M. J. Morgan, and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50:541–547, 2010.
[4] W. S. Geisler and D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508–510, 2002.
[5] M. O. Ernst. Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23–R25, 2009.
[6] G. H. Henry, B. Dreher, and P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394–1409, 1974.
[7] D. Rose and C. Blakemore. An analysis of orientation selectivity in the cat's visual cortex. Exp Brain Res, 20(1):1–17, 1974.
[8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern, 78(1):45–56, 1998.
[9] R. L. De Valois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res, 22:531–544, 1982.
[10] B. Li, M. R. Peterson, and R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J Neurophysiol, 90:204–217, 2003.
[11] D. Ganguli and E. P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Advances in Neural Information Processing Systems (NIPS) 23, pages 658–666, 2011.
[12] M. D. McDonnell and N. G. Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys Rev Lett, 101(5):058103, 2008.
[13] H. Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867.
[14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[15] D. C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996.
[16] A. R. Girshick, M. S. Landy, and E. P. Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, July 2011.
[17] M. Jazayeri and M. N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010.
[18] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception.
Nature Neuroscience, pages 578–585, April 2006.
[19] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.
[20] D. M. Coppola, H. R. Purves, A. N. McCoy, and D. Purves. The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A, 95(7):4002–4006, 1998.
[21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10(7):1731–1757, 1998.
[22] X.-X. Wei and A. A. Stocker. Bayesian inference with efficient neural population codes. In Lecture Notes in Computer Science, Artificial Neural Networks and Machine Learning - ICANN 2012, Lausanne, Switzerland, volume 7552, pages 523–530, 2012.
[23] A. A. Stocker and E. P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation.
[24] D. C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007.
[25] D. Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012.
[26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010.

5 0.6460948 227 nips-2012-Multiclass Learning with Simplex Coding

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypotheses class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper. 1

6 0.63885939 188 nips-2012-Learning from Distributions via Support Measure Machines

7 0.63776875 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

8 0.63303465 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

9 0.63241446 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

10 0.63044327 197 nips-2012-Learning with Recursive Perceptual Representations

11 0.63036519 294 nips-2012-Repulsive Mixtures

12 0.63008064 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

13 0.63008016 126 nips-2012-FastEx: Hash Clustering with Exponential Families

14 0.62960768 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

15 0.62958288 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

16 0.62956059 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

17 0.62927645 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

18 0.62911111 279 nips-2012-Projection Retrieval for Classification

19 0.62907517 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

20 0.62850046 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes