NIPS 2004, Paper 46
Constraining a Bayesian Model of Human Visual Speed Perception
Authors: Alan Stocker, Eero P. Simoncelli
Abstract: It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variability observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, and that the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, approximately following a power-law function.

Humans do not perceive visual motion veridically. Various psychophysical experiments have shown that the perceived speed of visual stimuli is affected by stimulus contrast, with low-contrast stimuli being perceived to move slower than high-contrast ones [1, 2]. Computational models have been suggested that can qualitatively explain these perceptual effects. Commonly, they assume the perception of visual motion to be optimal either within a deterministic framework with a regularization constraint that biases the solution toward zero motion [3, 4], or within a probabilistic framework of Bayesian estimation with a prior that favors slow velocities [5, 6]. The solutions resulting from these two frameworks are similar (and in some cases identical), but the probabilistic framework provides a more principled formulation of the problem in terms of meaningful probabilistic components. Specifically, Bayesian approaches rely on a likelihood function that expresses the relationship between the noisy measurements and the quantity to be estimated, and a prior distribution that expresses the probability of encountering any particular value of that quantity. A probabilistic model can also provide a richer description, by defining a full probability density over the set of possible "percepts", rather than just a single value. Numerous analyses of psychophysical experiments have made use of such distributions within the framework of signal detection theory in order to model perceptual behavior [7].

Figure 1: Bayesian model of visual speed perception. a) For a high-contrast stimulus, the likelihood has a narrow width (a high signal-to-noise ratio) and the prior induces only a small shift µ of the mean v̂ of the posterior. b) For a low-contrast stimulus, the measurement is noisy, leading to a wider likelihood. The shift µ is much larger and the perceived speed lower than under condition (a).

Previous work has shown that an ideal Bayesian observer model based on Gaussian forms for both likelihood and prior is sufficient to capture the basic qualitative features of global translational motion perception [5, 6]. But the behavior of the resulting model deviates systematically from human perceptual data, most importantly with regard to trial-to-trial variability and the precise form of the interaction between contrast and perceived speed.
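To make the mechanism of Figure 1 concrete, the following sketch (our illustration, not code from the paper) computes the posterior-mean speed estimate for the Gaussian-forms observer of [5, 6], i.e. a zero-mean Gaussian prior combined with a Gaussian likelihood. The specific widths and the measurement value are arbitrary placeholder values; the point is only that a wider likelihood (lower contrast) yields a larger shift toward slow speeds.

```python
import numpy as np

def gaussian_posterior_mean(m, sigma_like, sigma_prior):
    """Posterior mean for a Gaussian likelihood N(m, sigma_like^2) combined with a
    zero-mean Gaussian prior N(0, sigma_prior^2) over speed (a prior favoring slow speeds).
    In this closed-form case the measurement is simply shrunk toward zero."""
    w = sigma_prior**2 / (sigma_prior**2 + sigma_like**2)
    return w * m

m = 8.0                      # noisy speed measurement (arbitrary units)
sigma_prior = 4.0            # assumed width of the slow-speed prior
for label, sigma_like in [("high contrast", 1.0), ("low contrast", 4.0)]:
    v_hat = gaussian_posterior_mean(m, sigma_like, sigma_prior)
    print(f"{label}: perceived speed = {v_hat:.2f}, shift mu = {m - v_hat:.2f}")
# The wider (low-contrast) likelihood produces a larger shift mu,
# i.e. a slower perceived speed, as in Figure 1b.
```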
A recent article achieved better fits for the model under the assumption that human contrast perception saturates [8]. In order to advance the theory of Bayesian perception and provide significant constraints on models of neural implementation, it seems essential to constrain quantitatively both the likelihood function and the prior probability distribution. In previous work, the proposed likelihood functions were derived from the brightness constancy constraint [5, 6] or other generative principles [9]. Also, previous approaches defined the prior distribution based on general assumptions and computational convenience, typically choosing a Gaussian with zero mean, although a Laplacian prior has also been suggested [4]. In this paper, we develop a more general form of Bayesian model for speed perception that can account for trial-to-trial variability. We use psychophysical speed discrimination data in order to constrain both the likelihood and the prior function.

1 Probabilistic Model of Visual Speed Perception

1.1 Ideal Bayesian Observer

Assume that an observer wants to obtain an estimate for a variable v based on a measurement m that she/he performs. A Bayesian observer "knows" that the measurement device is not ideal and that the measurement m is therefore affected by noise. Hence, this observer combines the information gained by the measurement m with a priori knowledge about v. Doing so (and assuming that the prior knowledge is valid), the observer will, on average, perform better in estimating v than by trusting the measurement m alone. According to Bayes' rule

p(v|m) = \frac{1}{\alpha} p(m|v) \, p(v)    (1)

the probability of perceiving v given m (posterior) is the product of the likelihood of v for a particular measurement m and the a priori knowledge about the estimated variable v (prior). α is a normalization constant independent of v that ensures that the posterior is a proper probability distribution.

Figure 2: 2AFC speed discrimination experiment. a) Two patches of drifting gratings were displayed simultaneously (motion without movement). The subject was asked to fixate the center cross and decide after the presentation which of the two gratings was moving faster. b) A typical psychometric curve obtained under such a paradigm. The dots represent the empirical probability that the subject perceived stimulus2 moving faster than stimulus1. The speed of stimulus1 was fixed while v2 was varied. The point of subjective equality, v_match, is the value of v2 for which P_cum = 0.5. The threshold velocity v_thresh is the velocity for which P_cum = 0.875.

It is important to note that the measurement m is an internal variable of the observer and is not necessarily represented in the same space as v. The likelihood embodies both the mapping from v to m and the noise in this mapping. For now, we assume that there is a monotonic function f(v): v → v_m that maps v into the same space as m (m-space). Doing so allows us to treat m and v_m analytically in the same space. We will later propose a suitable form of the mapping function f(v). An ideal Bayesian observer selects the estimate that minimizes the expected loss, given the posterior and a loss function. We assume a least-squares loss function. Then, the optimal estimate v̂ is the mean of the posterior in Equation (1). It is easy to see why this model of a Bayesian observer is consistent with the fact that perceived speed decreases with contrast.
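As a complement to Equation (1), here is a minimal numerical sketch (ours, not the authors' analysis code) of an ideal Bayesian observer that forms the posterior on a discrete speed grid and reports the posterior mean as the least-squares-optimal estimate v̂. The Gaussian likelihood and the particular slow-speed prior used here are placeholder assumptions for illustration only.

```python
import numpy as np

def posterior_mean_estimate(m, sigma_like, log_prior, v_grid):
    """Ideal Bayesian estimate under a least-squares loss (Equation (1)):
    posterior proportional to likelihood * prior, estimate = posterior mean."""
    log_like = -0.5 * ((v_grid - m) / sigma_like) ** 2        # Gaussian likelihood (log-domain)
    log_post = log_like + log_prior(v_grid)
    post = np.exp(log_post - log_post.max())                  # unnormalized posterior
    post /= post.sum()                                        # normalization (the 1/alpha factor)
    return np.sum(v_grid * post)                              # posterior mean = v_hat

# Placeholder prior favoring slow speeds (any smooth, decreasing function would do here).
log_slow_prior = lambda v: -np.abs(v) / 2.0                   # log of a Laplacian-like prior

v_grid = np.linspace(-30.0, 30.0, 2001)                       # uniform speed grid
for sigma_like in (0.5, 2.0, 4.0):                            # wider likelihood ~ lower contrast
    v_hat = posterior_mean_estimate(8.0, sigma_like, log_slow_prior, v_grid)
    print(f"sigma = {sigma_like}: v_hat = {v_hat:.2f}")
# The estimate falls increasingly below the measurement (8.0) as the likelihood widens,
# reproducing the contrast-dependent slowing illustrated in Figure 1.
```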
The width of the likelihood varies inversely with the accuracy of the measurements performed by the observer, which presumably decreases with decreasing contrast due to a decreasing signal-to-noise ratio. As illustrated in Figure 1, the shift in perceived speed towards slow velocities grows with the width of the likelihood, and thus a Bayesian model can qualitatively explain the psychophysical results [1].

1.2 Two-Alternative Forced Choice Experiment

We would like to examine perceived speeds under a wide range of conditions in order to constrain a Bayesian model. Unfortunately, perceived speed is an internal variable, and it is not obvious how to design an experiment that would allow subjects to express it directly (although see [10] for an example of determining and even changing the prior of a Bayesian model for a sensorimotor task, where the estimates are more directly accessible). Perceived speed can only be accessed indirectly by asking the subject to compare the speeds of two stimuli. For a given trial, an ideal Bayesian observer in such a two-alternative forced choice (2AFC) experimental paradigm simply decides on the basis of the two trial estimates v̂1 (stimulus1) and v̂2 (stimulus2) which stimulus moves faster. Each estimate v̂ is based on a particular measurement m. For a given stimulus with speed v, an ideal Bayesian observer will produce a distribution of estimates p(v̂|v) because m is noisy. Over trials, the observer's behavior can be described by classical signal detection theory based on the distributions of the estimates; hence, e.g., the probability of perceiving stimulus2 moving faster than stimulus1 is given as the cumulative probability

P_{cum}(\hat{v}_2 > \hat{v}_1) = \int_0^{\infty} p(\hat{v}_2|v_2) \int_0^{\hat{v}_2} p(\hat{v}_1|v_1) \, d\hat{v}_1 \, d\hat{v}_2    (2)

P_cum describes the full psychometric curve. Figure 2b illustrates the measured psychometric curve and its fit from such an experimental situation.

2 Experimental Methods

We measured matching speeds (P_cum = 0.5) and thresholds (P_cum = 0.875) in a 2AFC speed discrimination task. Subjects were presented simultaneously with two circular patches of horizontally drifting sine-wave gratings for the duration of one second (Figure 2a). Patches were 3 deg in diameter, and were displayed at 6 deg eccentricity to either side of a fixation cross. The stimuli had an identical spatial frequency of 1.5 cycles/deg. One stimulus was considered to be the reference stimulus, having one of two different contrast values (c1 = [0.075 0.5]) and one of five different speed values (u1 = [1 2 4 8 12] deg/sec), while the second stimulus (test) had one of five different contrast values (c2 = [0.05 0.1 0.2 0.4 0.8]) and a varying speed that was determined by an interleaved staircase procedure. For each condition there were 96 trials. Conditions were randomly interleaved, including a random choice of stimulus identity (test vs. reference) and motion direction (right vs. left). Subjects were asked to fixate during stimulus presentation and to select the faster moving stimulus. The threshold experiment differed only in that auditory feedback was given to indicate the correctness of the subject's decision. This did not change the outcome of the experiment but significantly increased the quality of the data and thus reduced the number of trials needed.
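The following sketch (an illustration we added, not the authors' analysis code) evaluates Equation (2) under the assumption that the estimate distributions p(v̂|v) are Gaussian, and reads off the matching speed (P_cum = 0.5) and the threshold (P_cum = 0.875) from the resulting psychometric curve. The estimate means and standard deviations are placeholder values, and the closed-form reduction to a single normal CDF ignores the lower integration limits at zero, which is negligible for speeds well above zero.

```python
import numpy as np
from scipy.stats import norm

def p_cum(mean1, sd1, mean2, sd2):
    """Equation (2) for Gaussian estimate distributions p(v_hat|v): probability that a
    sample of v_hat_2 exceeds a sample of v_hat_1. For two independent Gaussians this
    reduces to a normal CDF of the mean difference over the combined standard deviation."""
    return norm.cdf((mean2 - mean1) / np.hypot(sd1, sd2))

# Placeholder estimate distributions: stimulus1 fixed, the mean of stimulus2 swept to trace the curve.
mean1, sd1 = 5.0, 0.8                      # perceived speed of the reference stimulus (assumed)
sd2 = 0.8                                  # equal estimate variance for both stimuli (assumed)
v2_values = np.linspace(2.0, 10.0, 401)
curve = np.array([p_cum(mean1, sd1, m2, sd2) for m2 in v2_values])

v_match = v2_values[np.argmin(np.abs(curve - 0.5))]    # point of subjective equality
v_thres = v2_values[np.argmin(np.abs(curve - 0.875))]  # threshold criterion
print(f"v_match ~ {v_match:.2f}, v_thres ~ {v_thres:.2f}, "
      f"difference ~ {v_thres - v_match:.2f} (proportional to the estimate s.d.)")
```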
3 Analysis

With the data from the speed discrimination experiments we could in principle apply a parametric fit using Equation (2) to derive the prior and the likelihood, but the optimization is difficult, and the fit might not be well constrained given the amount of data we have obtained. The problem becomes much more tractable given the following weak assumptions:

• We consider the prior to be relatively smooth.
• We assume that the measurement m is corrupted by additive Gaussian noise with a variance whose dependence on stimulus speed and contrast is separable.
• We assume that there is a mapping function f(v): v → v_m that maps v into the space of m (m-space). In that space, the likelihood is convolutional, i.e., the noise in the measurement directly defines the width of the likelihood.

These assumptions allow us to relate the psychophysical data to our probabilistic model in a simple way. The following analysis is in the m-space. The point of subjective equality (P_cum = 0.5) is defined as the point where the expected values of the speed estimates are equal. We write

E[\hat{v}_{m,1}] = E[\hat{v}_{m,2}] \quad\Leftrightarrow\quad v_{m,1} - E[\mu_1] = v_{m,2} - E[\mu_2]    (3)

where E[µ] is the expected shift of the perceived speed compared to the veridical speed. For the discrimination threshold experiment, the above assumptions imply that the variance var[v̂_m] of the speed estimates v̂_m is equal for both stimuli. Then, (2) predicts that the discrimination threshold is proportional to the standard deviation, thus

v_{m,2} - v_{m,1} = \gamma \sqrt{\mathrm{var}[\hat{v}_m]}    (4)

where γ is a constant that depends on the threshold criterion P_cum and the exact shape of p(v̂_m|v_m).

Figure 3: Piece-wise approximation. We perform a parametric fit by assuming the prior to be piece-wise linear and the likelihood to be LogNormal (Gaussian in the m-space).

3.1 Estimating the prior and likelihood

In order to extract the prior and the likelihood of our model from the data, we have to find a generic local form of the prior and the likelihood and relate them to the mean and the variance of the speed estimates. As illustrated in Figure 3, we assume that the likelihood is Gaussian with a standard deviation σ(c, v_m). Furthermore, the prior is assumed to be well approximated by a first-order Taylor series expansion over the velocity ranges covered by the likelihood. We parameterize this linear expansion of the prior as p(v_m) = a v_m + b. We can now derive a posterior for this local approximation of likelihood and prior and then define the perceived speed shift µ(m). The posterior can be written as

p(v_m|m) = \frac{1}{\alpha} p(m|v_m) \, p(v_m) = \frac{1}{\alpha} \exp\left(-\frac{v_m^2}{2\sigma(c,v_m)^2}\right) (a v_m + b)    (5)

where α is the normalization constant

\alpha = \int_{-\infty}^{\infty} p(m|v_m) \, p(v_m) \, dv_m = b \sqrt{2\pi \sigma(c,v_m)^2}    (6)

We can compute µ(m) as the first-order moment of the posterior for a given m. Exploiting the symmetries around the origin, we find

\mu(m) = \int_{-\infty}^{\infty} v_m \, p(v_m|m) \, dv_m \equiv \frac{a(m)}{b(m)} \sigma(c,v_m)^2    (7)

The expected value of µ(m) is equal to the value of µ at the expected value of the measurement m (which is the stimulus velocity v_m), thus

E[\mu] = \mu(m)\big|_{m=v_m} = \frac{a(v_m)}{b(v_m)} \sigma(c,v_m)^2    (8)

Similarly, we derive var[v̂_m]. Because the estimator is deterministic, the variance of the estimate only depends on the variance of the measurement m. For a given stimulus, the variance of the estimate can be well approximated by

\mathrm{var}[\hat{v}_m] = \mathrm{var}[m] \left( \frac{\partial \hat{v}_m(m)}{\partial m} \Big|_{m=v_m} \right)^2 = \mathrm{var}[m] \left( 1 - \frac{\partial \mu(m)}{\partial m} \Big|_{m=v_m} \right)^2 \approx \mathrm{var}[m]    (9)

Under the assumption of a locally smooth prior, the perceived velocity shift remains locally constant.
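As a sanity check on Equations (7) and (8), the following sketch (our illustration, with arbitrary parameter values) numerically computes the posterior mean for a Gaussian likelihood and a prior expanded linearly around the measurement, interpreting a and b as the local slope and value of that expansion, and compares the shift to the closed form (a/b)σ².

```python
import numpy as np

def local_shift_numeric(a, b, sigma, m):
    """Posterior-mean shift mu(m) for a Gaussian likelihood of width sigma centered on the
    measurement m, with the prior expanded to first order around m: p(v_m) ~ a*(v_m - m) + b."""
    v = np.linspace(m - 8 * sigma, m + 8 * sigma, 20001)        # uniform grid around m
    prior_local = a * (v - m) + b                               # linear (Taylor) expansion of the prior
    post = np.exp(-0.5 * ((v - m) / sigma) ** 2) * prior_local  # Equation (5), unnormalized
    post /= post.sum()
    return np.sum(v * post) - m                                 # shift of the posterior mean

a, b, sigma, m = -0.05, 1.0, 1.5, 10.0   # placeholder local prior slope/value and likelihood width
print("numeric shift    :", local_shift_numeric(a, b, sigma, m))
print("(a/b) * sigma**2 :", (a / b) * sigma**2)                 # Equations (7) and (8)
# A negative local slope a (prior decreasing with speed) gives a negative shift, i.e. perceived
# speed slower than veridical, and the size of the shift grows with the likelihood variance.
```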
The variance of the perceived speed v̂_m becomes equal to the variance of the measurement m, which is the variance of the likelihood (in the m-space), thus

\mathrm{var}[\hat{v}_m] = \sigma(c, v_m)^2    (10)

Together with (3) and (4), the above derivations provide a simple dependency of the psychophysical data on the local parameters of the likelihood and the prior.

3.2 Choosing a Logarithmic Speed Representation

We now want to choose the appropriate mapping function f(v) that maps v to the m-space. We define the m-space as the space in which the likelihood is Gaussian with a speed-independent width. We have shown that the discrimination threshold is proportional to the width of the likelihood ((4), (10)). Also, we know from the psychophysics literature that visual speed discrimination approximately follows a Weber-Fechner law [11, 12]: the discrimination threshold increases roughly proportionally with speed, and so would the width of the likelihood. A logarithmic speed representation would thus be compatible with the data and our choice of the likelihood. Hence, we transform the linear speed domain v into a normalized logarithmic domain according to

v_m = f(v) = \ln\left( \frac{v + v_0}{v_0} \right)    (11)

where v_0 is a small normalization constant. The normalization is chosen to account for the expected deviation from equal-variance behavior at the low end of the speed range. Surprisingly, it has been found that neurons in the Medial Temporal area (Area MT) of macaque monkeys have speed-tuning curves that are very well approximated by Gaussians of constant width in this normalized logarithmic space [13]. These neurons are known to play a central role in the representation of motion. It seems natural to assume that they are strongly involved in tasks such as our psychophysical experiments.

4 Results

Figure 4 shows the contrast-dependent shift of speed perception and the speed discrimination threshold data for two subjects. Data points connected with a dashed line represent the relative matching speed (v2/v1) for a particular contrast value c2 of the test stimulus as a function of the speed of the reference stimulus. Error bars are the empirical standard deviation of fits to bootstrapped samples of the data. Clearly, low-contrast stimuli are perceived to move slower. The effect, however, varies across the tested speed range and tends to become smaller for higher speeds. The relative discrimination thresholds for two different contrasts as a function of speed show that the Weber-Fechner law holds only approximately. The data are in good agreement with other data from the psychophysics literature [1, 11, 8].

For each subject, data from both experiments were used to compute a parametric least-squares fit according to (3), (4), (7), and (10). In order to test the assumption of a LogNormal likelihood we allowed the standard deviation to be dependent on contrast and speed, thus σ(c, v_m) = g(c) h(v_m). We split the speed range into six bins (subject 2: five) and parameterized h(v_m) and the ratio a/b accordingly. Similarly, we parameterized g(c) for the seven contrast values. The resulting fits are superimposed as bold lines in Figure 4. Figure 5 shows the fitted parametric values for g(c) and h(v) (plotted in the linear domain), and the reconstructed prior distribution p(v) transformed back to the linear domain. The approximately constant values for h(v) provide evidence that a LogNormal distribution is an appropriate functional description of the likelihood.
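A small sketch of the mapping in Equation (11) (our illustration; the value of v_0 and the Weber fraction are placeholder assumptions), showing that thresholds that grow proportionally with speed correspond to roughly constant increments in the logarithmic m-space, except at the lowest speeds where v_0 matters.

```python
import numpy as np

def f(v, v0=0.3):
    """Normalized logarithmic speed mapping of Equation (11); v0 is a placeholder value."""
    return np.log((v + v0) / v0)

speeds = np.array([1.0, 2.0, 4.0, 8.0, 12.0])   # reference speeds used in the experiment [deg/sec]
weber_fraction = 0.1                            # assumed Weber fraction, for illustration only
thresholds = weber_fraction * speeds            # Weber-Fechner law: threshold grows with speed

# In m-space the same thresholds become nearly constant increments, consistent with a
# likelihood of speed-independent width (i.e. LogNormal in the linear speed domain).
delta_m = f(speeds + thresholds) - f(speeds)
print(np.round(delta_m, 4))
```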
The resulting values for g(c) suggest that the likelihood width has a roughly exponentially decaying dependency on contrast, with strong saturation at higher contrasts.

Figure 4: Speed discrimination data for two subjects. a) The relative matching speed (normalized matching speed) of a test stimulus with different contrast levels (c2 = [0.05 0.1 0.2 0.4 0.8]) required to achieve subjective equality with a reference stimulus (two different contrast values c1 = [0.075 0.5]), plotted against the speed of the reference stimulus [deg/sec]. b) The relative discrimination threshold for two stimuli with equal contrast (c1,2 = [0.075 0.5]), plotted against stimulus speed [deg/sec].

Figure 5: Reconstructed prior distribution and parameters of the likelihood function. The reconstructed priors p(v) (unnormalized) for both subjects show much heavier tails than a Gaussian (dashed fit), approximately following a power-law function with exponent n ≈ −1.4 (bold line; n = −1.41 for subject 1, n = −1.35 for subject 2). Additional panels show the fitted values of g(c) as a function of contrast and h(v) as a function of speed [deg/sec].
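To illustrate what a power-law prior with exponent n ≈ −1.4 (the fitted value in Figure 5) implies relative to a Gaussian, the following sketch (added by us; the small offset v0 and the Gaussian width are placeholder values) compares how quickly the two forms fall off with speed.

```python
import numpy as np

def power_law_prior(v, n=-1.4, v0=0.3):
    """Unnormalized power-law prior over speed, of the form reconstructed in Figure 5.
    The offset v0 is a placeholder to keep the function finite at v = 0."""
    return (v + v0) ** n

def gaussian_prior(v, sigma=2.0):
    """Unnormalized zero-mean Gaussian prior with a placeholder width, for comparison."""
    return np.exp(-0.5 * (v / sigma) ** 2)

v = np.array([1.0, 4.0, 16.0])      # slow, medium, fast speeds [deg/sec]
for name, p in [("power law (n = -1.4)", power_law_prior(v)), ("Gaussian", gaussian_prior(v))]:
    print(name, p / p[0])           # density relative to its value at 1 deg/sec
# The power law falls off by a factor of roughly 30 over this range, while the Gaussian
# collapses by many orders of magnitude: the reconstructed prior has much heavier tails.
```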
5 Conclusions

We have proposed a probabilistic framework based on a Bayesian ideal observer and standard signal detection theory. We have derived a likelihood function and prior distribution for the estimator, with a fairly conservative set of assumptions, constrained by psychophysical measurements of speed discrimination and matching. The width of the resulting likelihood is nearly constant in the logarithmic speed domain, and decreases approximately exponentially with contrast. The prior expresses a preference for slower speeds, and approximately follows a power-law distribution, and thus has much heavier tails than a Gaussian. It would be interesting to compare the prior distributions derived here with measured true distributions of local image velocities that impinge on the retina. Although a number of authors have measured the spatio-temporal structure of natural images [e.g., 14], it is clearly difficult to extract the true prior distribution from such measurements because of the feedback loop formed through movements of the body, head and eyes.

Acknowledgments

The authors thank all subjects for their participation in the psychophysical experiments.

References

[1] P. Thompson. Perceived rate of movement depends on contrast. Vision Research, 22:377–380, 1982.
[2] L.S. Stone and P. Thompson. Human speed perception is contrast dependent. Vision Research, 32(8):1535–1549, 1992.
[3] A. Yuille and N. Grzywacz. A computational theory for the perception of coherent visual motion. Nature, 333(5):71–74, May 1988.
[4] A. Stocker. Constraint Optimization Networks for Visual Motion Perception - Analysis and Synthesis. PhD thesis, Dept. of Physics, Swiss Federal Institute of Technology, Zürich, Switzerland, March 2002.
[5] E. Simoncelli. Distributed Analysis and Representation of Visual Motion. PhD thesis, Dept. of Electrical Engineering, MIT, Cambridge, MA, 1993.
[6] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[7] D.M. Green and J.A. Swets. Signal Detection Theory and Psychophysics. Wiley, New York, 1966.
[8] F. Hürlimann, D. Kiper, and M. Carandini. Testing the Bayesian model of perceived speed. Vision Research, 2002.
[9] Y. Weiss and D.J. Fleet. Velocity likelihoods in biological and machine vision. In Probabilistic Models of the Brain, pages 77–96. Bradford, 2002.
[10] K. Koerding and D. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(15):244–247, January 2004.
[11] L. Welch. The perception of moving plaids reveals two motion-processing stages. Nature, 337:734–736, 1989.
[12] S. McKee, G. Silverman, and K. Nakayama. Precise velocity discrimination despite random variations in temporal frequency and contrast. Vision Research, 26(4):609–619, 1986.
[13] C.H. Anderson, H. Nover, and G.C. DeAngelis. Modeling the velocity tuning of macaque MT neurons. Journal of Vision / VSS abstract, 2003.
[14] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6:345–358, 1995.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. [sent-6, score-0.699]
2 Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. [sent-7, score-0.718]
3 We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. [sent-8, score-0.305]
4 Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. [sent-9, score-0.754]
5 Various psychophysical experiments have shown that the perceived speed of visual stimuli is affected by stimulus contrast, with low contrast stimuli being perceived to move slower than high contrast ones [1, 2]. [sent-12, score-1.689]
6 Specifically, Bayesian approaches rely on a likelihood function that expresses the relationship between the noisy measurements and the quantity to be estimated, and a prior distribution that expresses the probability of encountering any particular value of that quantity. [sent-16, score-0.391]
7 Numerous analyses of psychophysical experiments have made use of such distributions within the framework of signal detection theory in order to model perceptual behavior [7]. [sent-18, score-0.238]
8 a) For a high contrast stimulus, the likelihood has a narrow width (a high signal-to-noise ratio) and the prior induces only a small shift µ of the mean v of the posterior. [sent-20, score-0.526]
9 b) For a low contrast stimuli, the measurement ˆ is noisy, leading to a wider likelihood. [sent-21, score-0.225]
10 The shift µ is much larger and the perceived speed lower than under condition (a). [sent-22, score-0.714]
11 for both likelihood and prior is sufficient to capture the basic qualitative features of global translational motion perception [5, 6]. [sent-23, score-0.488]
12 But the behavior of the resulting model deviates systematically from human perceptual data, most importantly with regard to trial-to-trial variability and the precise form of interaction between contrast and perceived speed. [sent-24, score-0.413]
13 A recent article achieved better fits for the model under the assumption that human contrast perception saturates [8]. [sent-25, score-0.283]
14 In order to advance the theory of Bayesian perception and provide significant constraints on models of neural implementation, it seems essential to constrain quantitatively both the likelihood function and the prior probability distribution. [sent-26, score-0.45]
15 In previous work, the proposed likelihood functions were derived from the brightness constancy constraint [5, 6] or other generative principles [9]. [sent-27, score-0.134]
16 Also, previous approaches defined the prior distribution based on general assumptions and computational convenience, typically choosing a Gaussian with zero mean, although a Laplacian prior has also been suggested [4]. [sent-28, score-0.277]
17 In this paper, we develop a more general form of Bayesian model for speed perception that can account for trial-to-trial variability. [sent-29, score-0.535]
18 We use psychophysical speed discrimination data in order to constrain both the likelihood and the prior function. [sent-30, score-1.014]
19 1 Probabilistic Model of Visual Speed Perception Ideal Bayesian Observer Assume that an observer wants to obtain an estimate for a variable v based on a measurement m that she/he performs. [sent-32, score-0.317]
20 A Bayesian observer “knows” that the measurement device is not ideal and therefore, the measurement m is affected by noise. [sent-33, score-0.522]
21 Hence, this observer combines the information gained by the measurement m with a priori knowledge about v. [sent-34, score-0.317]
22 Doing so (and assuming that the prior knowledge is valid), the observer will – on average – perform better in estimating v than just trusting the measurements m. [sent-35, score-0.375]
23 According to Bayes’ rule 1 p(v|m) = p(m|v)p(v) (1) α the probability of perceiving v given m (posterior) is the product of the likelihood of v for a particular measurements m and the a priori knowledge about the estimated variable v (prior). [sent-36, score-0.221]
24 875 vmatch vthres v2 Figure 2: 2AFC speed discrimination experiment. [sent-40, score-0.601]
25 a) Two patches of drifting gratings were displayed simultaneously (motion without movement). [sent-41, score-0.17]
26 The subject was asked to fixate the center cross and decide after the presentation which of the two gratings was moving faster. [sent-42, score-0.203]
27 The dots represent the empirical probability that the subject perceived stimulus2 moving faster than stimulus1. [sent-44, score-0.355]
28 The speed of stimulus1 was fixed while v2 is varied. [sent-45, score-0.39]
29 The threshold velocity vthresh is the velocity for which Pcum = 0. [sent-48, score-0.357]
30 It is important to note that the measurement m is an internal variable of the observer and is not necessarily represented in the same space as v. [sent-50, score-0.317]
31 The likelihood embodies both the mapping from v to m and the noise in this mapping. [sent-51, score-0.159]
32 So far, we assume that there is a monotonic function f (v) : v → vm that maps v into the same space as m (m-space). [sent-52, score-0.55]
33 Doing so allows us to analytically treat m and vm in the same space. [sent-53, score-0.529]
34 An ideal Bayesian observer selects the estimate that minimizes the expected loss, given the posterior and a loss function. [sent-55, score-0.312]
35 It is easy to see why this model ˆ of a Bayesian observer is consistent with the fact that perceived speed decreases with contrast. [sent-58, score-0.853]
36 The width of the likelihood varies inversely with the accuracy of the measurements performed by the observer, which presumably decreases with decreasing contrast due to a decreasing signal-to-noise ratio. [sent-59, score-0.411]
37 As illustrated in Figure 1, the shift in perceived speed towards slow velocities grows with the width of the likelihood, and thus a Bayesian model can qualitatively explain the psychophysical results [1]. [sent-60, score-1.064]
38 2 Two Alternative Forced Choice Experiment We would like to examine perceived speeds under a wide range of conditions in order to constrain a Bayesian model. [sent-62, score-0.351]
39 Unfortunately, perceived speed is an internal variable, and it is not obvious how to design an experiment that would allow subjects to express it directly 1 . [sent-63, score-0.715]
40 Perceived speed can only be accessed indirectly by asking the subject to compare the speed of two stimuli. [sent-64, score-0.825]
41 For a given trial, an ideal Bayesian observer in such a two-alternative forced choice (2AFC) experimental paradigm simply decides on the basis of the two trial estimates v1 (stimulus1) and v2 (stimulus2) which stimulus moves faster. [sent-65, score-0.517]
42 For a given stimulus with speed v, an ideal Bayesian observer will produce a distribution of estimates p(ˆ|v) because m is noisy. [sent-67, score-0.872]
43 the probability of perceiving stimulus2 moving 1 Although see [10] for an example of determining and even changing the prior of a Bayesian model for a sensorimotor task, where the estimates are more directly accessible. [sent-70, score-0.271]
44 Subjects were presented simultaneously with two circular patches of horizontally drifting sine-wave gratings for the duration of one second (Figure 2a). [sent-76, score-0.146]
45 One stimulus was considered to be the reference stimulus having one of two different contrast values (c1 =[0. [sent-80, score-0.547]
46 5]) and one of five different speed values (u1 =[1 2 4 8 12] deg/sec) while the second stimulus (test) had one of five different contrast values (c2 =[0. [sent-82, score-0.689]
47 8]) and a varying speed that was determined by an interleaved staircase procedure. [sent-87, score-0.43]
48 Conditions were randomly interleaved, including a random choice of stimulus identity (test vs. [sent-89, score-0.191]
49 Subjects were asked to fixate during stimulus presentation and select the faster moving stimulus. [sent-92, score-0.303]
50 3 Analysis With the data from the speed discrimination experiments we could in principal apply a parametric fit using Equation (2) to derive the prior and the likelihood, but the optimization is difficult, and the fit might not be well constrained given the amount of data we have obtained. [sent-95, score-0.722]
51 • We assume that the measurement m is corrupted by additive Gaussian noise with a variance whose dependence on stimulus speed and contrast is separable. [sent-97, score-0.856]
52 • We assume that there is a mapping function f (v) : v → vm that maps v into the space of m (m-space). [sent-98, score-0.575]
53 the noise in the measurement directly defines the width of the likelihood. [sent-101, score-0.192]
54 These assumptions allow us to relate the psychophysical data to our probabilistic model in a simple way. [sent-102, score-0.231]
55 5) is defined as where the expected values of the speed estimates are equal. [sent-105, score-0.421]
56 We write E vm,1 ˆ vm,1 − E µ1 = E vm,2 ˆ = vm,2 − E µ2 (3) where E µ is the expected shift of the perceived speed compared to the veridical speed. [sent-106, score-0.714]
57 For the discrimination threshold experiment, above assumptions imply that the variance var vm of the speed estimates vm is equal for both stimuli. [sent-107, score-1.889]
58 1 Estimating the prior and likelihood In order to extract the prior and the likelihood of our model from the data, we have to find a generic local form of the prior and the likelihood and relate them to the mean and the variance of the speed estimates. [sent-111, score-1.238]
59 As illustrated in Figure 3, we assume that the likelihood is Gaussian with a standard deviation σ(c, vm ). [sent-112, score-0.7]
60 Furthermore, the prior is assumed to be wellapproximated by a first-order Taylor series expansion over the velocity ranges covered by the likelihood. [sent-113, score-0.272]
61 We parameterize this linear expansion of the prior as p(vm ) = avm + b. [sent-114, score-0.171]
62 We now can derive a posterior for this local approximation of likelihood and prior and then define the perceived speed shift µ(m). [sent-115, score-1.025]
63 The posterior can be written as 2 vm 1 1 p(m|vm )p(vm ) = [exp(− )(avm + b)] α α 2σ(c, vm )2 where α is the normalization constant ∞ b p(m|vm )p(vm )dvm = π2σ(c, vm )2 α= 2 −∞ p(vm |m) = (5) (6) We can compute µ(m) as the first order moment of the posterior for a given m. [sent-116, score-1.717]
64 Because the estimator is deterministic, the variance of the ˆ estimate only depends on the variance of the measurement m. [sent-118, score-0.217]
65 For a given stimulus, the variance of the estimate can be well approximated by ∂ˆm (m) v var vm = var m ( ˆ |m=vm )2 (9) ∂m ∂µ(m) |m=vm )2 ≈ var m = var m (1 − ∂m Under the assumption of a locally smooth prior, the perceived velocity shift remains locally constant. [sent-119, score-1.491]
66 2 Choosing a Logarithmic speed representation We now want to choose the appropriate mapping function f (v) that maps v to the m-space. [sent-122, score-0.436]
67 We define the m-space as the space in which the likelihood is Gaussian with a speedindependent width. [sent-123, score-0.134]
68 We have shown that discrimination threshold is proportional to the width of the likelihood (4), (10). [sent-124, score-0.437]
69 Also, we know from the psychophysics literature that visual speed discrimination approximately follows a Weber-Fechner law [11, 12], thus that the discrimination threshold increases roughly proportional with speed and so would the likelihood. [sent-125, score-1.339]
70 A logarithmic speed representation would be compatible with the data and our choice of the likelihood. [sent-126, score-0.434]
71 Hence, we transform the linear speed-domain v into a normalized logarithmic domain according to v + v0 vm = f (v) = ln( ) (11) v0 where v0 is a small normalization constant. [sent-127, score-0.599]
72 Surprisingly, it has been found that neurons in the Medial Temporal area (Area MT) of macaque monkeys have speed-tuning curves that are very well approximated by Gaussians of constant width in above normalized logarithmic space [13]. [sent-129, score-0.171]
73 It seems natural to assume that they are strongly involved in tasks such as our performed psychophysical experiments. [sent-131, score-0.154]
74 4 Results Figure 4 shows the contrast dependent shift of speed perception and the speed discrimination threshold data for two subjects. [sent-132, score-1.345]
75 Data points connected with a dashed line represent the relative matching speed (v2 /v1 ) for a particular contrast value c2 of the test stimulus as a function of the speed of the reference stimulus. [sent-133, score-1.202]
76 Clearly, low contrast stimuli are perceived to move slower. [sent-135, score-0.412]
77 The effect, however, varies across the tested speed range and tends to become smaller for higher speeds. [sent-136, score-0.411]
78 The relative discrimination thresholds for two different contrasts as a function of speed show that the Weber-Fechner law holds only approximately. [sent-137, score-0.638]
79 In order to test the assumption of a LogNormal likelihood we allowed the standard deviation to be dependent on contrast and speed, thus σ(c, vm ) = g(c)h(vm ). [sent-140, score-0.808]
80 We split the speed range into six bins (subject2: five) and parameterized h(vm ) and the ratio a/b accordingly. [sent-141, score-0.39]
81 Figure 5 shows the fitted parametric values for g(c) and h(v) (plotted in the linear domain), and the reconstructed prior distribution p(v) transformed back to the linear domain. [sent-144, score-0.221]
82 The resulting values for g(c) suggest for the likelihood width a roughly exponential decaying dependency on contrast with strong saturation for higher contrasts. [sent-146, score-0.338]
83 discrimination threshold (relative) reference stimulus contrast c1: 0. [sent-147, score-0.584]
84 1 0 10 1 contrast: 1 10 discrimination threshold (relative) normalized matching speed subject 2 1. [sent-159, score-0.704]
85 1 10 1 1 b speed of reference stimulus [deg/sec] 10 stimulus speed [deg/sec] Figure 4: Speed discrimination data for two subjects. [sent-166, score-1.384]
86 a) The relative matching speed of a test stimulus with different contrast levels (c2 =[0. [sent-167, score-0.755]
87 8]) to achieve subjective equality with a reference stimulus (two different contrast values c1 ). [sent-172, score-0.439]
88 b) The relative discrimination threshold for two stimuli with equal contrast (c1,2 =[0. [sent-173, score-0.425]
89 reconstructed prior subject 1 p(v) [unnormalized] 1 Gaussian Power-Law g(c) 1 h(v) 2 0. [sent-176, score-0.224]
90 1 1 contrast speed [deg/sec] Figure 5: Reconstructed prior distribution and parameters of the likelihood function. [sent-202, score-0.757]
91 The reconstructed prior for both subjects show much heavier tails than a Gaussian (dashed fit), approximately following a power-law function with exponent n ≈ −1. [sent-203, score-0.378]
92 5 Conclusions We have proposed a probabilistic framework based on a Bayesian ideal observer and standard signal detection theory. [sent-205, score-0.338]
93 We have derived a likelihood function and prior distribution for the estimator, with a fairly conservative set of assumptions, constrained by psychophysical measurements of speed discrimination and matching. [sent-206, score-1.018]
94 The width of the resulting likelihood is nearly constant in the logarithmic speed domain, and decreases approximately exponentially with contrast. [sent-207, score-0.696]
95 The prior expresses a preference for slower speeds, and approximately follows a power-law distribution, thus has much heavier tails than a Gaussian. [sent-208, score-0.328]
96 It would be interesting to compare the here derived prior distributions with measured true distributions of local image velocities that impinge on the retina. [sent-209, score-0.21]
97 Acknowledgments The authors thank all subjects for their participation in the psychophysical experiments. [sent-213, score-0.214]
98 A computational theory for the perception of coherent visual motion. [sent-227, score-0.224]
99 The perception of moving plaids reveals two motion-processing stages. [sent-268, score-0.192]
100 Precise velocity discrimintation despite random variations in temporal frequency and contrast. [sent-274, score-0.147]
wordName wordTfidf (topN-words)
[('vm', 0.529), ('speed', 0.39), ('perceived', 0.24), ('pcum', 0.232), ('observer', 0.2), ('stimulus', 0.191), ('discrimination', 0.165), ('psychophysical', 0.154), ('velocity', 0.147), ('perception', 0.145), ('likelihood', 0.134), ('prior', 0.125), ('measurement', 0.117), ('contrast', 0.108), ('bayesian', 0.106), ('var', 0.105), ('motion', 0.084), ('shift', 0.084), ('lognormal', 0.081), ('visual', 0.079), ('width', 0.075), ('gratings', 0.069), ('speeds', 0.065), ('stimuli', 0.064), ('threshold', 0.063), ('heavier', 0.06), ('velocities', 0.06), ('subjects', 0.06), ('ideal', 0.06), ('reference', 0.057), ('psychometric', 0.055), ('subjective', 0.055), ('reconstructed', 0.054), ('posterior', 0.052), ('variance', 0.05), ('measurements', 0.05), ('tails', 0.049), ('moving', 0.047), ('avm', 0.046), ('drifting', 0.046), ('vmatch', 0.046), ('xate', 0.046), ('constrain', 0.046), ('subject', 0.045), ('logarithmic', 0.044), ('parametric', 0.042), ('matching', 0.041), ('expresses', 0.041), ('dvm', 0.04), ('interleaved', 0.04), ('deviation', 0.037), ('perceiving', 0.037), ('forced', 0.035), ('perceptual', 0.035), ('eero', 0.034), ('thresholds', 0.033), ('qualitatively', 0.033), ('ts', 0.032), ('psychophysics', 0.032), ('patches', 0.031), ('estimates', 0.031), ('alan', 0.031), ('macaque', 0.031), ('sensorimotor', 0.031), ('approximately', 0.03), ('human', 0.03), ('simoncelli', 0.029), ('probabilistic', 0.029), ('affected', 0.028), ('unnormalized', 0.028), ('favors', 0.028), ('detection', 0.028), ('equality', 0.028), ('slow', 0.028), ('vision', 0.027), ('mt', 0.027), ('gaussian', 0.027), ('assumptions', 0.027), ('bold', 0.026), ('normalization', 0.026), ('experiment', 0.025), ('relative', 0.025), ('mapping', 0.025), ('measured', 0.025), ('weiss', 0.025), ('law', 0.025), ('displayed', 0.024), ('slower', 0.023), ('decreases', 0.023), ('faster', 0.023), ('movement', 0.022), ('relate', 0.021), ('asked', 0.021), ('approximated', 0.021), ('ve', 0.021), ('maps', 0.021), ('varies', 0.021), ('dependency', 0.021), ('presentation', 0.021), ('signal', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception
Author: Alan Stocker, Eero P. Simoncelli
Abstract: It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, and that the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, and approximately follows a power-law function. Humans do not perceive visual motion veridically. Various psychophysical experiments have shown that the perceived speed of visual stimuli is affected by stimulus contrast, with low contrast stimuli being perceived to move slower than high contrast ones [1, 2]. Computational models have been suggested that can qualitatively explain these perceptual effects. Commonly, they assume the perception of visual motion to be optimal either within a deterministic framework with a regularization constraint that biases the solution toward zero motion [3, 4], or within a probabilistic framework of Bayesian estimation with a prior that favors slow velocities [5, 6]. The solutions resulting from these two frameworks are similar (and in some cases identical), but the probabilistic framework provides a more principled formulation of the problem in terms of meaningful probabilistic components. Specifically, Bayesian approaches rely on a likelihood function that expresses the relationship between the noisy measurements and the quantity to be estimated, and a prior distribution that expresses the probability of encountering any particular value of that quantity. A probabilistic model can also provide a richer description, by defining a full probability density over the set of possible “percepts”, rather than just a single value. Numerous analyses of psychophysical experiments have made use of such distributions within the framework of signal detection theory in order to model perceptual behavior [7]. Previous work has shown that an ideal Bayesian observer model based on Gaussian forms µ posterior low contrast probability density probability density high contrast likelihood prior a posterior likelihood prior v ˆ v ˆ visual speed µ b visual speed Figure 1: Bayesian model of visual speed perception. a) For a high contrast stimulus, the likelihood has a narrow width (a high signal-to-noise ratio) and the prior induces only a small shift µ of the mean v of the posterior. b) For a low contrast stimuli, the measurement ˆ is noisy, leading to a wider likelihood. The shift µ is much larger and the perceived speed lower than under condition (a). for both likelihood and prior is sufficient to capture the basic qualitative features of global translational motion perception [5, 6]. But the behavior of the resulting model deviates systematically from human perceptual data, most importantly with regard to trial-to-trial variability and the precise form of interaction between contrast and perceived speed. 
A recent article achieved better fits for the model under the assumption that human contrast perception saturates [8]. In order to advance the theory of Bayesian perception and provide significant constraints on models of neural implementation, it seems essential to constrain quantitatively both the likelihood function and the prior probability distribution. In previous work, the proposed likelihood functions were derived from the brightness constancy constraint [5, 6] or other generative principles [9]. Also, previous approaches defined the prior distribution based on general assumptions and computational convenience, typically choosing a Gaussian with zero mean, although a Laplacian prior has also been suggested [4]. In this paper, we develop a more general form of Bayesian model for speed perception that can account for trial-to-trial variability. We use psychophysical speed discrimination data in order to constrain both the likelihood and the prior function. 1 1.1 Probabilistic Model of Visual Speed Perception Ideal Bayesian Observer Assume that an observer wants to obtain an estimate for a variable v based on a measurement m that she/he performs. A Bayesian observer “knows” that the measurement device is not ideal and therefore, the measurement m is affected by noise. Hence, this observer combines the information gained by the measurement m with a priori knowledge about v. Doing so (and assuming that the prior knowledge is valid), the observer will – on average – perform better in estimating v than just trusting the measurements m. According to Bayes’ rule 1 p(v|m) = p(m|v)p(v) (1) α the probability of perceiving v given m (posterior) is the product of the likelihood of v for a particular measurements m and the a priori knowledge about the estimated variable v (prior). α is a normalization constant independent of v that ensures that the posterior is a proper probability distribution. ^ ^ P(v2 > v1) 1 + Pcum=0.5 0 a b Pcum=0.875 vmatch vthres v2 Figure 2: 2AFC speed discrimination experiment. a) Two patches of drifting gratings were displayed simultaneously (motion without movement). The subject was asked to fixate the center cross and decide after the presentation which of the two gratings was moving faster. b) A typical psychometric curve obtained under such paradigm. The dots represent the empirical probability that the subject perceived stimulus2 moving faster than stimulus1. The speed of stimulus1 was fixed while v2 is varied. The point of subjective equality, vmatch , is the value of v2 for which Pcum = 0.5. The threshold velocity vthresh is the velocity for which Pcum = 0.875. It is important to note that the measurement m is an internal variable of the observer and is not necessarily represented in the same space as v. The likelihood embodies both the mapping from v to m and the noise in this mapping. So far, we assume that there is a monotonic function f (v) : v → vm that maps v into the same space as m (m-space). Doing so allows us to analytically treat m and vm in the same space. We will later propose a suitable form of the mapping function f (v). An ideal Bayesian observer selects the estimate that minimizes the expected loss, given the posterior and a loss function. We assume a least-squares loss function. Then, the optimal estimate v is the mean of the posterior in Equation (1). It is easy to see why this model ˆ of a Bayesian observer is consistent with the fact that perceived speed decreases with contrast. 
The width of the likelihood varies inversely with the accuracy of the measurements performed by the observer, which presumably decreases with decreasing contrast due to a decreasing signal-to-noise ratio. As illustrated in Figure 1, the shift in perceived speed towards slow velocities grows with the width of the likelihood, and thus a Bayesian model can qualitatively explain the psychophysical results [1]. 1.2 Two Alternative Forced Choice Experiment We would like to examine perceived speeds under a wide range of conditions in order to constrain a Bayesian model. Unfortunately, perceived speed is an internal variable, and it is not obvious how to design an experiment that would allow subjects to express it directly 1 . Perceived speed can only be accessed indirectly by asking the subject to compare the speed of two stimuli. For a given trial, an ideal Bayesian observer in such a two-alternative forced choice (2AFC) experimental paradigm simply decides on the basis of the two trial estimates v1 (stimulus1) and v2 (stimulus2) which stimulus moves faster. Each estimate v is based ˆ ˆ ˆ on a particular measurement m. For a given stimulus with speed v, an ideal Bayesian observer will produce a distribution of estimates p(ˆ|v) because m is noisy. Over trials, v the observers behavior can be described by classical signal detection theory based on the distributions of the estimates, hence e.g. the probability of perceiving stimulus2 moving 1 Although see [10] for an example of determining and even changing the prior of a Bayesian model for a sensorimotor task, where the estimates are more directly accessible. faster than stimulus1 is given as the cumulative probability Pcum (ˆ2 > v1 ) = v ˆ ∞ 0 p(ˆ2 |v2 ) v v2 ˆ 0 p(ˆ1 |v1 ) dˆ1 dˆ2 v v v (2) Pcum describes the full psychometric curve. Figure 2b illustrates the measured psychometric curve and its fit from such an experimental situation. 2 Experimental Methods We measured matching speeds (Pcum = 0.5) and thresholds (Pcum = 0.875) in a 2AFC speed discrimination task. Subjects were presented simultaneously with two circular patches of horizontally drifting sine-wave gratings for the duration of one second (Figure 2a). Patches were 3deg in diameter, and were displayed at 6deg eccentricity to either side of a fixation cross. The stimuli had an identical spatial frequency of 1.5 cycle/deg. One stimulus was considered to be the reference stimulus having one of two different contrast values (c1 =[0.075 0.5]) and one of five different speed values (u1 =[1 2 4 8 12] deg/sec) while the second stimulus (test) had one of five different contrast values (c2 =[0.05 0.1 0.2 0.4 0.8]) and a varying speed that was determined by an interleaved staircase procedure. For each condition there were 96 trials. Conditions were randomly interleaved, including a random choice of stimulus identity (test vs. reference) and motion direction (right vs. left). Subjects were asked to fixate during stimulus presentation and select the faster moving stimulus. The threshold experiment differed only in that auditory feedback was given to indicate the correctness of their decision. This did not change the outcome of the experiment but increased significantly the quality of the data and thus reduced the number of trials needed. 
3 Analysis With the data from the speed discrimination experiments we could in principal apply a parametric fit using Equation (2) to derive the prior and the likelihood, but the optimization is difficult, and the fit might not be well constrained given the amount of data we have obtained. The problem becomes much more tractable given the following weak assumptions: • We consider the prior to be relatively smooth. • We assume that the measurement m is corrupted by additive Gaussian noise with a variance whose dependence on stimulus speed and contrast is separable. • We assume that there is a mapping function f (v) : v → vm that maps v into the space of m (m-space). In that space, the likelihood is convolutional i.e. the noise in the measurement directly defines the width of the likelihood. These assumptions allow us to relate the psychophysical data to our probabilistic model in a simple way. The following analysis is in the m-space. The point of subjective equality (Pcum = 0.5) is defined as where the expected values of the speed estimates are equal. We write E vm,1 ˆ vm,1 − E µ1 = E vm,2 ˆ = vm,2 − E µ2 (3) where E µ is the expected shift of the perceived speed compared to the veridical speed. For the discrimination threshold experiment, above assumptions imply that the variance var vm of the speed estimates vm is equal for both stimuli. Then, (2) predicts that the ˆ ˆ discrimination threshold is proportional to the standard deviation, thus vm,2 − vm,1 = γ var vm ˆ (4) likelihood a b prior vm Figure 3: Piece-wise approximation We perform a parametric fit by assuming the prior to be piece-wise linear and the likelihood to be LogNormal (Gaussian in the m-space). where γ is a constant that depends on the threshold criterion Pcum and the exact shape of p(ˆm |vm ). v 3.1 Estimating the prior and likelihood In order to extract the prior and the likelihood of our model from the data, we have to find a generic local form of the prior and the likelihood and relate them to the mean and the variance of the speed estimates. As illustrated in Figure 3, we assume that the likelihood is Gaussian with a standard deviation σ(c, vm ). Furthermore, the prior is assumed to be wellapproximated by a first-order Taylor series expansion over the velocity ranges covered by the likelihood. We parameterize this linear expansion of the prior as p(vm ) = avm + b. We now can derive a posterior for this local approximation of likelihood and prior and then define the perceived speed shift µ(m). The posterior can be written as 2 vm 1 1 p(m|vm )p(vm ) = [exp(− )(avm + b)] α α 2σ(c, vm )2 where α is the normalization constant ∞ b p(m|vm )p(vm )dvm = π2σ(c, vm )2 α= 2 −∞ p(vm |m) = (5) (6) We can compute µ(m) as the first order moment of the posterior for a given m. Exploiting the symmetries around the origin, we find ∞ a(m) µ(m) = σ(c, vm )2 vp(vm |m)dvm ≡ (7) b(m) −∞ The expected value of µ(m) is equal to the value of µ at the expected value of the measurement m (which is the stimulus velocity vm ), thus a(vm ) σ(c, vm )2 E µ = µ(m)|m=vm = (8) b(vm ) Similarly, we derive var vm . Because the estimator is deterministic, the variance of the ˆ estimate only depends on the variance of the measurement m. For a given stimulus, the variance of the estimate can be well approximated by ∂ˆm (m) v var vm = var m ( ˆ |m=vm )2 (9) ∂m ∂µ(m) |m=vm )2 ≈ var m = var m (1 − ∂m Under the assumption of a locally smooth prior, the perceived velocity shift remains locally constant. 
The variance of the perceived speed vm becomes equal to the variance of the ˆ measurement m, which is the variance of the likelihood (in the m-space), thus var vm = σ(c, vm )2 ˆ (10) With (3) and (4), above derivations provide a simple dependency of the psychophysical data to the local parameters of the likelihood and the prior. 3.2 Choosing a Logarithmic speed representation We now want to choose the appropriate mapping function f (v) that maps v to the m-space. We define the m-space as the space in which the likelihood is Gaussian with a speedindependent width. We have shown that discrimination threshold is proportional to the width of the likelihood (4), (10). Also, we know from the psychophysics literature that visual speed discrimination approximately follows a Weber-Fechner law [11, 12], thus that the discrimination threshold increases roughly proportional with speed and so would the likelihood. A logarithmic speed representation would be compatible with the data and our choice of the likelihood. Hence, we transform the linear speed-domain v into a normalized logarithmic domain according to v + v0 vm = f (v) = ln( ) (11) v0 where v0 is a small normalization constant. The normalization is chosen to account for the expected deviation of equal variance behavior at the low end. Surprisingly, it has been found that neurons in the Medial Temporal area (Area MT) of macaque monkeys have speed-tuning curves that are very well approximated by Gaussians of constant width in above normalized logarithmic space [13]. These neurons are known to play a central role in the representation of motion. It seems natural to assume that they are strongly involved in tasks such as our performed psychophysical experiments. 4 Results Figure 4 shows the contrast dependent shift of speed perception and the speed discrimination threshold data for two subjects. Data points connected with a dashed line represent the relative matching speed (v2 /v1 ) for a particular contrast value c2 of the test stimulus as a function of the speed of the reference stimulus. Error bars are the empirical standard deviation of fits to bootstrapped samples of the data. Clearly, low contrast stimuli are perceived to move slower. The effect, however, varies across the tested speed range and tends to become smaller for higher speeds. The relative discrimination thresholds for two different contrasts as a function of speed show that the Weber-Fechner law holds only approximately. The data are in good agreement with other data from the psychophysics literature [1, 11, 8]. For each subject, data from both experiments were used to compute a parametric leastsquares fit according to (3), (4), (7), and (10). In order to test the assumption of a LogNormal likelihood we allowed the standard deviation to be dependent on contrast and speed, thus σ(c, vm ) = g(c)h(vm ). We split the speed range into six bins (subject2: five) and parameterized h(vm ) and the ratio a/b accordingly. Similarly, we parameterized g(c) for the seven contrast values. The resulting fits are superimposed as bold lines in Figure 4. Figure 5 shows the fitted parametric values for g(c) and h(v) (plotted in the linear domain), and the reconstructed prior distribution p(v) transformed back to the linear domain. The approximately constant values for h(v) provide evidence that a LogNormal distribution is an appropriate functional description of the likelihood. 
4 Results

Figure 4 shows the contrast-dependent shift of speed perception and the speed discrimination threshold data for two subjects. Data points connected with a dashed line represent the relative matching speed (v_2/v_1) for a particular contrast value c_2 of the test stimulus as a function of the speed of the reference stimulus. Error bars are the empirical standard deviation of fits to bootstrapped samples of the data. Clearly, low contrast stimuli are perceived to move slower. The effect, however, varies across the tested speed range and tends to become smaller for higher speeds. The relative discrimination thresholds for two different contrasts as a function of speed show that the Weber-Fechner law holds only approximately. The data are in good agreement with other data from the psychophysics literature [1, 11, 8].

For each subject, data from both experiments were used to compute a parametric least-squares fit according to (3), (4), (7), and (10). In order to test the assumption of a LogNormal likelihood we allowed the standard deviation to depend on both contrast and speed, thus \sigma(c, v_m) = g(c) h(v_m). We split the speed range into six bins (subject 2: five) and parameterized h(v_m) and the ratio a/b accordingly. Similarly, we parameterized g(c) for the seven contrast values. The resulting fits are superimposed as bold lines in Figure 4.

Figure 5 shows the fitted parametric values for g(c) and h(v) (plotted in the linear domain), and the reconstructed prior distribution p(v) transformed back to the linear domain. The approximately constant values of h(v) provide evidence that a LogNormal distribution is an appropriate functional description of the likelihood. The resulting values of g(c) suggest that the likelihood width has a roughly exponentially decaying dependence on contrast, with strong saturation at higher contrasts.

[Figure 4 panels: normalized matching speed as a function of the speed of the reference stimulus [deg/sec] (a), and relative discrimination threshold as a function of stimulus speed [deg/sec] (b), for subjects 1 and 2 and reference contrasts c_1 = 0.075 and 0.5.]
Figure 4: Speed discrimination data for two subjects. a) The relative matching speed of a test stimulus with different contrast levels (c_2 = [0.05 0.1 0.2 0.4 0.8]) required to achieve subjective equality with a reference stimulus (two different contrast values c_1). b) The relative discrimination threshold for two stimuli with equal contrast (c_{1,2} = [0.075 0.5]).

[Figure 5 panels: reconstructed prior p(v) (unnormalized) versus speed [deg/sec], g(c) versus contrast, and h(v) versus speed, for both subjects; fitted power-law exponents n = -1.41 (subject 1) and n = -1.35 (subject 2).]
Figure 5: Reconstructed prior distribution and parameters of the likelihood function. The reconstructed priors for both subjects show much heavier tails than a Gaussian (dashed fit), approximately following a power-law function with exponent n ≈ -1.4 (bold line).

5 Conclusions

We have proposed a probabilistic framework based on a Bayesian ideal observer and standard signal detection theory. We have derived a likelihood function and prior distribution for the estimator, with a fairly conservative set of assumptions, constrained by psychophysical measurements of speed discrimination and matching. The width of the resulting likelihood is nearly constant in the logarithmic speed domain, and decreases approximately exponentially with contrast. The prior expresses a preference for slower speeds and approximately follows a power-law distribution, and thus has much heavier tails than a Gaussian.

It would be interesting to compare the prior distributions derived here with measured distributions of the local image velocities that actually impinge on the retina. Although a number of authors have measured the spatio-temporal structure of natural images [e.g., 14], it is clearly difficult to extract the true prior distribution from such measurements because of the feedback loop formed by movements of the body, head, and eyes.

Acknowledgments

The authors thank all subjects for their participation in the psychophysical experiments.

References

[1] P. Thompson. Perceived rate of movement depends on contrast. Vision Research, 22:377–380, 1982.
[2] L.S. Stone and P. Thompson. Human speed perception is contrast dependent. Vision Research, 32(8):1535–1549, 1992.
[3] A. Yuille and N. Grzywacz. A computational theory for the perception of coherent visual motion. Nature, 333(5):71–74, May 1988.
[4] Alan Stocker. Constraint Optimization Networks for Visual Motion Perception - Analysis and Synthesis. PhD thesis, Dept. of Physics, Swiss Federal Institute of Technology, Zürich, Switzerland, March 2002.
[5] Eero Simoncelli. Distributed analysis and representation of visual motion. PhD thesis, MIT, Dept. of Electrical Engineering, Cambridge, MA, 1993.
[6] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[7] D.M. Green and J.A. Swets. Signal Detection Theory and Psychophysics. Wiley, New York, 1966.
[8] F. Hürlimann, D. Kiper, and M. Carandini. Testing the Bayesian model of perceived speed. Vision Research, 2002.
[9] Y. Weiss and D.J. Fleet. Probabilistic Models of the Brain, chapter Velocity Likelihoods in Biological and Machine Vision, pages 77–96. Bradford, 2002.
[10] K. Koerding and D. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(15):244–247, January 2004.
[11] Leslie Welch. The perception of moving plaids reveals two motion-processing stages. Nature, 337:734–736, 1989.
[12] S. McKee, G. Silverman, and K. Nakayama. Precise velocity discrimination despite random variations in temporal frequency and contrast. Vision Research, 26(4):609–619, 1986.
[13] C.H. Anderson, H. Nover, and G.C. DeAngelis. Modeling the velocity tuning of macaque MT neurons. Journal of Vision/VSS abstract, 2003.
[14] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6:345–358, 1995.
2 0.13201421 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity
Author: Marcelo A. Montemurro, Stefano Panzeri
Abstract: A typical neuron in visual cortex receives most inputs from other cortical neurons with a roughly similar stimulus preference. Does this arrangement of inputs allow efficient readout of sensory information by the target cortical neuron? We address this issue by using simple modelling of neuronal population activity and information theoretic tools. We find that efficient synaptic information transmission requires that the tuning curve of the afferent neurons is approximately as wide as the spread of stimulus preferences of the afferent neurons reaching the target neuron. By meta analysis of neurophysiological data we found that this is the case for cortico-cortical inputs to neurons in visual cortex. We suggest that the organization of V1 cortico-cortical synaptic inputs allows optimal information transmission. 1
3 0.097302981 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
Author: Máté Lengyel, Peter Dayan
Abstract: Areas of the brain involved in various forms of memory exhibit patterns of neural activity quite unlike those in canonical computational models. We show how to use well-founded Bayesian probabilistic autoassociative recall to derive biologically reasonable neuronal dynamics in recurrently coupled models, together with appropriate values for parameters such as the membrane time constant and inhibition. We explicitly treat two cases. One arises from a standard Hebbian learning rule, and involves activity patterns that are coded by graded firing rates. The other arises from a spike timing dependent learning rule, and involves patterns coded by the phase of spike times relative to a coherent local field potential oscillation. Our model offers a new and more complete understanding of how neural dynamics may support autoassociation. 1
4 0.09104725 84 nips-2004-Inference, Attention, and Decision in a Bayesian Neural Architecture
Author: Angela J. Yu, Peter Dayan
Abstract: We study the synthesis of neural coding, selective attention and perceptual decision making. A hierarchical neural architecture is proposed, which implements Bayesian integration of noisy sensory input and topdown attentional priors, leading to sound perceptual discrimination. The model offers an explicit explanation for the experimentally observed modulation that prior information in one stimulus feature (location) can have on an independent feature (orientation). The network’s intermediate levels of representation instantiate known physiological properties of visual cortical neurons. The model also illustrates a possible reconciliation of cortical and neuromodulatory representations of uncertainty. 1
5 0.090970382 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
Author: Rajesh P. Rao
Abstract: There is growing evidence from psychophysical and neurophysiological studies that the brain utilizes Bayesian principles for inference and decision making. An important open question is how Bayesian inference for arbitrary graphical models can be implemented in networks of spiking neurons. In this paper, we show that recurrent networks of noisy integrate-and-fire neurons can perform approximate Bayesian inference for dynamic and hierarchical graphical models. The membrane potential dynamics of neurons is used to implement belief propagation in the log domain. The spiking probability of a neuron is shown to approximate the posterior probability of the preferred state encoded by the neuron, given past inputs. We illustrate the model using two examples: (1) a motion detection network in which the spiking probability of a direction-selective neuron becomes proportional to the posterior probability of motion in a preferred direction, and (2) a two-level hierarchical network that produces attentional effects similar to those observed in visual cortical areas V2 and V4. The hierarchical model offers a new Bayesian interpretation of attentional modulation in V2 and V4. 1
6 0.07090041 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
7 0.070727371 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
8 0.070025653 148 nips-2004-Probabilistic Computation in Spiking Populations
9 0.068332382 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
10 0.064957723 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern
11 0.063298434 21 nips-2004-An Information Maximization Model of Eye Movements
12 0.052871846 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
13 0.052731439 68 nips-2004-Face Detection --- Efficient and Rank Deficient
14 0.050992243 163 nips-2004-Semi-parametric Exponential Family PCA
15 0.050911069 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
16 0.049599051 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
17 0.049586717 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
18 0.045746952 155 nips-2004-Responding to Modalities with Different Latencies
19 0.04500702 193 nips-2004-Theories of Access Consciousness
20 0.043502901 164 nips-2004-Semi-supervised Learning by Entropy Minimization
2 0.62893587 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity
Author: Marcelo A. Montemurro, Stefano Panzeri
Abstract: A typical neuron in visual cortex receives most inputs from other cortical neurons with a roughly similar stimulus preference. Does this arrangement of inputs allow efficient readout of sensory information by the target cortical neuron? We address this issue by using simple modelling of neuronal population activity and information theoretic tools. We find that efficient synaptic information transmission requires that the tuning curve of the afferent neurons is approximately as wide as the spread of stimulus preferences of the afferent neurons reaching the target neuron. By meta analysis of neurophysiological data we found that this is the case for cortico-cortical inputs to neurons in visual cortex. We suggest that the organization of V1 cortico-cortical synaptic inputs allows optimal information transmission. 1
3 0.61331898 84 nips-2004-Inference, Attention, and Decision in a Bayesian Neural Architecture
Author: Angela J. Yu, Peter Dayan
Abstract: We study the synthesis of neural coding, selective attention and perceptual decision making. A hierarchical neural architecture is proposed, which implements Bayesian integration of noisy sensory input and topdown attentional priors, leading to sound perceptual discrimination. The model offers an explicit explanation for the experimentally observed modulation that prior information in one stimulus feature (location) can have on an independent feature (orientation). The network’s intermediate levels of representation instantiate known physiological properties of visual cortical neurons. The model also illustrates a possible reconciliation of cortical and neuromodulatory representations of uncertainty. 1
4 0.58706784 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
Author: Aaron C. Courville, Nathaniel D. Daw, David S. Touretzky
Abstract: We propose a probabilistic, generative account of configural learning phenomena in classical conditioning. Configural learning experiments probe how animals discriminate and generalize between patterns of simultaneously presented stimuli (such as tones and lights) that are differentially predictive of reinforcement. Previous models of these issues have been successful more on a phenomenological than an explanatory level: they reproduce experimental findings but, lacking formal foundations, provide scant basis for understanding why animals behave as they do. We present a theory that clarifies seemingly arbitrary aspects of previous models while also capturing a broader set of data. Key patterns of data, e.g. concerning animals’ readiness to distinguish patterns with varying degrees of overlap, are shown to follow from statistical inference.
5 0.47836113 193 nips-2004-Theories of Access Consciousness
Author: Michael D. Colagrosso, Michael C. Mozer
Abstract: Theories of access consciousness address how it is that some mental states but not others are available for evaluation, choice behavior, and verbal report. Farah, O’Reilly, and Vecera (1994) argue that quality of representation is critical; Dehaene, Sergent, and Changeux (2003) argue that the ability to communicate representations is critical. We present a probabilistic information transmission or PIT model that suggests both of these conditions are essential for access consciousness. Having successfully modeled data from the repetition priming literature in the past, we use the PIT model to account for data from two experiments on subliminal priming, showing that the model produces priming even in the absence of accessibility and reportability of internal states. The model provides a mechanistic basis for understanding the dissociation of priming and awareness. Philosophy has made many attempts to identify distinct aspects of consciousness. Perhaps the most famous effort is Block’s (1995) delineation of phenomenal and access consciousness. Phenomenal consciousness has to do with “what it is like” to experience chocolate or a pin prick. Access consciousness refers to internal states whose content is “(1) inferentially promiscuous, i.e., poised to be used as a premise in reasoning, (2) poised for control of action, and (3) poised for rational control of speech.” (p. 230) The scientific study of consciousness has exploded in the past six years, and an important catalyst for this explosion has been the decision to focus on the problem of access consciousness: how is it that some mental states but not others become available for evaluation, choice behavior, verbal report, and storage in working memory. Another reason for the recent explosion of consciousness research is the availability of functional imaging techniques to explore differences in brain activation between conscious and unconscious states, as well as the development of clever psychological experiments that show that a stimulus that is not consciously perceived can nonetheless influence cognition, which we describe shortly. 1 Subliminal Priming The phenomena we address utilize an experimental paradigm known as repetition priming. Priming refers to an improvement in efficiency in processing a stimulus item as a result of previous exposure to the item. Efficiency is defined in terms of shorter response times, lower error rates, or both. A typical long-term perceptual priming experiment consists of a study phase during which participants are asked to read aloud a list of words, and a test phase during which participants must name or categorize a series of words, presented one at a time. Reaction time is lower and/or accuracy is higher for test words that were also on the study list. Repetition priming occurs without strategic effort on the part of participants, and therefore appears to be a low level mechanism of learning, which likely serves as the mechanism underlying the refinement of cognitive skills with practice. In traditional studies, priming is supraliminal—the prime is consciously perceived. In the studies we model here, primes are subliminal. Subliminal priming addresses fundamental issues concerning conscious access: How is it that a word or image that cannot be identified, detected, or even discriminated in forced choice can nonetheless influence the processing of a subsequent stimulus word? 
Answering this question in a computational framework would be a significant advance toward understanding the nature of access consciousness. 2 Models of Conscious and Unconscious Processing In contrast to the wealth of experimental data, and the large number of speculative and philosophical papers on consciousness, concrete computational models are rare. The domain of consciousness is particularly ripe for theoretical perspectives, because it is a significant contribution to simply provide an existence proof of a mechanism that can explain specific experimental data. Ordinarily, a theorist faces skepticism when presenting a model; it often seems that hundreds of alternative, equally plausible accounts must exist. However, when addressing data deemed central to issues of consciousness, simply providing a concrete handle on the phenomena serves to demystify consciousness and bring it into the realm of scientific understanding. We are familiar with only three computational models that address specific experimental data in the domain of consciousness. We summarize these models, and then present a novel model and describe its relationship to the previous efforts. Farah, O’Reilly, and Vecera (1994) were the first to model specific phenomena pertaining to consciousness in a computational framework. The phenomena involve prosopagnosia, a deficit of overt face recognition following brain damage. Nonetheless, prosopagnosia patients exhibit residual covert recognition by a variety of tests. For example, when patients are asked to categorize names as famous or nonfamous, their response times are faster to a famous name when the name is primed by a picture of a semantically related face (e.g., the name “Bill Clinton” when preceded by a photograph of Hillary), despite the fact that they could not identify the related face. Farah et al. model face recognition in a neural network, and show that when the network is damaged, it loses the ability to perform tasks requiring high fidelity representations (e.g., identification) but not tasks requiring only coarse information (e.g., semantic priming). They argue that conscious perception is associated with a certain minimal quality of representation. Dehaene and Naccache (2001) outline a framework based on Baars’ (1989) notion of conscious states as residing in a global workspace. They describe the workspace as a “distributed neural system...with long-distance connectivity that can potentially interconnect multiple specialized brain areas in a coordinated, though variable manner.” (p. 13) Dehaene, Sergent, and Changeaux (2003) implement this framework in a complicated architecture of integrate-and-fire neurons and show that the model can qualitatively account for the attentional blink phenomenon. The attentional blink is observed in experiments where participants are shown a rapid series of stimuli, which includes two targets (T1 and T2). If T2 appears shortly after T1, the ability to report T2 drops, as if attention is distracted. Dehane et al. explain this phenomenon as follows. When T1 is presented, its activation propagates to frontal cortical areas (the global workspace). Feedback connections lead to a resonance between frontal and posterior areas, which strengthen T1 but block T2 from entering the workspace. If the T1-T2 lag is sufficiently great, habituation of T1 sufficiently weakens the representation such that T2 can enter the workspace and suppress T1. In this account, conscious access is achieved via resonance between posterior and frontal areas. 
Although the Farah et al. and Dehaene et al. models might not seem to have much in common, they both make claims concerning what is required to achieve functional connectivity between perceptual and response systems. Farah et al. focus on aspects of the representation; Dehaene et al. focus on a pathway through which representations can be communicated. These two aspects are not incompatible, and in fact, a third model incorporates both. Mathis and Mozer (1996) describe an architecture with processing modules for perceptual and response processes, implemented as attractor neural nets. They argue that in order for a representation in some perceptual module to be assured of influencing a response module, (a) it must have certain characteristics–temporal persistence and well-formedness– which is quite similar to Farah et al.’s notion of quality, and (b) the two modules must be interconnected—which is the purpose of Dehaene et al.’s global workspace. The model has two limitations that restrict its value as a contemporary account of conscious access. First, it addressed classical subliminal priming data, but more reliable data has recently been reported. Second, like the other two models, Mathis and Mozer used a complex neural network architecture with arbitrary assumptions built in, and the sensitivity of the model’s behavior to these assumptions is far from clear. In this paper, we present a model that embodies the same assumptions as Mathis and Mozer, but overcomes its two limitations, and explains subliminal-priming data that has yet to be interpreted via a computational model. 3 The Probabilistic Information Transmission (PIT) Framework Our model is based on the probabilistic information transmission or PIT framework of Mozer, Colagrosso, and Huber (2002, 2003). The framework characterizes the transmission of information from perceptual to response systems, and how the time course of information transmission changes with experience (i.e., priming). Mozer et al. used this framework to account for a variety of facilitation effects from supraliminal repetition priming. The framework describes cognition in terms of a collection of information-processing pathways, and supposes that any act of cognition involves coordination among multiple pathways. For example, to model a letter-naming task where a letter printed in upper or lower case is presented visually and the letter must be named, the framework would assume a perceptual pathway that maps the visual input to an identity representation, and a response pathway that maps a identity representation to a naming response. The framework is formalized as a probabilistic model: the pathway input and output are random variables and microinference in a pathway is carried out by Bayesian belief revision. The framework captures the time course of information processing for a single experimental trial. To elaborate, consider a pathway whose input at time t is a discrete random variable, denoted X(t), which can assume values x1 , x2 , x3 , . . . , xnx corresponding to alternative input states. Similarly, the output of the pathway at time t is a discrete random variable, denoted Y (t), which can assume values y1 , y2 , y3 , . . . , yny . For example, in the letter-naming task, the input to the perceptual pathway would be one of nx = 52 visual patterns corresponding to the upper- and lower-case letters of the alphabet, and the output is one of ny = 26 letter identities. 
To present a particular input alternative, say xi , to the model for T time steps, we specify X(t) = xi for t = 1 . . . T , and allow the model to compute P(Y (t) | X(1) . . . X(t)). A pathway is modeled as a dynamic Bayes network; the minimal version of the model used in the present simulations is simply a hidden Markov model, where the X(t) are observations and the Y (t) are inferred state (see Figure 1a). In typical usage, an HMM is presented with a sequence of distinct inputs, whereas we maintain the same input for many successive time steps; and an HMM transitions through a sequence of distinct hidden states, whereas we attempt to converge with increasing confidence on a single state. Figure 1b illustrates the time course of inference in a single pathway with 52 input and 26 output alternatives and two-to-one associations. The solid line in the Figure shows, as a function of time t, P(Y (t) = yi | X(1) = x2i . . . X(t) = x2i ), i.e., the probability that input i (say, the visual pattern of an upper case O) will produce its target output (the letter identity). Evidence for the target output accumulates gradually over time, yielding a speed-accuracy curve that relates the number of iterations to the accuracy of identification. Y0 Y1 X1 Y2 X2 P(Output) 1 Y3 X3 (a) (b) 0.8 0.6 O 0.4 0.2 0 Q Time Figure 1: (a) basic pathway architecture—a hidden Markov model; (b) time course of inference in a pathway when the letter O is presented, causing activation of both O and the visually similar Q. The exact shape of the speed-accuracy curve—the pathway dynamics—are determined by three probability distributions, which embody the knowledge and past experience of the model. First, P(Y (0)) is the prior distribution over outputs in the absence of any information about the input. Second, P(Y (t) | Y (t − 1)) characterizes how the pathway output evolves over time. We assume the transition probability matrix serves as a memory with diffusion, i.e., P(Y (t) = yi |Y (t − 1) = yj ) = (1 − β)δij + βP(Y (0) = yi ), where β is the diffusion constant and δij is the Kronecker delta. Third, P(X(t) | Y (t)) characterizes the strength of association between inputs and outputs. The greater the association strength, the more rapidly that information about X will be communicated to Y . We parameterize this distribution as P(X(t) = xi |Y (t) = yj ) ∼ 1 + k γik αkj , where αij indicates the frequency of experience with the association between states xi and yj , and γik specifies the similarity between states xi and xk . (Although the representation of states is localist, the γ terms allow us to design in the similarity structure inherent in a distributed representation.) These association strengths are highly constrained by the task structure and the similarity structure and familiarity of the inputs. Fundamental to the framework is the assumption that with each experience, a pathway becomes more efficient at processing an input. Efficiency is reflected by a shift in the speedaccuracy curve to the left. In Mozer, Colagrosso, and Huber (2002, 2003), we propose two distinct mechanisms to model phenomena of supraliminal priming. First, the association frequencies, αij , are increased following a trial in which xi leads to activation of yj , resulting in more efficient transmission of information, corresponding to an increased slope of the solid line in Figure 1b. The increase is Hebbian, based on the maximum activation achieved by xi and yj : ∆αij = η maxt P(X(t) = xi )P(Y (t) = yj ), where η is a step size. 
Second, the priors, which serve as a model of the environment, are increased to indicate a greater likelihood of the same output occurring again in the future. In modeling data from supraliminal priming, we found that the increases to association frequencies are long lasting, but the increases to the priors decay over the course of a few minutes or a few trials. As a result, the prior updating does not play into the simulation we report here; we refer the reader to Mozer, Colagrosso, and Huber (2003) for details. 4 Access Consciousness and PIT We have described the operation of a single pathway, but to model any cognitive task, we require a series of pathways in cascade. For a simple choice task, we use a percpetual pathway cascaded to a response pathway. The interconnection between the pathways is achieved by copying the output of the perceptual pathway, Y p (t), to the input of the response pathway, X r (t), at each time t. This multiple-pathway architecture allows us to characterize the notion of access consciousness. Considering the output of the perceptual pathway, access is achieved when: (1) the output representation is sufficient to trigger the correct behavior in the response pathway, and (2) the perceptual and response pathways are functionally interconnected. In more general terms, access for a perceptual pathway output requires that these two condi- tions be met not just for a specific response pathway, but for arbitrary response pathways (e.g., pathways for naming, choice, evaluation, working memory, etc.). In Mozer and Colagrosso (in preparation) we characterize the sufficiency requirements of condition 1; they involve a representation of low entropy that stays active for long enough that the representation can propagate to the next pathway. As we will show, a briefly presented stimulus fails to achieve a representation that supports choice and naming responses. Nonetheless, the stimulus evokes activity in the perceptual pathway. Because perceptual priming depends on the magnitude of the activation in the perceptual pathway, not on the activation being communicated to response pathways, the framework is consistent with the notion of priming occurring in the absence of awareness. 4.1 Simulation of Bar and Biederman (1998) Bar and Biederman (1998) presented a sequence of masked line drawings of objects and asked participants to name the objects, even if they had to guess. If the guess was incorrect, participants were required to choose the object name from a set of four alternatives. Unbeknownst to the participant, some of the drawings in the series were repeated, and Bar and Biederman were interested in whether participants would benefit from the first presentation even if it could not be identified. The repeated objects could be the same or a different exemplar of the object, and it could appear in either the same or a different display position. Participants were able to name 13.5% of drawings on presentation 1, but accuracy jumped to 34.5% on presentation 2. Accuracy did improve, though not as much, if the same shape was presented in a different position, but not if a different drawing of the same object was presented, suggesting a locus of priming early in the visual stream. The improvement in accuracy is not due to practice in general, because accuracy rose only 4.0% for novel control objects over the course of the experiment. 
The priming is firmly subliminal, because participants were not only unable to name objects on the first presentation, but their fouralternative forced choice (4AFC) performance was not much above chance (28.5%). To model these phenomena, we created a response pathway with fifty states representing names of objects that are used in the experiment, e.g., chair and lamp. We also created a perceptual pathway with states representing visual patterns that correspond to the names in the response pathway. Following the experimental design, every object identity was instantiated in two distinct shapes, and every shape could be in one of nine different visualfield positions, leading to 900 distinct states in the perceptual pathway to model the possible visual stimuli. The following parameters were fit to the data. If two perceptual states, xi and xk are the same shape in different positions, they are assigned a similarity coefficient γik = 0.95; all other similarity coefficients are zero. The association frequency, α, for valid associations in the perceptual pathway was 22, and the response pathway 18. Other parameters were β p = .05, β r = .01, and η = 1.0. The PIT model achieves a good fit to the human experimental data (Figure 2). Specifically, priming is greatest for the same shape in the same position, some priming occurs for the same shape in a different position, and no substantial priming occurs for the different shape. Figure 3a shows the time course of activation of a stimulus representation in the perceptual pathway when the stimulus is presented for 50 iterations, on both the first and third presentations. The third presentation was chosen instead of the second to make the effect of priming clearer. Even though a shape cannot be named on the first presentation, partial information about the shape may nonetheless be available for report. The 4AFC test of Bar and Biederman provides a more sensitive measure of residual stimulus information. In past work, we modeled forced-choice tasks using a response pathway with only the alternatives under consideration. However, in this experiment, forced-choice performance must be estimated conditional on incorrect naming. In PIT framework, we achieve this using naming and 40 40 First Block Second Block 30 25 20 15 10 5 First Block 35 Percent Correct Naming Percent Correct Naming 35 Second Block 30 25 20 15 10 5 0 0 Control Objects Prime SHAPE: Same Objects POSITION: Same Same Different Different Different Second Same Different Control Control Objects Prime SHAPE: Same Objects POSITION: Same Same Different Different Different Second Same Different Control Figure 2: (left panel) Data from Bar and Biederman (1998) (right panel) Simulation of PIT. White bar: accuracy on first presentation of a prime object. Black bars: the accuracy when the object is repeated, either with the same or different shape, and in the same or different position. Grey bars: accuracy for control objects at the beginning and the end of the experiment. forced-choice output pathways having output distributions N (t) and F (t), which are linked via the perceptual state, Y p (t). F (t) must be reestimated with the evidence that N (t) is not the target state. This inference problem is intractable. We therefore used a shortcut in which a single response pathway is used, augmented with a simple three-node belief net (Figure 3b) to capture the dependence between naming and forced choice. 
The belief net has a response pathway node Y r (t) connected to F (t) and N (t), with conditional distribution P (N (t) = ni |Y r (t) = yj ) = θδij + (1 − θ)/|Y r |, and an analogous distribution for P (F (t) = fi |Y r (t) = yj ). The free parameter θ determines how veridically naming and forced-choice actions reflect response-pathway output. Over a range of θ, θ < 1, the model obtains forced-choice performance near chance on the first presentation when the naming response is incorrect. For example, with θ = 0.72, the model produces a forced-choice accuracy on presentation 1 of 26.1%. (Interestingly, the model also produces below chance performance on presentation 2 if the object is not named correctly—23.5%—which is also found in the human data—20.0%.) Thus, by the stringent criterion of 4AFC, the model shows no access consciousness, and therefore illustrates a dissociation between priming and access consciousness. In our simulation, we followed the procedure of Bar and Biederman by including distractor alternatives with visual and semantic similarity to the target. These distractors are critical: with unrelated distractors, the model’s 4AFC performance is significantly above chance, illustrating that a perceptual representation can be adequate to support some responses but not others, as Farah et al. (1994) also argued. 4.2 Simulation of Abrams and Greenwald (2000) During an initial phase of the experiment, participants categorized 24 clearly visible target words as pleasant (e.g., HUMOR) or unpleasant (e.g., SMUT). They became quite familiar with the task by categorizing each word a total of eight times. In a second phase, participants were asked to classify the same targets and were given a response deadline to induce errors. The targets were preceded by masked primes that could not be identified. Of interest is the effective valence (or EV) of the target for different prime types, defined as the error rate difference between unpleasant and pleasant targets. A positive (negative) EV indicates that responses are biased toward a pleasant (unpleasant) interpretation by the prime. As one would expect, pleasant primes resulted in a positive EV, unpleasant primes in a negative EV. Of critical interest is the finding that a nonword prime formed by recombining two pleasant targets (e.g., HULIP from HUMOR and TULIP) or unpleasant targets (e.g., BIUT from BILE and SMUT ) also served to bias the targets. More surprising, a positive EV resulted from unpleasant prime words formed by recombining two pleasant targets (TUMOR from TULIP and HUMOR ), indicating that subliminal priming arises from word fragments, not words as unitary entities, and providing further evidence for an early locus of subliminal priming. Note that the results depend critically on the first phase of the experiment, which gave participants extensive practice on a relatively small set of words that were then used as and recombined to form primes. Words not studied in the first phase (orphans) provided Probability 0.6 0.5 0.4 0.3 0.2 0.1 0 object, first presentation object, third presentation different object N(t) F(t) Yr(t) 1 50 1000 (a) (b) Time (msec) Figure 3: (a) Activation of the perceptual representation in PIT as a function of processing iterations Effective Valence on the first (thin solid line) and third (thick solid line) presentations of target. (b) Bayes net for performing 4AFC conditional on incorrect naming response. 
4.2 Simulation of Abrams and Greenwald (2000)

During an initial phase of the experiment, participants categorized 24 clearly visible target words as pleasant (e.g., HUMOR) or unpleasant (e.g., SMUT). They became quite familiar with the task by categorizing each word a total of eight times. In a second phase, participants were asked to classify the same targets and were given a response deadline to induce errors. The targets were preceded by masked primes that could not be identified. Of interest is the effective valence (or EV) of the target for different prime types, defined as the difference in error rate between unpleasant and pleasant targets. A positive (negative) EV indicates that responses are biased toward a pleasant (unpleasant) interpretation by the prime. As one would expect, pleasant primes resulted in a positive EV and unpleasant primes in a negative EV. Of critical interest is the finding that a nonword prime formed by recombining two pleasant targets (e.g., HULIP from HUMOR and TULIP) or two unpleasant targets (e.g., BIUT from BILE and SMUT) also served to bias the targets. More surprisingly, a positive EV resulted from unpleasant prime words formed by recombining two pleasant targets (TUMOR from TULIP and HUMOR), indicating that subliminal priming arises from word fragments, not from words as unitary entities, and providing further evidence for an early locus of subliminal priming. Note that the results depend critically on the first phase of the experiment, which gave participants extensive practice on a relatively small set of words that were then used as primes and recombined to form new primes. Words not studied in the first phase (orphans) produced no significant EV effect when used as primes.

In this simulation, we used a three-pathway model: a perceptual pathway that maps visual patterns to orthography, with 200 input states corresponding to words, nonwords, and nonword recombinations of words; a semantic pathway that maps to 100 distinct lexical/semantic states; and a judgement pathway that maps to two responses, pleasant and unpleasant. In the perceptual pathway, similarity structure was based on letter overlap, so that HULIP was similar to both TULIP and HUMOR, with γ = 0.837. No similarity was assumed in the semantic state representation. Consistent with the previous simulation, βp = .05, βs = .01, βj = .01, and η = .01. At the outset of the simulation, the α frequencies for correct associations were 15, 19, and 25 in the perceptual, semantic, and judgement pathways. The initial phase of the experiment was simulated by repeated supraliminal presentation of words, which increased the association frequencies in all three pathways through the ∆αij learning rule.

Long-term supraliminal priming is essential in establishing the association strengths, as we will explain. Short-term subliminal priming also plays a key role in the experiment. During the second phase of the experiment, residual activity from the prime, primarily in the judgement pathway, biases the response to the target. Residual activation of the prime is present even if the representation of the prime does not reach sufficient strength to be named or otherwise reported.

The outcome of the simulation is consistent with the human data (Figure 4). When a HULIP-type prime is presented, HUMOR and TULIP become active in the semantic pathway because of their visual similarity to HULIP. Partial activation of these two practiced words pushes the judgement pathway toward a pleasant response, resulting in a positive EV. When a TUMOR-type prime is presented, three different words become active in the semantic pathway: HUMOR, TULIP, and TUMOR itself. Although TUMOR is more active, it was not one of the words studied during the initial phase of the experiment; as a result, it has a relatively weak association to the unpleasant judgement, in contrast to the other two words, which have strong associations to the pleasant judgement. Orphan primes have little effect because they were not studied during the initial phase of the experiment, and consequently their associations to the pleasant and unpleasant judgements are also weak.

Figure 4: Effective valence of primes (targets, HULIP-type, TUMOR-type, and orphans) in the Abrams and Greenwald (2000) experiment for human subjects (black bars) and the PIT model (grey bars). HULIP-type primes are almost as strong as target repetitions, and TUMOR-type primes have a positive valence, contrary to the meaning of the word.

In summary, activation of the prime along a critical, well-practiced pathway may not be sufficient to support an overt naming response, yet it may be sufficient to bias the processing of the immediately following target.
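As a small illustration of the effective-valence measure used in this simulation, the sketch below computes EV as the error-rate difference between unpleasant and pleasant targets for several prime types. The error counts are invented solely for illustration; they are neither the human data nor the model output.

def effective_valence(errors_unpleasant, n_unpleasant, errors_pleasant, n_pleasant):
    """EV = error rate on unpleasant targets minus error rate on pleasant targets;
    a positive value means responses were biased toward a 'pleasant' interpretation."""
    return errors_unpleasant / n_unpleasant - errors_pleasant / n_pleasant

# Hypothetical error counts per prime type: (errors on unpleasant targets, trials,
#                                            errors on pleasant targets, trials)
hypothetical_counts = {
    "pleasant-word prime": (34, 100, 10, 100),
    "HULIP-type nonword":  (30, 100, 12, 100),
    "TUMOR-type word":     (22, 100, 14, 100),
    "orphan prime":        (18, 100, 17, 100),
}

for prime, (eu, nu, ep, n_p) in hypothetical_counts.items():
    print(f"{prime:22s} EV = {effective_valence(eu, nu, ep, n_p):+.3f}")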
5 Discussion

An important contribution of this work has been to demonstrate that specific experimental results relating to access consciousness and subliminal priming can be interpreted in a concrete computational framework. By necessity, the PIT framework, which we previously used to model supraliminal priming data, predicts the existence of subliminal priming, because the mechanisms giving rise to priming depend on the degree of activation of a representation, whereas the processes giving rise to access consciousness also depend on the temporal persistence of a representation.

Another contribution of this work has been to argue that two previous computational models each tell only part of the story. Farah et al. argue that the quality of a representation is critical; Dehaene et al. argue that pathways that communicate representations are critical. The PIT framework argues that both of these features are necessary for access consciousness. Although the PIT framework is not completely developed, it nonetheless makes a clear prediction: subliminal priming can never be stronger than supraliminal priming, because the maximal activation of subliminal primes is never greater than that of supraliminal primes. One might argue that many theoretical frameworks would predict the same, but no other computational model is sufficiently well developed, in terms of addressing both priming and access consciousness, to make this prediction.

In its current stage of development, a weakness of the PIT framework is that it is silent as to how perceptual and response pathways become flexibly interconnected based on task demands. However, the PIT framework is not alone in failing to address this critical issue: the Dehaene et al. model suggests that once a representation enters the global workspace, all response modules can access it, but the model does not specify how the appropriate perceptual module wins the competition to enter the global workspace, or how the appropriate response module is activated. Clearly, flexible cognitive control structures that perform these functions are intricately related to mechanisms of consciousness.

Acknowledgments

This research was supported by NIH/IFOPAL R01 MH61549–01A1.

References

Abrams, R. L., & Greenwald, A. G. (2000). Parts outweigh the whole (word) in unconscious analysis of meaning. Psychological Science, 11(2), 118–124.
Baars, B. (1989). A cognitive theory of consciousness. Cambridge: Cambridge University Press.
Bar, M., & Biederman, I. (1998). Subliminal visual priming. Psychological Science, 9(6), 464–468.
Block, N. (1995). On a confusion about a function of consciousness. Brain and Behavioral Sciences, 18(2), 227–247.
Dehaene, S., & Naccache, L. (2001). Towards a cognitive neuroscience of consciousness: basic evidence and a workspace framework. Cognition, 79, 1–37.
Dehaene, S., Sergent, C., & Changeux, J.-P. (2003). A neuronal network model linking subjective reports and objective physiological data during conscious perception. Proceedings of the National Academy of Sciences, 100, 8520–8525.
Farah, M. J., O'Reilly, R. C., & Vecera, S. P. (1994). Dissociated overt and covert recognition as an emergent property of a lesioned neural network. Psychological Review, 100, 571–588.
Mathis, D. W., & Mozer, M. C. (1996). Conscious and unconscious perception: a computational theory. In G. Cottrell (Ed.), Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society (pp. 324–328). Hillsdale, NJ: Erlbaum & Associates.
Mozer, M. C., Colagrosso, M. D., & Huber, D. E. (2002). A rational analysis of cognitive control in a speeded discrimination task. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
Mozer, M. C., Colagrosso, M. D., & Huber, D. E. (2003). Mechanisms of long-term repetition priming and skill refinement: A probabilistic pathway model. In Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum Associates.
6 0.44884071 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
7 0.43997046 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
8 0.41563293 21 nips-2004-An Information Maximization Model of Eye Movements
9 0.40250602 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
11 0.33463299 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
12 0.33059657 158 nips-2004-Sampling Methods for Unsupervised Learning
13 0.32987836 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
14 0.32456863 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
15 0.32402602 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
16 0.31759909 136 nips-2004-On Semi-Supervised Classification
17 0.31515375 155 nips-2004-Responding to Modalities with Different Latencies
18 0.31403363 163 nips-2004-Semi-parametric Exponential Family PCA
19 0.31290302 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
20 0.31174517 184 nips-2004-The Cerebellum Chip: an Analog VLSI Implementation of a Cerebellar Model of Classical Conditioning
topicId topicWeight
[(13, 0.098), (15, 0.099), (26, 0.052), (31, 0.035), (33, 0.161), (35, 0.052), (39, 0.021), (50, 0.033), (56, 0.326), (71, 0.015), (87, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.81027907 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception
Author: Alan Stocker, Eero P. Simoncelli
Abstract: It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, and that the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, and approximately follows a power-law function. Humans do not perceive visual motion veridically. Various psychophysical experiments have shown that the perceived speed of visual stimuli is affected by stimulus contrast, with low contrast stimuli being perceived to move slower than high contrast ones [1, 2]. Computational models have been suggested that can qualitatively explain these perceptual effects. Commonly, they assume the perception of visual motion to be optimal either within a deterministic framework with a regularization constraint that biases the solution toward zero motion [3, 4], or within a probabilistic framework of Bayesian estimation with a prior that favors slow velocities [5, 6]. The solutions resulting from these two frameworks are similar (and in some cases identical), but the probabilistic framework provides a more principled formulation of the problem in terms of meaningful probabilistic components. Specifically, Bayesian approaches rely on a likelihood function that expresses the relationship between the noisy measurements and the quantity to be estimated, and a prior distribution that expresses the probability of encountering any particular value of that quantity. A probabilistic model can also provide a richer description, by defining a full probability density over the set of possible “percepts”, rather than just a single value. Numerous analyses of psychophysical experiments have made use of such distributions within the framework of signal detection theory in order to model perceptual behavior [7]. Previous work has shown that an ideal Bayesian observer model based on Gaussian forms µ posterior low contrast probability density probability density high contrast likelihood prior a posterior likelihood prior v ˆ v ˆ visual speed µ b visual speed Figure 1: Bayesian model of visual speed perception. a) For a high contrast stimulus, the likelihood has a narrow width (a high signal-to-noise ratio) and the prior induces only a small shift µ of the mean v of the posterior. b) For a low contrast stimuli, the measurement ˆ is noisy, leading to a wider likelihood. The shift µ is much larger and the perceived speed lower than under condition (a). for both likelihood and prior is sufficient to capture the basic qualitative features of global translational motion perception [5, 6]. But the behavior of the resulting model deviates systematically from human perceptual data, most importantly with regard to trial-to-trial variability and the precise form of interaction between contrast and perceived speed. 
A recent article achieved better fits for the model under the assumption that human contrast perception saturates [8]. In order to advance the theory of Bayesian perception and provide significant constraints on models of neural implementation, it seems essential to constrain quantitatively both the likelihood function and the prior probability distribution. In previous work, the proposed likelihood functions were derived from the brightness constancy constraint [5, 6] or other generative principles [9]. Also, previous approaches defined the prior distribution based on general assumptions and computational convenience, typically choosing a Gaussian with zero mean, although a Laplacian prior has also been suggested [4]. In this paper, we develop a more general form of Bayesian model for speed perception that can account for trial-to-trial variability. We use psychophysical speed discrimination data in order to constrain both the likelihood and the prior function.

1 Probabilistic Model of Visual Speed Perception

1.1 Ideal Bayesian Observer

Assume that an observer wants to obtain an estimate for a variable v based on a measurement m that she/he performs. A Bayesian observer "knows" that the measurement device is not ideal and that the measurement m is therefore affected by noise. Hence, this observer combines the information gained by the measurement m with a priori knowledge about v. Doing so (and assuming that the prior knowledge is valid), the observer will, on average, perform better in estimating v than by just trusting the measurement m. According to Bayes' rule,

p(v|m) = (1/α) p(m|v) p(v)    (1)

the probability of perceiving v given m (the posterior) is the product of the likelihood of v for a particular measurement m and the a priori knowledge about the estimated variable v (the prior). α is a normalization constant, independent of v, that ensures that the posterior is a proper probability distribution.

Figure 2: 2AFC speed discrimination experiment. a) Two patches of drifting gratings were displayed simultaneously (motion without movement). The subject was asked to fixate the center cross and decide after the presentation which of the two gratings was moving faster. b) A typical psychometric curve obtained under such a paradigm. The dots represent the empirical probability that the subject perceived stimulus2 moving faster than stimulus1. The speed of stimulus1 was fixed while v2 is varied. The point of subjective equality, vmatch, is the value of v2 for which Pcum = 0.5. The threshold velocity vthresh is the velocity for which Pcum = 0.875.

It is important to note that the measurement m is an internal variable of the observer and is not necessarily represented in the same space as v. The likelihood embodies both the mapping from v to m and the noise in this mapping. For now, we assume that there is a monotonic function f(v): v → vm that maps v into the same space as m (the m-space). Doing so allows us to treat m and vm analytically in the same space. We will later propose a suitable form of the mapping function f(v).

An ideal Bayesian observer selects the estimate that minimizes the expected loss, given the posterior and a loss function. We assume a least-squares loss function. Then the optimal estimate v̂ is the mean of the posterior in Equation (1). It is easy to see why this model of a Bayesian observer is consistent with the fact that perceived speed decreases with contrast. The width of the likelihood varies inversely with the accuracy of the measurements performed by the observer, which presumably decreases with decreasing contrast due to a decreasing signal-to-noise ratio. As illustrated in Figure 1, the shift in perceived speed towards slow velocities grows with the width of the likelihood, and thus a Bayesian model can qualitatively explain the psychophysical results [1].
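To make Section 1.1 concrete, here is a minimal numerical sketch of the ideal observer: the percept is the mean of the posterior in Equation (1) under a least-squares loss. The prior shape and the likelihood widths below are illustrative placeholders, not the functions fitted later in the paper.

import numpy as np

# Grid-based posterior-mean estimate: a wider likelihood (lower contrast) produces a
# larger shift of the percept toward slow speeds.
v = np.linspace(0.0, 30.0, 3001)            # speed grid (deg/sec)

def posterior_mean(m, sigma, prior):
    likelihood = np.exp(-0.5 * ((v - m) / sigma) ** 2)
    post = likelihood * prior
    post /= post.sum()                      # normalization (plays the role of 1/alpha)
    return np.sum(v * post)                 # least-squares (mean) estimate

prior = np.exp(-0.5 * (v / 8.0) ** 2)       # generic preference for slow speeds (assumed)

m = 10.0                                    # measurement of a 10 deg/sec stimulus
print(posterior_mean(m, sigma=1.0, prior=prior))   # high contrast: small shift (~9.8)
print(posterior_mean(m, sigma=4.0, prior=prior))   # low contrast: larger shift (~8.0)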
1.2 Two Alternative Forced Choice Experiment

We would like to examine perceived speeds under a wide range of conditions in order to constrain a Bayesian model. Unfortunately, perceived speed is an internal variable, and it is not obvious how to design an experiment that would allow subjects to express it directly (although see [10] for an example of determining and even changing the prior of a Bayesian model for a sensorimotor task, where the estimates are more directly accessible). Perceived speed can only be accessed indirectly by asking the subject to compare the speeds of two stimuli. For a given trial, an ideal Bayesian observer in such a two-alternative forced choice (2AFC) experimental paradigm simply decides, on the basis of the two trial estimates v̂1 (stimulus1) and v̂2 (stimulus2), which stimulus moves faster. Each estimate v̂ is based on a particular measurement m. For a given stimulus with speed v, an ideal Bayesian observer will produce a distribution of estimates p(v̂|v) because m is noisy. Over trials, the observer's behavior can be described by classical signal detection theory based on the distributions of the estimates; hence, e.g., the probability of perceiving stimulus2 as moving faster than stimulus1 is given by the cumulative probability

Pcum(v̂2 > v̂1) = ∫_0^∞ p(v̂2|v2) [ ∫_0^v̂2 p(v̂1|v1) dv̂1 ] dv̂2    (2)

Pcum describes the full psychometric curve. Figure 2b illustrates the measured psychometric curve and its fit from such an experimental situation.

2 Experimental Methods

We measured matching speeds (Pcum = 0.5) and thresholds (Pcum = 0.875) in a 2AFC speed discrimination task. Subjects were presented simultaneously with two circular patches of horizontally drifting sine-wave gratings for the duration of one second (Figure 2a). Patches were 3 deg in diameter and were displayed at 6 deg eccentricity on either side of a fixation cross. The stimuli had an identical spatial frequency of 1.5 cycles/deg. One stimulus was considered the reference stimulus, having one of two different contrast values (c1 = [0.075 0.5]) and one of five different speed values (u1 = [1 2 4 8 12] deg/sec), while the second stimulus (test) had one of five different contrast values (c2 = [0.05 0.1 0.2 0.4 0.8]) and a varying speed that was determined by an interleaved staircase procedure. For each condition there were 96 trials. Conditions were randomly interleaved, including a random choice of stimulus identity (test vs. reference) and motion direction (right vs. left). Subjects were asked to fixate during stimulus presentation and to select the faster-moving stimulus. The threshold experiment differed only in that auditory feedback was given to indicate the correctness of each decision. This did not change the outcome of the experiment but significantly increased the quality of the data and thus reduced the number of trials needed.
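The following sketch shows one way to evaluate Equation (2) numerically: draw trial estimates for the two stimuli, estimate Pcum over a range of test speeds, and read off the matching speed (Pcum = 0.5) and the 87.5% threshold point. The estimate distributions (Gaussian in log-speed with assumed means, widths, and a small perceptual slowing of the low-contrast test) are placeholders, not the fitted model.

import numpy as np

rng = np.random.default_rng(0)

def p_faster(mean2, sigma2, mean1, sigma1, n=200_000):
    """Monte Carlo estimate of Pcum(v2_hat > v1_hat) from sampled trial estimates."""
    est2 = rng.normal(mean2, sigma2, n)     # samples of v2_hat (in log-speed units)
    est1 = rng.normal(mean1, sigma1, n)
    return np.mean(est2 > est1)

# Assumed reference percept and estimate spreads (low-contrast test is noisier and
# perceived slightly slower, modeled as a -0.10 shift in log-speed).
ref_mean, ref_sigma = np.log(8.0), 0.15
test_sigma = 0.25

test_speeds = np.linspace(6.0, 14.0, 33)
pcum = np.array([p_faster(np.log(s) - 0.10, test_sigma, ref_mean, ref_sigma)
                 for s in test_speeds])

v_match = np.interp(0.5, pcum, test_speeds)
v_thresh = np.interp(0.875, pcum, test_speeds)
print(f"matching speed ~ {v_match:.2f} deg/sec, 87.5% point ~ {v_thresh:.2f} deg/sec")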
3 Analysis

With the data from the speed discrimination experiments we could in principle apply a parametric fit using Equation (2) to derive the prior and the likelihood, but the optimization is difficult, and the fit might not be well constrained given the amount of data we have obtained. The problem becomes much more tractable given the following weak assumptions:

• We consider the prior to be relatively smooth.
• We assume that the measurement m is corrupted by additive Gaussian noise with a variance whose dependence on stimulus speed and contrast is separable.
• We assume that there is a mapping function f(v): v → vm that maps v into the space of m (the m-space). In that space, the likelihood is convolutional, i.e., the noise in the measurement directly defines the width of the likelihood.

These assumptions allow us to relate the psychophysical data to our probabilistic model in a simple way. The following analysis is in the m-space. The point of subjective equality (Pcum = 0.5) is defined as the point where the expected values of the speed estimates are equal. We write

E[v̂m,1] = E[v̂m,2]
vm,1 − E[µ1] = vm,2 − E[µ2]    (3)

where E[µ] is the expected shift of the perceived speed compared to the veridical speed. For the discrimination threshold experiment, the above assumptions imply that the variance var[v̂m] of the speed estimates v̂m is equal for both stimuli. Then, (2) predicts that the discrimination threshold is proportional to the standard deviation, thus

vm,2 − vm,1 = γ √(var[v̂m])    (4)

where γ is a constant that depends on the threshold criterion Pcum and the exact shape of p(v̂m|vm).

Figure 3: Piece-wise approximation. We perform a parametric fit by assuming the prior to be piece-wise linear and the likelihood to be LogNormal (Gaussian in the m-space).

3.1 Estimating the prior and likelihood

In order to extract the prior and the likelihood of our model from the data, we have to find a generic local form of the prior and the likelihood and relate them to the mean and the variance of the speed estimates. As illustrated in Figure 3, we assume that the likelihood is Gaussian with a standard deviation σ(c, vm). Furthermore, the prior is assumed to be well approximated by a first-order Taylor series expansion over the velocity ranges covered by the likelihood. We parameterize this linear expansion of the prior as p(vm) = a·vm + b. We can now derive a posterior for this local approximation of likelihood and prior and then define the perceived speed shift µ(m). The posterior can be written as

p(vm|m) = (1/α) p(m|vm) p(vm) = (1/α) exp(−vm²/(2σ(c, vm)²)) (a·vm + b)    (5)

(with vm expressed relative to the measurement m, so that the Gaussian is centered at the origin), where α is the normalization constant

α = ∫_{−∞}^{∞} p(m|vm) p(vm) dvm = b √(2π σ(c, vm)²)    (6)

We can compute µ(m) as the first-order moment of the posterior for a given m. Exploiting the symmetries around the origin, we find

µ(m) = ∫_{−∞}^{∞} vm p(vm|m) dvm ≡ (a(m)/b(m)) σ(c, vm)²    (7)

The expected value of µ(m) is equal to the value of µ at the expected value of the measurement m (which is the stimulus velocity vm), thus

E[µ] = µ(m)|m=vm = (a(vm)/b(vm)) σ(c, vm)²    (8)

Similarly, we derive var[v̂m]. Because the estimator is deterministic, the variance of the estimate depends only on the variance of the measurement m. For a given stimulus, the variance of the estimate can be well approximated by

var[v̂m] = var[m] (∂v̂m(m)/∂m |m=vm)²
        = var[m] (1 − ∂µ(m)/∂m |m=vm)² ≈ var[m]    (9)

Under the assumption of a locally smooth prior, the perceived velocity shift remains locally constant. The variance of the perceived speed v̂m thus becomes equal to the variance of the measurement m, which is the variance of the likelihood (in the m-space):

var[v̂m] = σ(c, vm)²    (10)

With (3) and (4), the above derivations provide a simple dependency of the psychophysical data on the local parameters of the likelihood and the prior.
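As a quick numerical check of the local approximation in Equations (5)-(8): with a Gaussian likelihood of width σ centered on the measurement (taken as the origin) and a locally linear prior a·vm + b, the normalizer should equal b·√(2πσ²) and the posterior mean, i.e., the shift µ, should equal (a/b)σ². The particular values of a, b, and σ below are arbitrary illustrative choices.

import numpy as np

sigma = 0.3
a, b = -1.5, 2.0            # a < 0: the prior decreases with speed, so the shift is toward slow

v = np.linspace(-6 * sigma, 6 * sigma, 20001)          # local coordinate around m
unnormalized = np.exp(-0.5 * (v / sigma) ** 2) * (a * v + b)

dv = v[1] - v[0]
alpha = unnormalized.sum() * dv                        # numerical version of Eq. (6)
mu = (v * unnormalized).sum() * dv / alpha             # numerical version of Eq. (7)

print(alpha, b * np.sqrt(2 * np.pi * sigma ** 2))      # should agree
print(mu, (a / b) * sigma ** 2)                        # should agree (both negative here)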
3.2 Choosing a Logarithmic speed representation

We now want to choose the appropriate mapping function f(v) that maps v to the m-space. We define the m-space as the space in which the likelihood is Gaussian with a speed-independent width. We have shown that the discrimination threshold is proportional to the width of the likelihood ((4), (10)). Also, we know from the psychophysics literature that visual speed discrimination approximately follows a Weber-Fechner law [11, 12], so that the discrimination threshold increases roughly in proportion to speed, and so would the width of the likelihood. A logarithmic speed representation would be compatible with the data and with our choice of the likelihood. Hence, we transform the linear speed domain v into a normalized logarithmic domain according to

vm = f(v) = ln((v + v0)/v0)    (11)

where v0 is a small normalization constant. The normalization is chosen to account for the expected deviation from equal-variance behavior at the low end. Surprisingly, it has been found that neurons in the Medial Temporal area (Area MT) of macaque monkeys have speed-tuning curves that are very well approximated by Gaussians of constant width in the above normalized logarithmic space [13]. These neurons are known to play a central role in the representation of motion. It seems natural to assume that they are strongly involved in tasks such as the psychophysical experiments performed here.

4 Results

Figure 4 shows the contrast-dependent shift of speed perception and the speed discrimination threshold data for two subjects. Data points connected with a dashed line represent the relative matching speed (v2/v1) for a particular contrast value c2 of the test stimulus as a function of the speed of the reference stimulus. Error bars are the empirical standard deviation of fits to bootstrapped samples of the data. Clearly, low contrast stimuli are perceived to move slower. The effect, however, varies across the tested speed range and tends to become smaller for higher speeds. The relative discrimination thresholds for two different contrasts as a function of speed show that the Weber-Fechner law holds only approximately. The data are in good agreement with other data from the psychophysics literature [1, 11, 8].

For each subject, data from both experiments were used to compute a parametric least-squares fit according to (3), (4), (7), and (10). In order to test the assumption of a LogNormal likelihood, we allowed the standard deviation to depend on contrast and speed, thus σ(c, vm) = g(c)h(vm). We split the speed range into six bins (subject2: five) and parameterized h(vm) and the ratio a/b accordingly. Similarly, we parameterized g(c) for the seven contrast values. The resulting fits are superimposed as bold lines in Figure 4. Figure 5 shows the fitted parametric values for g(c) and h(v) (plotted in the linear domain), and the reconstructed prior distribution p(v) transformed back to the linear domain. The approximately constant values for h(v) provide evidence that a LogNormal distribution is an appropriate functional description of the likelihood. The resulting values for g(c) suggest that the likelihood width has a roughly exponentially decaying dependence on contrast, with strong saturation at higher contrasts.
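The sketch below implements the mapping of Equation (11) and its inverse, and shows why a constant likelihood width in the m-space corresponds to roughly Weber-like (proportional) widths in linear speed, with a deviation at low speeds controlled by v0. The values of v0 and the m-space width are assumptions, not the fitted parameters.

import numpy as np

v0 = 0.3                                   # small normalization constant (deg/sec), assumed

def to_m(v):
    return np.log((v + v0) / v0)           # Eq. (11)

def from_m(vm):
    return v0 * (np.exp(vm) - 1.0)         # inverse mapping

width_m = 0.12                             # constant likelihood width in m-space (assumed)

for v in [1.0, 2.0, 4.0, 8.0, 12.0]:
    vm = to_m(v)
    # A +/- one-width interval in m-space, mapped back to linear speed:
    lo, hi = from_m(vm - width_m), from_m(vm + width_m)
    print(f"v = {v:5.1f}  ->  linear width ~ {hi - lo:5.2f}  (ratio {(hi - lo) / v:.3f})")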
Figure 4: Speed discrimination data for two subjects. a) The relative matching speed of a test stimulus with different contrast levels (c2 = [0.05 0.1 0.2 0.4 0.8]) required to achieve subjective equality with a reference stimulus (two different contrast values c1), as a function of reference speed. b) The relative discrimination threshold for two stimuli with equal contrast (c1,2 = [0.075 0.5]), as a function of stimulus speed.

Figure 5: Reconstructed prior distribution p(v) (unnormalized) and the likelihood parameters g(c) and h(v) for both subjects. The reconstructed priors show much heavier tails than a Gaussian (dashed fit), approximately following a power-law function with exponent n ≈ −1.4 (bold line; n = −1.41 and n = −1.35 for subjects 1 and 2).

5 Conclusions

We have proposed a probabilistic framework based on a Bayesian ideal observer and standard signal detection theory. We have derived a likelihood function and prior distribution for the estimator, with a fairly conservative set of assumptions, constrained by psychophysical measurements of speed discrimination and matching. The width of the resulting likelihood is nearly constant in the logarithmic speed domain, and decreases approximately exponentially with contrast. The prior expresses a preference for slower speeds and approximately follows a power-law distribution, thus having much heavier tails than a Gaussian (a small numerical illustration of such a power-law prior appears after the reference list below). It would be interesting to compare the prior distributions derived here with measured true distributions of the local image velocities that impinge on the retina. Although a number of authors have measured the spatio-temporal structure of natural images [e.g., 14], it is clearly difficult to extract the true prior distribution from such measurements because of the feedback loop formed through movements of the body, head, and eyes.

Acknowledgments

The authors thank all subjects for their participation in the psychophysical experiments.

References

[1] P. Thompson. Perceived rate of movement depends on contrast. Vision Research, 22:377–380, 1982.
[2] L.S. Stone and P. Thompson. Human speed perception is contrast dependent. Vision Research, 32(8):1535–1549, 1992.
[3] A. Yuille and N. Grzywacz. A computational theory for the perception of coherent visual motion. Nature, 333(5):71–74, May 1988.
[4] Alan Stocker. Constraint Optimization Networks for Visual Motion Perception - Analysis and Synthesis. PhD thesis, Dept. of Physics, Swiss Federal Institute of Technology, Zürich, Switzerland, March 2002.
[5] Eero Simoncelli. Distributed analysis and representation of visual motion. PhD thesis, MIT, Dept. of Electrical Engineering, Cambridge, MA, 1993.
[6] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[7] D.M. Green and J.A. Swets. Signal Detection Theory and Psychophysics. Wiley, New York, 1966.
[8] F. Hürlimann, D. Kiper, and M. Carandini. Testing the Bayesian model of perceived speed. Vision Research, 2002.
[9] Y. Weiss and D.J. Fleet. Velocity Likelihoods in Biological and Machine Vision. In Probabilistic Models of the Brain, pages 77–96. Bradford, 2002.
[10] K. Koerding and D. Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(15):244–247, January 2004.
[11] Leslie Welch. The perception of moving plaids reveals two motion-processing stages. Nature, 337:734–736, 1989.
[12] S. McKee, G. Silvermann, and K. Nakayama. Precise velocity discrimination despite random variations in temporal frequency and contrast. Vision Research, 26(4):609–619, 1986.
[13] C.H. Anderson, H. Nover, and G.C. DeAngelis. Modeling the velocity tuning of macaque MT neurons. Journal of Vision / VSS abstract, 2003.
[14] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6:345–358, 1995.
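As noted at the end of the Conclusions above, here is a small numerical illustration of the reconstructed prior: a power law p(v) ∝ v^n with n = −1.4 falls off far more slowly at high speeds than a Gaussian slow-speed prior. The Gaussian width is an arbitrary choice for the comparison, and both priors are scaled to agree at v = 1 deg/sec; nothing here reproduces the fitted curves themselves.

import numpy as np

n = -1.4
sigma_gauss = 2.0                                  # assumed width of a Gaussian prior

v = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
power_law = v ** n
gaussian = np.exp(-0.5 * (v / sigma_gauss) ** 2)
gaussian *= power_law[0] / gaussian[0]             # match the two priors at v = 1 deg/sec

for vi, p_pl, p_g in zip(v, power_law, gaussian):
    print(f"v = {vi:4.1f}   power-law prior: {p_pl:8.4f}   Gaussian prior: {p_g:10.6f}")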
2 0.75372082 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
Author: Erik G. Learned-miller, Parvez Ahammad
Abstract: The correction of bias in magnetic resonance images is an important problem in medical image processing. Most previous approaches have used a maximum likelihood method to increase the likelihood of the pixels in a single image by adaptively estimating a correction to the unknown image bias field. The pixel likelihoods are defined either in terms of a pre-existing tissue model, or non-parametrically in terms of the image’s own pixel values. In both cases, the specific location of a pixel in the image is not used to calculate the likelihoods. We suggest a new approach in which we simultaneously eliminate the bias from a set of images of the same anatomy, but from different patients. We use the statistics from the same location across different images, rather than within an image, to eliminate bias fields from all of the images simultaneously. The method builds a “multi-resolution” non-parametric tissue model conditioned on image location while eliminating the bias fields associated with the original image set. We present experiments on both synthetic and real MR data sets, and present comparisons with other methods. 1
3 0.74362808 62 nips-2004-Euclidean Embedding of Co-Occurrence Data
Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby
Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1
4 0.64736646 145 nips-2004-Parametric Embedding for Class Visualization
Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1
5 0.57348591 125 nips-2004-Multiple Relational Embedding
Author: Roland Memisevic, Geoffrey E. Hinton
Abstract: We describe a way of using multiple different types of similarity relationship to learn a low-dimensional embedding of a dataset. Our method chooses different, possibly overlapping representations of similarity by individually reweighting the dimensions of a common underlying latent space. When applied to a single similarity relation that is based on Euclidean distances between the input data points, the method reduces to simple dimensionality reduction. If additional information is available about the dataset or about subsets of it, we can use this information to clean up or otherwise improve the embedding. We demonstrate the potential usefulness of this form of semi-supervised dimensionality reduction on some simple examples. 1
6 0.56586957 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations
7 0.56501353 131 nips-2004-Non-Local Manifold Tangent Learning
8 0.56449848 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons
9 0.56261373 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
10 0.56075287 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
11 0.56011546 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
12 0.55941015 192 nips-2004-The power of feature clustering: An application to object detection
13 0.5592714 163 nips-2004-Semi-parametric Exponential Family PCA
14 0.55927122 102 nips-2004-Learning first-order Markov models for control
15 0.55837262 160 nips-2004-Seeing through water
16 0.5583533 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
17 0.55769104 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
18 0.55587018 69 nips-2004-Fast Rates to Bayes for Kernel Machines
19 0.55584651 127 nips-2004-Neighbourhood Components Analysis
20 0.55570889 28 nips-2004-Bayesian inference in spiking neurons