nips nips2009 nips2009-216 knowledge-graph by maker-knowledge-mining

216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities


Source: pdf

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forced-choice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sequential effects reflect parallel learning of multiple environmental regularities Matthew H. [sent-1, score-0.164]

2 For example, when individuals repeatedly perform a simple two-alternative forced-choice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. [sent-6, score-0.384]

3 These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e. [sent-7, score-0.356]

4 Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. [sent-14, score-0.144]

5 Sequential effects play a ubiquitous role in our lives—our actions are constantly affected by our recent experiences. [sent-21, score-0.164]

6 In controlled environments, sequential effects have been observed across a wide range of tasks and experimental paradigms, and aspects of cognition ranging from perception to memory to language to decision making. [sent-22, score-0.332]

7 Sequential effects often occur without awareness and cannot be overridden by instructions, suggesting a robust cognitive inclination to adapt behavior in an ongoing manner. [sent-23, score-0.164]

8 4899) (b) The fit to the same data obtained from DBM2 in which probability estimates are derived from both first-order and second-order trial statistics. [sent-33, score-0.242]

9 Progress toward understanding the intricate complexities of sequential effects will no doubt provide important insights into the ways in which individuals adapt to their environment and make predictions about future outcomes. [sent-39, score-0.467]

10 One classic domain where reliable sequential effects have been observed is in two-alternative forced-choice (2AFC) tasks (e. [sent-40, score-0.33]

11 In this type of task, participants are shown one of two different stimuli, which we denote as X and Y, and are instructed to respond as quickly as possible by mapping the stimulus to a corresponding response, say pressing the left button for X and the right button for Y. [sent-44, score-0.181]

12 To measure sequential effects, the RT is conditioned on the recent trial history. [sent-46, score-0.409]

13 Consider a sequence such as XYYXX, where the rightmost symbol is the current trial (X), and the symbols to the left are successively earlier trials. [sent-49, score-0.269]

14 Such a four-back trial history can be represented in a manner that focuses not on the trial identities, but on whether trials are repeated or alternated. [sent-50, score-0.577]

15 With R and A denoting repetitions and alternations, respectively, the trial sequence XYYXX can be encoded as ARAR. [sent-51, score-0.3]
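The repetition/alternation recoding described here can be sketched in a few lines of Python (the `encode_ra` helper is hypothetical; the paper defines no code):

```python
def encode_ra(trials: str) -> str:
    """Recode a stimulus sequence (e.g. 'XYYXX') as repetitions (R)
    and alternations (A) relative to each preceding trial."""
    return "".join(
        "R" if cur == prev else "A"
        for prev, cur in zip(trials, trials[1:])
    )

print(encode_ra("XYYXX"))  # -> ARAR
```

Note that the encoding is one symbol shorter than the stimulus sequence, since the first trial has no predecessor.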

16 Along the abscissa in Figure 1a are all four-back sequence histories ordered according to the R/A encoding. [sent-55, score-0.167]

17 The left half of the graph represents cases where the current trial is a repetition of the previous, and the right half represents cases where the current trial is an alternation. [sent-56, score-0.753]

18 The trial histories are ordered along the abscissa so that the left half is monotonically increasing and the right half is monotonically decreasing following the same line of intuition, i. [sent-63, score-0.382]

19 2 Toward A Rational Model Of Sequential Effects Many models have been proposed to capture sequential effects, including Estes (1950), Anderson (1960), Laming (1969), and Cho et al. [sent-66, score-0.172]

20 Other models have interpreted sequential effects as adaptation to the statistical structure of a dynamic environment (e. [sent-68, score-0.387]

21 In this same vein, Yu and Cohen (2008) recently suggested a rational [graphical-model node labels omitted; Figure 2: Three graphical models that capture sequential dependencies.] [sent-71, score-0.207]

22 (b) A reformulation of DBM in which the output variable, St , is the actual stimulus identity instead of the repetition/alternation representation used in DBM. [sent-73, score-0.214]

23 explanation for sequential effects such as those observed in Cho et al. [sent-76, score-0.336]

24 The key contribution of this work is that it provides a rational justification for sequential effects that have been previously viewed as resulting from low-level brain mechanisms such as residual neural activation. [sent-79, score-0.409]

25 DBM describes performance in 2AFC tasks as Bayesian inference over whether the next trial in the sequence will be a repetition or an alternation of the previous trial, conditioned on the trial history. [sent-80, score-0.913]

26 If Rt is the Bernoulli random variable that denotes whether trial t is a repetition (Rt = 1) or alternation (Rt = 0) of the previous trial, DBM determines P (Rt |Rt−1 ), where Rt−1 denotes the trial sequence preceding trial t, i. [sent-81, score-1.128]

27 According to the generative model, the environment is nonstationary and γt can either retain the same value as on trial t − 1 or it can change. [sent-89, score-0.32]

28 Before each trial t of a 2AFC task, DBM computes the probability of the upcoming stimulus conditioned on the trial history. [sent-94, score-0.723]

29 of P (Rt = R|Rt−1 ) on repetition trials and of P (Rt = A|Rt−1 ) = 1 - P (Rt = R|Rt−1 ) on alternation trials. [sent-97, score-0.426]

30 1 Intuiting DBM predictions Another contribution of Yu and Cohen (2008) is the mathematical demonstration that DBM is approximately equivalent to an exponential filter over trial histories. [sent-103, score-0.274]

31 That is, the probability that the current stimulus is a repetition is a weighted sum of past observations, with repetitions being scored as 1 and alternations as 0, and with weights decaying exponentially as a function of lag. [sent-104, score-0.533]

32 The exponential filter gives insight into how DBM probabilities will vary as a function of trial history. [sent-105, score-0.242]

33 Consider two 4-back trial histories: an alternation followed by two repetitions (ARR−) and two alternations followed by a repetition (AAR−), where the − indicates that the current trial type is unknown. [sent-106, score-0.942]

34 An exponential filter predicts that ARR− will always create a stronger expectation for an R on the current trial than AAR− will, because the former includes an additional past repetition. [sent-107, score-0.284]

35 Thus, if the current trial is in fact a repetition, the model predicts a faster RT for ARR− compared to AAR− (i. [sent-108, score-0.284]

36 Conversely, if the current trial is an alternation, the model predicts RT_ARRA > RT_AARA. [sent-111, score-0.284]

37 Because DBM functions approximately as an exponential filter, and the repetition in the trial history is more recent for ARAR than for RAAR, DBM predicts RT_ARAR < RT_RAAR. [sent-121, score-0.595]
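The exponential-filter intuition behind these RT predictions can be illustrated with a leaky-integrator sketch. The learning rate and neutral prior below are illustrative assumptions, not values fitted by Yu and Cohen (2008):

```python
def repetition_expectation(history: str, alpha: float = 0.4,
                           prior: float = 0.5) -> float:
    """Exponential filter over an R/A history (earliest trial first):
    each observation (R = 1, A = 0) pulls the estimate with weight alpha,
    so the influence of older trials decays exponentially with lag."""
    p = prior
    for obs in history:
        p = (1 - alpha) * p + alpha * (1.0 if obs == "R" else 0.0)
    return p

# ARR- creates a stronger expectation of a repetition than AAR-,
# because it contains one more past repetition.
print(repetition_expectation("ARR") > repetition_expectation("AAR"))  # True
```

The same filter reproduces the ordering discussed here: the more recent repetition in ARA (vs. RAA) yields a higher repetition expectation, hence a faster predicted RT when the current trial is in fact a repetition.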

38 To understand this mismatch, we consider an alternative representation of the trial history: the first-order sequence, i. [sent-123, score-0.242]

39 The two R/A sequences ARAR and RAAR correspond to stimulus sequences XYYXX and XXYXX. [sent-126, score-0.237]

40 If we consider an exponential filter on the actual stimulus sequence, we obtain the opposite prediction from that of DBM: RT_XYYXX > RT_XXYXX, because there are more recent occurrences of X in the latter sequence. [sent-127, score-0.214]

41 Again, DBM also makes a prediction inconsistent with the data, that RT_ARAA > RT_RAAA, whereas an exponential filter on stimulus values predicts the opposite outcome: RT_XYYXY < RT_XXYXY. [sent-129, score-0.223]

42 Of course this analysis leads to predictions for other pairs of points where DBM is consistent with the data and a stimulus-based exponential filter is inconsistent. [sent-130, score-0.213]

43 Nevertheless, the variations in the data suggest that more importance should be given to the actual stimulus values. [sent-131, score-0.214]

44 In general, we can divide the sequential effects observed in the data into two classes: first- and second-order effects. [sent-132, score-0.304]

45 First-order sequential effects result from the priming of specific stimulus or response values. [sent-133, score-0.551]

46 We refer to this as a first-order effect because it depends only on the stimulus values rather than a higher-order representation such as the repetition/alternation nature of a trial. [sent-134, score-0.181]

47 These effects correspond to the estimation of the baserate of each stimulus or response value. [sent-135, score-0.476]

48 They are observed in a wide range of experimental paradigms and are referred to as stimulus priming or response priming. [sent-136, score-0.247]

49 the triangular pattern in RT data, can be thought of as a second-order effect because it reflects learning of the correlation structure between the current trial and the previous trial. [sent-139, score-0.242]

50 In second-order effects, the actual stimulus value is irrelevant and all that matters is whether the stimulus was a repetition of the previous trial. [sent-140, score-0.664]

51 As DBM proposes, these effects essentially arise from an attempt to estimate the repetition rate of the sequence. [sent-141, score-0.433]

52 DBM naturally produces second-order sequential effects because it abstracts over the stimulus level of description: observations in the model are R and A instead of the actual stimuli X and Y . [sent-142, score-0.555]

53 To gain an understanding of how first-order effects could be integrated into this type of Bayesian framework, we reformulate the DBM architecture. [sent-144, score-0.164]

54 Figure 2b shows an equivalent depiction of DBM in which the generative process on trial t produces the actual stimulus value, denoted St . [sent-145, score-0.456]

55 St is conditioned on both the repetition probability, γt , and the previous stimulus value, St−1 . [sent-146, score-0.477]

56 An additional benefit of this reformulated architecture is that it can represent first-order effects if we switch the meaning of γ. [sent-150, score-0.206]

57 In particular, we can treat γ as the probability of the stimulus taking on a specific value (X or Y ) instead of the probability of a repetition. [sent-151, score-0.181]

58 4 3 Dynamic Belief Mixture Model The complex contributions of first- and second-order effects to the full pattern of observed sequential effects suggest the need for a model with more explanatory power than DBM. [sent-155, score-0.468]

59 We have shown that the DBM architecture can be reformulated to generate first-order effects by having it infer the baserate instead of the repetition rate of the sequence, but the empirical data suggest both mechanisms are present simultaneously. [sent-157, score-0.578]

60 Thus the challenge is to merge these two effects into one model that performs joint inference over both environmental statistics. [sent-158, score-0.164]

61 Importantly, the observed variable, S, is the actual stimulus value instead of the repetition/alternation representation used in DBM. [sent-165, score-0.214]

62 This architecture allows for explicit representation of the baserate, through the direct influence of φt on the physical stimulus value St , as well as representation of the repetition rate through the joint influence of γt and the previous stimulus St−1 on St . [sent-166, score-0.673]

63 The iterative prior for the next trial is then a mixture of the posterior from the current trial, weighted by 1 − α, and the reset prior, weighted by α (the probability of change in φ and γ). [sent-171, score-0.316]
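A minimal sketch of this update, for a single belief over the repetition rate γ on a discretized grid; DBM2 performs the analogous inference jointly over the base rate φ and the repetition rate γ. The grid resolution, α = 0.1, and uniform reset prior are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def dbm_update(prior, grid, obs_is_repetition, alpha=0.1):
    """One trial of a DBM-style update: Bayes' rule on the observation,
    then mix the posterior (weight 1 - alpha) with the reset prior
    (weight alpha, the probability that the environment has changed)."""
    likelihood = grid if obs_is_repetition else 1.0 - grid
    posterior = prior * likelihood
    posterior /= posterior.sum()
    reset_prior = np.full_like(grid, 1.0 / len(grid))
    return (1 - alpha) * posterior + alpha * reset_prior

grid = np.linspace(0.01, 0.99, 99)            # candidate repetition rates
belief = np.full_like(grid, 1.0 / len(grid))  # start from the reset prior
for obs in [True, True, False, True]:         # trial sequence R, R, A, R
    belief = dbm_update(belief, grid, obs)
print((belief * grid).sum())  # posterior mean repetition rate, above 0.5 here
```

The mixing step is what keeps the model perpetually sensitive to recent trials: even a sharply peaked posterior is continually diluted back toward the reset prior.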

64 To account for the overall advantage of repetition trials over alternation trials in the data, a repetition bias had to be built into the reset prior in DBM. [sent-178, score-0.875]

65 In DBM2, the first-order component naturally introduces an advantage for repetition trials. [sent-179, score-0.269]

66 The nonuniform reset prior in DBM allows it to be biased either for repetition or alternation. [sent-183, score-0.343]

67 3591) bias, but a replication we performed—with the same stimuli and same responses—obtained a strong alternation bias. [sent-193, score-0.143]

68 It is our hunch that the bias should not be cast as part of the computational theory (specifically, the prior); rather, the bias reflects attentional and perceptual mechanisms at play, which can introduce varying degrees of an alternation bias. [sent-194, score-0.283]

69 Specifically, four classic effects have been reported in the literature that make it difficult for individuals to process the same stimulus two times in a row at a short lag: attentional blink Raymond et al. [sent-195, score-0.482]

70 (1992), inhibition of return Posner and Cohen (1984), repetition blindness Kanwisher (1987), and the Ranschburg effect Jahnke (1969). [sent-196, score-0.308]

71 For example, with repetition blindness, processing of an item is impaired if it occurs within 500 ms of another instance of the same item in a rapid serial stream; this condition is often satisfied with 2AFC. [sent-197, score-0.269]

72 In support of our view that fast-acting secondary mechanisms are at play in 2AFC, Jentzsch and Sommer (Experiment 2) found that using a very short lag between each response and the next stimulus modulated sequential effects in a difficult-to-interpret manner. [sent-198, score-0.589]

73 To allow for various patterns of bias across experiments, we introduced an additional parameter to our model, an offset specifically for repetition trials, which can serve as a means of removing the influence of the effects listed above. [sent-200, score-0.515]

74 Notably we see a slight advantage on alternation trials, as opposed to the repetition bias seen in Cho et al. [sent-207, score-0.462]

75 Surprisingly, DBM2 is able to account for the sequential effects in other binary decision tasks that do not fit into the 2AFC paradigm. [sent-208, score-0.304]

76 found that this bias in perceiving rotation was influenced by the recent trial history. [sent-216, score-0.326]

77 Figure 3b shows the data for this experiment rearranged to be consistent with the R/A orderings used elsewhere (the sequences on the abscissa show the physical stimulus values, ending with Trial t − 1). [sent-217, score-0.291]

78 The bias, conditioned on the 4-back trial history, follows a similar pattern to that seen with RTs in Cho et al. [sent-218, score-0.301]

79 Before each trial, we computed the model’s probability that the next stimulus would be P, and then converted this probability to the PSI bias measure using an affine transform similar to our RT transform. [sent-228, score-0.236]

80 5 EEG evidence for first-order and second-order predictions DBM2 proposes that subjects in binary choice tasks track both the baserate and the repetition rate in the sequence. [sent-234, score-0.394]

81 The S-LRP interval measures the time from stimulus onset to response activation on each trial. [sent-240, score-0.247]

82 Interestingly, the S-LRP and LRP-R data exhibit different patterns of sequential effects when conditioned on the 4-back trial histories, as shown in Figure 4. [sent-243, score-0.573]

83 RT on repetition trials increases as more alternations appear in the recent history, and RT on alternation trials shows the opposite dependence. [sent-247, score-0.529]

84 Each ambiguous test stimulus followed three stimuli for which the direction of rotation was unambiguous and to which the subject made no response. [sent-279, score-0.218]

85 The responses to the test stimuli were grouped according to the 3-back stimulus history, and a PSI value was computed for each of the eight histories to measure subjects’ bias toward perceiving positive vs. [sent-280, score-0.448]

86 As in Figure 3b, the histories on the abscissa show the physical stimulus values, ending with Trial t − 1, and the arrangement of these histories is consistent with the R/A orderings used elsewhere in this paper. [sent-283, score-0.409]

87 DBM2’s explanation of Jentzsch and Sommer’s EEG results indicates that first-order sequential effects arise in response processing and second-order effects arise in stimulus processing. [sent-284, score-0.715]

88 Therefore, the model predicts that, in the absence of prior responses, sequential effects will follow a pure second-order pattern. [sent-285, score-0.346]

89 Just as in the S-LRP data of Jentzsch and Sommer (2002), the first-order effects have mostly disappeared, and the data are well explained by a pure second-order effect (i. [sent-288, score-0.164]

90 , a stronger bias for alternation when there are more alternations in the history, and vice versa). [sent-290, score-0.213]

91 For this experiment we used different affine transformation values than in Experiment 1 because the modifications in the experimental design led to a generally weaker sequential effect, which we speculate to have been due to lesser engagement by subjects when fewer responses were needed. [sent-294, score-0.227]

92 the systematic discrepancies from DBM) pointed to an improved rational analysis and an elaborated generative model (DBM2) that is grounded in both first- and second-order sequential statistics. [sent-300, score-0.207]

93 In turn, the conceptual organization of the new rational model suggested a psychological architecture (i. [sent-301, score-0.147]

94 , separate representation of baserates and repetition rates) that was borne out in further data. [sent-303, score-0.295]

95 ’s intermittent-response experiment suggest that the statistics individuals track are differentially tied to the stimuli and responses in the task. [sent-306, score-0.146]

96 That is, rather than learning statistics of the abstract trial sequence, individuals learn the baserates (i. [sent-307, score-0.318]

97 , marginal probabilities) of responses and the repetition rates (i. [sent-309, score-0.298]

98 This division suggests further hypotheses about both the empirical nature and the psychological representation of stimulus sequences and of response sequences, which future experiments and statistical analyses will hopefully shed light on. [sent-312, score-0.313]

99 Functional localization and mechanisms of sequential effects in serial reaction time tasks. [sent-341, score-0.376]

100 Mechanisms underlying dependencies of performance on stimulus history in a two-alternative forced-choice task. [sent-363, score-0.223]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dbm', 0.547), ('repetition', 0.269), ('trial', 0.242), ('jentzsch', 0.208), ('rt', 0.186), ('stimulus', 0.181), ('cho', 0.171), ('sommer', 0.169), ('st', 0.165), ('effects', 0.164), ('maloney', 0.148), ('lrp', 0.143), ('sequential', 0.14), ('alternation', 0.106), ('aaar', 0.104), ('rrrr', 0.104), ('arar', 0.091), ('raar', 0.091), ('rrra', 0.091), ('histories', 0.088), ('aaaa', 0.078), ('aarr', 0.078), ('rrar', 0.078), ('reset', 0.074), ('psi', 0.068), ('rational', 0.067), ('response', 0.066), ('aara', 0.065), ('araa', 0.065), ('arra', 0.065), ('arrr', 0.065), ('baserate', 0.065), ('raaa', 0.065), ('rara', 0.065), ('rarr', 0.065), ('rraa', 0.065), ('behavioral', 0.064), ('bias', 0.055), ('alternations', 0.052), ('environment', 0.052), ('abscissa', 0.052), ('arr', 0.052), ('trials', 0.051), ('ct', 0.05), ('individuals', 0.05), ('xy', 0.05), ('eeg', 0.047), ('rts', 0.046), ('federer', 0.046), ('lter', 0.043), ('architecture', 0.042), ('history', 0.042), ('jones', 0.042), ('predicts', 0.042), ('cohen', 0.042), ('xx', 0.042), ('mozer', 0.042), ('aar', 0.039), ('blindness', 0.039), ('kinoshita', 0.039), ('psychological', 0.038), ('mechanisms', 0.038), ('stimuli', 0.037), ('reaction', 0.034), ('actual', 0.033), ('et', 0.032), ('predictions', 0.032), ('upcoming', 0.031), ('circled', 0.031), ('nadal', 0.031), ('dynamic', 0.031), ('repetitions', 0.031), ('experiment', 0.03), ('attentional', 0.029), ('perceiving', 0.029), ('responses', 0.029), ('toward', 0.029), ('perception', 0.028), ('subjects', 0.028), ('belief', 0.028), ('sequences', 0.028), ('yu', 0.028), ('conditioned', 0.027), ('offset', 0.027), ('sequence', 0.027), ('nonstationary', 0.026), ('baserates', 0.026), ('blink', 0.026), ('forcedchoice', 0.026), ('martello', 0.026), ('posner', 0.026), ('ranschburg', 0.026), ('raymond', 0.026), ('rtarar', 0.026), ('rtarra', 0.026), ('rtarrr', 0.026), ('rtraar', 0.026), ('rtxxy', 0.026), ('rtxy', 0.026), ('sahm', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forced-choice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1

2 0.13188472 177 nips-2009-On Learning Rotations

Author: Raman Arora

Abstract: An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. 1

3 0.077288352 45 nips-2009-Beyond Convexity: Online Submodular Minimization

Author: Elad Hazan, Satyen Kale

Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1

4 0.076021411 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint secondorder statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing. 1

5 0.06393937 154 nips-2009-Modeling the spacing effect in sequential category learning

Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille

Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.

6 0.060726218 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

7 0.059426341 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

8 0.053901061 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

9 0.052917775 184 nips-2009-Optimizing Multi-Class Spatio-Spectral Filters via Bayes Error Estimation for EEG Classification

10 0.051768273 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

11 0.051032118 152 nips-2009-Measuring model complexity with the prior predictive

12 0.050565332 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

13 0.049939424 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

14 0.048609309 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

15 0.048100278 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

16 0.047782715 52 nips-2009-Code-specific policy gradient rules for spiking neurons

17 0.047384158 43 nips-2009-Bayesian estimation of orientation preference maps

18 0.04543734 237 nips-2009-Subject independent EEG-based BCI decoding

19 0.045003045 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

20 0.042203091 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.119), (1, -0.052), (2, 0.103), (3, -0.037), (4, 0.046), (5, 0.043), (6, -0.015), (7, 0.011), (8, -0.01), (9, -0.029), (10, 0.026), (11, -0.012), (12, 0.017), (13, -0.054), (14, 0.068), (15, 0.01), (16, -0.024), (17, 0.072), (18, -0.143), (19, 0.076), (20, 0.027), (21, -0.01), (22, 0.052), (23, -0.064), (24, 0.038), (25, 0.003), (26, 0.012), (27, 0.041), (28, 0.006), (29, -0.011), (30, 0.011), (31, 0.024), (32, -0.057), (33, 0.021), (34, -0.02), (35, 0.07), (36, 0.065), (37, -0.013), (38, -0.084), (39, 0.104), (40, 0.07), (41, 0.034), (42, 0.017), (43, 0.004), (44, 0.015), (45, 0.003), (46, -0.16), (47, 0.087), (48, 0.129), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94253838 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forcedchoice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1

2 0.59129626 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer

Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory. 1

3 0.49446303 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation.

4 0.49422902 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

Author: Mark Steyvers, Brent Miller, Pernille Hemmer, Michael D. Lee

Abstract: When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. We introduce two Bayesian models for aggregating order information, based on a Thurstonian approach and a Mallows model. Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. We apply MCMC to make inferences about the underlying truth and the strategies employed by individuals. The models demonstrate a

5 0.47154644 152 nips-2009-Measuring model complexity with the prior predictive

Author: Wolf Vanpaemel

Abstract: In the last few decades, model complexity has received a lot of press. While many methods have been proposed that jointly measure a model’s descriptive adequacy and its complexity, few measures exist that measure complexity in itself. Moreover, existing measures ignore the parameter prior, which is an inherent part of the model and affects the complexity. This paper presents a stand-alone measure of model complexity that takes the number of parameters, the functional form, the range of the parameters, and the parameter prior into account. This Prior Predictive Complexity (PPC) is an intuitive and easy-to-compute measure. It starts from the observation that model complexity is the property of the model that enables it to fit a wide range of outcomes. The PPC then measures how wide this range exactly is. Keywords: Model Selection & Structure Learning; Model Comparison Methods; Perception

The recent revolution in model selection methods in the cognitive sciences was driven to a large extent by the observation that computational models can differ in their complexity. Differences in complexity put models on unequal footing when their ability to approximate empirical data is assessed. Therefore, models should be penalized for their complexity when their adequacy is measured. The balance between descriptive adequacy and complexity has been termed generalizability [1, 2]. Much attention has been devoted to developing, advocating, and comparing different measures of generalizability (for a recent overview, see [3]). In contrast, measures of complexity have received relatively little attention. The aim of the current paper is to propose and illustrate a stand-alone measure of model complexity, called the Prior Predictive Complexity (PPC). The PPC is based on the intuitive idea that a complex model can predict many outcomes and a simple model can predict a few outcomes only.
First, I discuss existing approaches to measuring model complexity and note some of their limitations. In particular, I argue that currently existing measures ignore one important aspect of a model: the prior distribution it assumes over the parameters. I then introduce the PPC, which, unlike the existing measures, is sensitive to the parameter prior. Next, the PPC is illustrated by calculating the complexities of two popular models of information integration. (I am grateful to Michael Lee and Liz Bonawitz.)

1 Previous approaches to measuring model complexity

A first approach to assess the (relative) complexity of models relies on simulated data. Simulation-based methods differ in how these artificial data are generated. A first, atheoretical approach uses random data [4, 5]. In the semi-theoretical approach, the data are generated from some theoretically interesting functions, such as the exponential or the logistic function [4]. Using these approaches, the models under consideration are equally complex if each model provides the best fit to roughly the same number of data sets. A final approach to generating artificial data is a theoretical one, in which the data are generated from the models of interest themselves [6, 7]. The parameter sets used in the generation can either be hand-picked by the researcher, estimated from empirical data, or drawn from a previously specified distribution. If the models under consideration are equally complex, each model should provide the best fit to self-generated data more often than the other models under consideration do. One problem with this simulation-based approach is that it is very labor-intensive. It requires generating a large number of artificial data sets, and fitting the models to all these data sets. Further, it relies on choices that are often made in an arbitrary fashion that nonetheless bias the results. For example, in the semi-theoretical approach, a crucial choice is which functions to use.
Similarly, in the theoretical approach, results are heavily influenced by the parameter values used in generating the data. If they are fixed, on what basis? If they are estimated from empirical data, from which data? If they are drawn randomly, from which distribution? Further, a simulation study only gives a rough idea of complexity differences but provides no direct measure reflecting the complexity. A number of proposals have been made to measure model complexity more directly. Consider a model M with k parameters, summarized in the parameter vector θ = (θ1, θ2, ..., θk), which has a range indicated by Ω. Let d denote the data and p(d|θ, M) the likelihood. The most straightforward measure of model complexity is the parametric complexity (PC), which simply counts the number of parameters:

PC = k.   (1)

PC is attractive as a measure of model complexity since it is very easy to calculate. Further, it has a direct and well-understood relation to complexity: the more parameters, the more complex the model. It is included as the complexity term of several generalizability measures such as AIC [8] and BIC [9], and it is at the heart of the Likelihood Ratio Test. Despite this intuitive appeal, PC is not free from problems. One problem with PC is that it reflects only a single aspect of complexity. The parameter range and the functional form (the way the parameters are combined in the model equation) also influence a model’s complexity, but these dimensions of complexity are ignored in PC [2, 6]. A complexity measure that takes these three dimensions into account is provided by the geometric complexity (GC) measure, which is inspired by differential geometry [10]. In GC, complexity is conceptualized as the number of distinguishable probability distributions a model can generate.
It is defined by

GC = (k/2) ln(n / (2π)) + ln ∫_Ω √(det I(θ|M)) dθ,   (2)

where n indicates the size of the data sample and I(θ) is the Fisher Information Matrix:

I_ij(θ|M) = −E_θ [ ∂² ln p(d|θ, M) / (∂θ_i ∂θ_j) ].   (3)

Note that I(θ|M) is determined by the likelihood function p(d|θ, M), which is in turn determined by the model equation. Hence GC is sensitive to the number of parameters (through k), the functional form (through I), and the range (through Ω). Quite surprisingly, GC turns out to be equal to the complexity term used in one version of Minimum Description Length (MDL), a measure of generalizability developed within the domain of information theory [2, 11, 12, 13]. GC contrasts favorably with PC, in the sense that it takes three dimensions of complexity into account rather than a single one. A major drawback of GC is that, unlike PC, it requires considerable technical sophistication to be computed, as it relies on the second derivative of the likelihood. A more important limitation of both PC and GC is that these measures are insensitive to yet another important dimension contributing to model complexity: the prior distribution over the model parameters. The relation between the parameter prior distribution and model complexity is discussed next.

2 Model complexity and the parameter prior

The growing popularity of Bayesian methods in psychology has not only raised awareness that model complexity should be taken into account when testing models [6], it has also drawn attention to the fact that on many occasions, relevant prior information is available [14]. In Bayesian methods, there is room to incorporate this information in two different flavors: as a prior distribution over the models, or as a prior distribution over the parameters. Specifying a model prior is a daunting task, so almost invariably, the model prior is taken to be uniform (but see [15] for an exception).
In contrast, information regarding the parameter is much easier to include, although still challenging (e.g., [16]). There are two ways to formalize prior information about a model’s parameters: using the parameter prior range (often referred to as simply the range) and using the parameter prior distribution (often referred to as simply the prior). The prior range indicates which parameter values are allowed and which are forbidden. The prior distribution indicates which parameter values are likely and which are unlikely. Models that share the same equation and the same range but differ in the prior distribution can be considered different models (or at least different model versions), just like models that share the same equation but differ in range are different model versions. Like the parameter prior range, the parameter prior distribution influences the model complexity. In general, a model with a vague parameter prior distribution is more complex than a model with a sharply peaked parameter prior distribution, much as a model with a broad-ranged parameter is more complex than the same model where the parameter is heavily restricted. To drive home the point that the parameter prior should be considered when model complexity is assessed, consider the following “fair coin” model Mf and a “biased coin” model Mb. There is a clear intuitive complexity difference between these models: Mb is more complex than Mf. The most straightforward way to formalize these models is as follows, where ph denotes the probability of observing heads:

ph = 1/2   (4)

for model Mf, and

ph = θ,   0 ≤ θ ≤ 1,   p(θ) = 1,   (5)

where the triplet of equations jointly defines model Mb. The range forbids values smaller than 0 or greater than 1 because ph is a proportion. As Mf and Mb have a different number of parameters, both PC and GC, being sensitive to the number of parameters, pick up the difference in model complexity between the models.
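The differing predictive footprints of Mf and Mb can be made concrete by computing their prior predictive distributions over the number of heads in n tosses. The sketch below is illustrative (the helper names and the 95% mass cutoff are choices made here, not taken from the paper): under the uniform prior of Mb the prior predictive is uniform over 0..n (the Beta-Binomial result), while Mf's Dirac prior yields a concentrated Binomial(n, 1/2).

```python
import math

def binom_pmf(k, n, p):
    # Binomial probability of k successes in n trials with success rate p.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def prior_predictive_fair(n):
    # Mf: theta fixed at 1/2 (Dirac prior), so the ppd is Binomial(n, 1/2).
    return [binom_pmf(k, n, 0.5) for k in range(n + 1)]

def prior_predictive_biased(n):
    # Mb: uniform prior on theta; the resulting Beta-Binomial ppd is
    # uniform over the n + 1 possible outcomes.
    return [1.0 / (n + 1) for _ in range(n + 1)]

def predicted_set(ppd, mass=0.95):
    # Smallest set of outcomes whose total prior mass reaches `mass`.
    order = sorted(range(len(ppd)), key=lambda k: -ppd[k])
    total, chosen = 0.0, []
    for k in order:
        chosen.append(k)
        total += ppd[k]
        if total >= mass:
            break
    return sorted(chosen)
```

For n = 10, the fair model's 95% predicted set covers only 7 of the 11 possible outcomes, whereas the biased model's covers all 11: the flatter prior buys the model a wider predictive range, which is exactly the intuition the PPC formalizes.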
Alternatively, model Mf could be defined as follows:

ph = θ,   0 ≤ θ ≤ 1,   p(θ) = δ(θ − 1/2),   (6)

where δ(x) is the Dirac delta. Note that the model formalized in Equation 6 is exactly identical to the model formalized in Equation 4. However, relying on the formulation of model Mf in Equation 6, PC and GC now judge Mf and Mb to be equally complex: both models share the same model equation (which implies they have the same number of parameters and the same functional form) and the same range for the parameter. Hence, PC and GC make an incorrect judgement of the complexity difference between both models. This misjudgement is a direct result of the insensitivity of these measures to the parameter prior. As models Mf and Mb have different prior distributions over their parameter, a measure sensitive to the prior would pick up the complexity difference between these models. Such a measure is introduced next.

3 The Prior Predictive Complexity

Model complexity refers to the property of the model that enables it to predict a wide range of data patterns [2]. The idea of the PPC is to measure how wide this range exactly is. A complex model can predict many outcomes, and a simple model can predict a few outcomes only. Model simplicity, then, refers to the property of placing restrictions on the possible outcomes: the greater the restrictions, the greater the simplicity. To understand how model complexity is measured in the PPC, it is useful to think about the universal interval (UI) and the predicted interval (PI). The universal interval is the range of outcomes that could potentially be observed, irrespective of any model. For example, in an experiment with n binomial trials, it is impossible to observe fewer than zero successes, or more than n successes, so the range of possible outcomes is [0, n]. Similarly, the universal interval for a proportion is [0, 1]. The predicted interval is the interval containing all outcomes the model predicts.
An intuitive way to gauge model complexity is then the cardinality of the predicted interval, relative to the cardinality of the universal interval, averaged over all m conditions or stimuli:

PPC = (1/m) Σ_{i=1}^{m} |PI_i| / |UI_i|.   (7)

A key aspect of the PPC is deriving the predicted interval. For a parameterized likelihood-based model, prediction takes the form of a distribution over all possible outcomes for some future, yet-to-be-observed data d under some model M. This distribution is called the prior predictive distribution (ppd) and can be calculated using the law of total probability:

p(d|M) = ∫_Ω p(d|θ, M) p(θ|M) dθ.   (8)

Predicting the probability of unseen future data d arising under the assumption that model M is true involves integrating the probability of the data for each of the possible parameter values, p(d|θ, M), as weighted by the prior probability of each of these values, p(θ|M). Note that the ppd relies on the number of parameters (through the number of integrals and the likelihood), the model equation (through the likelihood), and the parameter range (through Ω). Therefore, like GC, the PPC is sensitive to all these aspects. In contrast to GC, however, the ppd, and hence the PPC, also relies on the parameter prior. Since predictions are made probabilistically, virtually all outcomes will be assigned some prior weight. This implies that, in principle, the predicted interval equals the universal interval. However, for some outcomes the assigned weight will be extremely small. Therefore, it seems reasonable to restrict the predicted interval to the smallest interval that includes some predetermined amount of the prior mass. For example, the 95% predictive interval is defined by those outcomes with the highest prior mass that together make up 95% of the prior mass. Analytical solutions to the integral defining the ppd are rarely available. Instead, one should rely on approximations to the ppd by drawing samples from it.
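For a simple binomial model, this sampling approximation can be sketched in a few lines of plain Python; the function names here (`sample_ppd`, `ppc`) are illustrative assumptions, and the study itself relied on a general-purpose sampler rather than forward simulation like this.

```python
import random

def sample_ppd(prior_sampler, n_trials, n_samples=10_000):
    # Approximate the prior predictive by forward simulation:
    # draw theta ~ p(theta), then successes ~ Binomial(n_trials, theta).
    counts = [0] * (n_trials + 1)
    for _ in range(n_samples):
        theta = prior_sampler()
        successes = sum(random.random() < theta for _ in range(n_trials))
        counts[successes] += 1
    return [c / n_samples for c in counts]

def ppc(ppds, universal_sizes, mass=0.95):
    # Equation 7: mean over conditions of |predicted interval| / |universal
    # interval|, with the predicted interval taken as the smallest set of
    # outcomes whose summed mass reaches `mass`.
    ratios = []
    for ppd, u in zip(ppds, universal_sizes):
        order = sorted(range(len(ppd)), key=lambda k: -ppd[k])
        total, size = 0.0, 0
        for k in order:
            total += ppd[k]
            size += 1
            if total >= mass:
                break
        ratios.append(size / u)
    return sum(ratios) / len(ratios)
```

For example, `ppc([sample_ppd(random.random, 10)], [11])` estimates the complexity contribution of a single condition under a uniform prior on theta.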
In the current study, sampling was performed using WinBUGS [17, 18], a highly versatile, user-friendly, and freely available software package. It contains sophisticated and relatively general-purpose Markov Chain Monte Carlo (MCMC) algorithms to sample from any distribution of interest.

4 An application example

The PPC is illustrated by comparing the complexity of two popular models of information integration, which attempt to account for how people merge potentially ambiguous or conflicting information from various sensory sources to create subjective experience. These models either assume that the sources of information are combined additively (the Linear Integration Model; LIM; [19]) or multiplicatively (the Fuzzy Logical Model of Perception; FLMP; [20, 21]).

4.1 Information integration tasks

A typical information integration task exposes participants simultaneously to different sources of information and requires this combined experience to be identified in a forced-choice identification task. The presented stimuli are generated from a factorial manipulation of the sources of information by systematically varying the ambiguity of each of the sources. The relevant empirical data consist of, for each of the presented stimuli, the counts km of the number of times the mth stimulus was identified as one of the response alternatives, out of the tm trials on which it was presented. For example, an experiment in phonemic identification could involve two phonemes to be identified, /ba/ and /da/, and two sources of information, auditory and visual. Stimuli are created by crossing different levels of audible speech, varying between /ba/ and /da/, with different levels of visible speech, also varying between these alternatives. The resulting set of stimuli spans a continuum between the two syllables. The participant is then asked to listen and to watch the speaker, and based on this combined audiovisual experience, to identify the syllable as being either /ba/ or /da/.
In the so-called expanded factorial design, not only bimodal stimuli (containing both auditory and visual information) but also unimodal stimuli (providing only a single source of information) are presented.

4.2 Information integration models

In what follows, the formal description of the LIM and the FLMP is outlined for a design with two response alternatives (/da/ or /ba/) and two sources (auditory and visual), with I and J levels, respectively. In such a two-choice identification task, the counts km follow a Binomial distribution:

km ∼ Binomial(pm, tm),   (9)

where pm indicates the probability that the mth stimulus is identified as /da/.

4.2.1 Model equation

The probability for the stimulus constructed with the ith level of the first source and the jth level of the second being identified as /da/ is computed according to the choice rule:

pij = s(ij, /da/) / [s(ij, /da/) + s(ij, /ba/)],   (10)

where s(ij, /da/) represents the overall degree of support for the stimulus to be /da/. The sources of information are assumed to be evaluated independently, implying that different parameters are used for the different modalities. In the present example, the degree of auditory support for /da/ is denoted by ai (i = 1, ..., I) and the degree of visual support for /da/ by bj (j = 1, ..., J). When a unimodal stimulus is presented, the overall degree of support for each alternative is given by s(i∗, /da/) = ai and s(∗j, /da/) = bj, where the asterisk (∗) indicates the absence of information, implying that Equation 10 reduces to

pi∗ = ai and p∗j = bj.   (11)

When a bimodal stimulus is presented, the overall degree of support for each alternative is based on the integration or blending of both these sources. Hence, for bimodal stimuli, s(ij, /da/) = ai ∘ bj, where the operator ∘ denotes the combination of both sources. Hence, Equation 10 reduces to

pij = (ai ∘ bj) / [(ai ∘ bj) + ((1 − ai) ∘ (1 − bj))].   (12)

The LIM assumes an additive combination, i.e., ∘ = +, so Equation 12 becomes

pij = (ai + bj) / 2.   (13)

The FLMP, in contrast, assumes a multiplicative combination, i.e., ∘ = ×, so Equation 12 becomes

pij = ai bj / [ai bj + (1 − ai)(1 − bj)].   (14)

4.2.2 Parameter prior range and distribution

Each level of auditory and visual support for /da/ (i.e., ai and bj, respectively) is associated with a free parameter, which implies that the FLMP and the LIM have an equal number of free parameters, I + J. Each of these parameters is constrained to satisfy 0 ≤ ai, bj ≤ 1. The original formulations of the LIM and FLMP unfortunately left the parameter priors unspecified. However, an implicit assumption that has been commonly used is a uniform prior for each of the parameters. This assumption implicitly underlies classical and widely adopted methods for model evaluation using accounted percentage of variance or maximum likelihood:

ai ∼ Uniform(0, 1) and bj ∼ Uniform(0, 1) for i = 1, ..., I; j = 1, ..., J.   (15)

The models relying on this set of uniform priors will be referred to as LIMu and FLMPu. Note that LIMu and FLMPu treat the different parameters as independent. This approach misses important information. In particular, the experimental design is such that the amount of support for each level i + 1 is always higher than for level i. Because parameter ai (or bi) corresponds to the degree of auditory (or visual) support for a unimodal stimulus at the ith level, it seems reasonable to expect the following orderings among the parameters to hold (see also [6]):

aj > ai and bj > bi for j > i.   (16)

The models relying on this set of ordered priors will be referred to as LIMo and FLMPo.

4.3 Complexity and experimental design

It is tempting to consider model complexity as an inherent characteristic of a model. For some models and for some measures of complexity this is clearly the case. Consider, for example, model Mb.
In any experimental design (i.e., any number of coin tosses), PC = 1 for Mb. However, more generally, this is not the case. Focusing on the FLMP and the LIM, it is clear that even a simple measure such as PC depends crucially on (some aspects of) the experimental design. In particular, every level corresponds to a new parameter, so PC = I + J. Similarly, GC is dependent on design choices. The PPC is not different in this respect. The design sensitivity implies that one can only draw sensible conclusions about differences in model complexity by using different designs. In an information integration task, the design decisions include the type of design (expanded or not), the number of sources, the number of response alternatives, the number of levels for each source, and the number of observations for each stimulus (sample size). The present study focuses on the expanded factorial designs with two sources and two response alternatives. The additional design features were varied: both a 5 × 5 and an 8 × 2 design were considered, using three different sample sizes (20, 60, and 150, following [2]).

4.4 Results

Figure 1 shows the 99% predicted interval in the 8 × 2 design with n = 150. Each panel corresponds to a different model. In each panel, each of the 26 stimuli is displayed on the x-axis. The first eight stimuli correspond to the stimuli with the lowest level of visual support, and are ordered in increasing order of auditory support. The next eight stimuli correspond to the stimuli with the highest level of visual support. The next eight stimuli correspond to the unimodal stimuli where only auditory information is provided (again ranked in increasing order). The final two stimuli are the unimodal visual stimuli. Panel A shows that the predicted interval of LIMu nearly equals the universal interval, ranging between 0 and 1. This indicates that almost all outcomes are given a non-negligible prior mass by LIMu, making it almost maximally complex. FLMPu is even more complex.
The predicted interval, shown in Panel B, virtually equals the universal interval, indicating that the model predicts virtually every possible outcome. Panels C and D show the dramatic effect of incorporating relevant prior information in the models. The predicted intervals of both LIMo and FLMPo are much smaller than their counterparts using the uniform priors. Focusing on the comparison between LIM and FLMP, the PPC indicates that the latter is more complex than the former. This observation holds irrespective of the model version (assuming uniform vs. ordered priors).

Figure 1: The 99% predicted interval for each of the 26 stimuli (x-axis) according to LIMu (Panel A), FLMPu (Panel B), LIMo (Panel C), and FLMPo (Panel D).

Table 1: PPC, based on the 99% predicted interval, for four models across six different designs.

                 5 × 5                  8 × 2
         n=20   n=60   n=150    n=20   n=60   n=150
LIMu     0.97   0.94   0.97     0.95   0.93   0.94
FLMPu    1      1      1        1      0.99   0.99
LIMo     0.75   0.67   0.77     0.69   0.64   0.66
FLMPo    0.83   0.80   0.86     0.82   0.78   0.81

The smaller complexity of LIM is in line with previous attempts to measure the relative complexities of LIM and FLMP, such as the atheoretical simulation-based approach ([4] but see [5]), the semi-theoretical simulation-based approach [4], the theoretical simulation-based approach [2, 6, 22], and a direct computation of the GC [2]. The PPCs for all six designs considered are displayed in Table 1. It shows that the observations made for the 8 × 2, n = 150 design hold across the five remaining designs as well: LIM is simpler than FLMP; and models assuming ordered priors are simpler than models assuming uniform priors.
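Equations 13 and 14 and the two prior schemes are simple enough to sketch directly. The following is an illustrative implementation (the helper names are assumptions made here); the ordered priors are obtained by sorting independent uniform draws, which is one convenient way to satisfy Equation 16 but not necessarily the sampling scheme used in the paper.

```python
import random

def lim(a, b):
    # Linear Integration Model: additive combination (Equation 13).
    return (a + b) / 2.0

def flmp(a, b):
    # Fuzzy Logical Model of Perception: multiplicative combination (Eq. 14).
    return a * b / (a * b + (1 - a) * (1 - b))

def sample_uniform_params(i_levels, j_levels):
    # LIMu / FLMPu: independent Uniform(0, 1) priors (Equation 15).
    return ([random.random() for _ in range(i_levels)],
            [random.random() for _ in range(j_levels)])

def sample_ordered_params(i_levels, j_levels):
    # LIMo / FLMPo: ordered priors (Equation 16) via sorted uniform draws.
    return (sorted(random.random() for _ in range(i_levels)),
            sorted(random.random() for _ in range(j_levels)))

def bimodal_predictions(model, a, b):
    # p(/da/) for every bimodal stimulus in the factorial design.
    return [model(ai, bj) for ai in a for bj in b]
```

Repeatedly drawing parameter sets from either prior and collecting `bimodal_predictions` per stimulus yields the per-condition prior predictive distributions from which the intervals in Figure 1 and Table 1 could be estimated.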
Note that these conclusions would not have been possible based on PC or GC. For PC, all four models have the same complexity. GC, in contrast, would detect complexity differences between LIM and FLMP (i.e., the first conclusion), but due to its insensitivity to the parameter prior, the complexity differences between LIMu and LIMo on the one hand, and FLMPu and FLMPo on the other hand (i.e., the second conclusion) would have gone unnoticed.

5 Discussion

A theorist defining a model should clearly and explicitly specify at least the three following pieces of information: the model equation, the parameter prior range, and the parameter prior distribution. If any of these pieces is missing, the model should be regarded as incomplete, and therefore untestable. Consequently, any measure of generalizability should be sensitive to all three aspects of the model definition. Many currently popular generalizability measures do not satisfy this criterion, including AIC, BIC and MDL. A measure of generalizability that does take these three aspects of a model into account is the marginal likelihood [6, 7, 14, 23]. Often, the marginal likelihood is criticized exactly for its sensitivity to the prior range and distribution (e.g., [24]). However, in the light of the fact that the prior is a part of the model definition, I see the sensitivity of the marginal likelihood to the prior as an asset rather than a nuisance. It is precisely the measures of generalizability that are insensitive to the prior that miss an important aspect of the model. Similarly, any stand-alone measure of model complexity should be sensitive to all three aspects of the model definition, as all three aspects contribute to the model’s complexity (with the model equation contributing two factors: the number of parameters and the functional form). Existing measures of complexity do not satisfy this requirement and are therefore incomplete.
PC takes only part of the model equation into account, whereas GC takes only the model equation and the range into account. In contrast, the PPC currently proposed is sensitive to all three aspects. It assesses model complexity using the predicted interval, which contains all possible outcomes a model can generate. A narrow predicted interval (relative to the universal interval) indicates a simple model; a complex model is characterized by a wide predicted interval. There is a tight coupling between the notions of information, knowledge and uncertainty, and the notion of model complexity. As parameters correspond to unknown variables, having more information available leads to fewer parameters and hence to a simpler model. Similarly, the more information there is available, the sharper the parameter prior, implying a simpler model. To put it differently, the less uncertainty present in a model, the narrower its predicted interval, and the simpler the model. For example, in model Mb, there is maximal uncertainty. Nothing but the range is known about θ, so all values of θ are equally likely. In contrast, in model Mf, there is minimal uncertainty. In fact, ph is known for sure, so only a single value of θ is possible. This difference in uncertainty translates into a difference in complexity. The same is true for the information integration models. Incorporating the order constraints in the priors reduces the uncertainty compared to the models without these constraints (it tells you, for example, that parameter a1 is smaller than a2). This reduction in uncertainty is reflected by a smaller complexity. There are many different sources of prior information that can be translated into a range or distribution. The illustration using the information integration models highlighted that prior information can reflect meaningful information in the design. Alternatively, priors can be informed by previous applications of similar models in similar settings.
Probably the purest form of priors is those that translate theoretical assumptions made by a model (see [16]). The fact that it is often difficult to formalize this prior information may not be used as an excuse to leave the prior unspecified. Sure, it is a challenging task, but so is translating theoretical assumptions into the model equation. Formalizing theory, intuitions, and information is what model building is all about.

References

[1] Myung, I. J. (2000) The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
[2] Pitt, M. A., Myung, I. J., and Zhang, S. (2002) Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
[3] Shiffrin, R. M., Lee, M. D., Kim, W., and Wagenmakers, E. J. (2008) A survey of model evaluation approaches with a tutorial on hierarchical Bayesian methods. Cognitive Science, 32, 1248–1284.
[4] Cutting, J. E., Bruno, N., Brady, N. P., and Moore, C. (1992) Selectivity, scope, and simplicity of models: A lesson from fitting judgments of perceived depth. Journal of Experimental Psychology: General, 121, 364–381.
[5] Dunn, J. (2000) Model complexity: The fit to random data reconsidered. Psychological Research, 63, 174–182.
[6] Myung, I. J. and Pitt, M. A. (1997) Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.
[7] Vanpaemel, W. and Storms, G. (in press) Abstraction and model evaluation in category learning. Behavior Research Methods.
[8] Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. Petrov, B. and Csaki, B. (eds.), Second International Symposium on Information Theory, pp. 267–281, Academiai Kiado.
[9] Schwarz, G. (1978) Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
[10] Myung, I. J., Balasubramanian, V., and Pitt, M. A. (2000) Counting probability distributions: Differential geometry and model selection.
Proceedings of the National Academy of Sciences, 97, 11170–11175.
[11] Lee, M. D. (2002) Generating additive clustering models with minimal stochastic complexity. Journal of Classification, 19, 69–85.
[12] Rissanen, J. (1996) Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
[13] Grünwald, P. (2000) Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.
[14] Lee, M. D. and Wagenmakers, E. J. (2005) Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.
[15] Lee, M. D. and Vanpaemel, W. (2008) Exemplars, prototypes, similarities and rules in category representation: An example of hierarchical Bayesian analysis. Cognitive Science, 32, 1403–1424.
[16] Vanpaemel, W. and Lee, M. D. (submitted) Using priors to formalize theory: Optimal attention and the generalized context model.
[17] Lee, M. D. (2008) Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review, 15, 1–15.
[18] Spiegelhalter, D., Thomas, A., Best, N., and Lunn, D. (2004) WinBUGS User Manual Version 2.0. Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge.
[19] Anderson, N. H. (1981) Foundations of information integration theory. Academic Press.
[20] Oden, G. C. and Massaro, D. W. (1978) Integration of featural information in speech perception. Psychological Review, 85, 172–191.
[21] Massaro, D. W. (1998) Perceiving Talking Faces: From Speech Perception to a Behavioral Principle. MIT Press.
[22] Massaro, D. W., Cohen, M. M., Campbell, C. S., and Rodriguez, T. (2001) Bayes factor of model selection validates FLMP. Psychonomic Bulletin & Review, 8, 1–17.
[23] Kass, R. E. and Raftery, A. E. (1995) Bayes factors. Journal of the American Statistical Association, 90, 773–795.
[24] Liu, C. C. and Aitkin, M. (2008) Bayes factors: Prior sensitivity and model generalizability.
Journal of Mathematical Psychology, 53, 362–375. 9

6 0.46943247 233 nips-2009-Streaming Pointwise Mutual Information

7 0.45011684 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

8 0.43546143 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

9 0.41805127 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

10 0.41606739 177 nips-2009-On Learning Rotations

11 0.40983742 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

12 0.4031854 124 nips-2009-Lattice Regression

13 0.38147524 115 nips-2009-Individuation, Identification and Object Discovery

14 0.37395129 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

15 0.37115324 154 nips-2009-Modeling the spacing effect in sequential category learning

16 0.36642888 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control

17 0.35922652 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

18 0.35901803 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

19 0.3582803 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

20 0.35812917 163 nips-2009-Neurometric function analysis of population codes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.014), (24, 0.026), (25, 0.057), (35, 0.034), (36, 0.055), (39, 0.048), (55, 0.401), (58, 0.052), (61, 0.027), (71, 0.075), (81, 0.023), (86, 0.067), (91, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78294152 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forced-choice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components.
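The repetition-rate tracking the abstract describes can be illustrated with a small sketch. This is an illustrative reimplementation of the DBM's core idea (a discretized belief over the repetition rate with change-point resetting), not the authors' code; the flat prior, grid resolution, and stay-probability `alpha` are assumed values, not fitted parameters.

```python
import numpy as np

def dbm_predict(outcomes, alpha=0.77, grid=200):
    """Minimal sketch of a Dynamic Belief Model-style tracker.

    `outcomes` is a 0/1 sequence (1 = the current trial repeats the
    previous one).  A discretized belief over the repetition rate gamma
    is maintained; on each trial gamma persists with probability
    `alpha` and resets to the prior otherwise.  Returns the predicted
    probability of a repetition before each trial.
    """
    g = np.linspace(0.001, 0.999, grid)
    prior = np.ones(grid) / grid                     # flat prior over gamma
    belief = prior.copy()
    preds = []
    for r in outcomes:
        belief = alpha * belief + (1 - alpha) * prior  # change-point mixing
        preds.append(float(g @ belief))                # P(repeat) = E[gamma]
        lik = g if r else 1 - g                        # Bernoulli likelihood
        belief = belief * lik
        belief /= belief.sum()
    return preds
```

After a run of repetitions the predicted repetition probability rises, while the change-point mixing keeps it from saturating; this recency weighting is what produces sequential effects in the model. DBM2, as described above, would run two such trackers (base rates and repetition rates) under one inferential framework and combine their predictions.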

2 0.75252879 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

Author: Andrew McCallum, Karl Schultz, Sameer Singh

Abstract: Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%—achieving a new state of the art.
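FACTORIE itself is a Scala library; the idea of defining a factor graph imperatively rather than declaratively can be shown schematically in Python. Everything here (factor functions, their weights, the toy labeling task) is hypothetical and is not FACTORIE's API; the point is only that model structure is built by running ordinary code.

```python
import random

# Variables are plain values; factors are ordinary functions that score
# part of a configuration.  "Imperative definition" means the graph's
# structure comes from executing code, not from a declarative language.

def token_factor(label, token):
    # hypothetical unary factor: capitalized tokens prefer label "PER"
    return 1.5 if (token[0].isupper() and label == "PER") else 0.0

def transition_factor(prev_label, label):
    # hypothetical pairwise factor encouraging label continuity
    return 0.8 if prev_label == label else 0.0

def score(labels, tokens):
    s = sum(token_factor(l, t) for l, t in zip(labels, tokens))
    s += sum(transition_factor(a, b) for a, b in zip(labels, labels[1:]))
    return s

def coordinate_map(tokens, domain=("PER", "O"), sweeps=20, seed=0):
    """Greedy coordinate ascent toward a high-scoring labeling -- a toy
    stand-in for the MCMC-based inference a real toolkit would use."""
    rng = random.Random(seed)
    labels = [rng.choice(domain) for _ in tokens]
    for _ in range(sweeps):
        for i in range(len(tokens)):
            labels[i] = max(domain,
                            key=lambda v: score(labels[:i] + [v] + labels[i + 1:], tokens))
    return labels
```

Running `coordinate_map(["Smith", "visited", "Boston"])` labels the capitalized tokens "PER", since the unary and transition factors jointly favor that configuration.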

3 0.660254 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, and 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale.

4 0.57166266 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

Author: Xiaolin Yang, Seyoung Kim, Eric P. Xing

Abstract: Multitask learning addresses the problem of learning related tasks that presumably share some commonalities on their input-output mapping functions. Previous approaches to multitask learning usually deal with homogeneous tasks, such as purely regression tasks, or entirely classification tasks. In this paper, we consider the problem of learning multiple related tasks of predicting both continuous and discrete outputs from a common set of input variables that lie in a highdimensional feature space. All of the tasks are related in the sense that they share the same set of relevant input variables, but the amount of influence of each input on different outputs may vary. We formulate this problem as a combination of linear regressions and logistic regressions, and model the joint sparsity as the L1/L∞ or L1/L2 norm of the model parameters. Among several possible applications, our approach addresses an important open problem in genetic association mapping, where the goal is to discover genetic markers that influence multiple correlated traits jointly. In our experiments, we demonstrate our method in this setting, using simulated and clinical asthma datasets, and we show that our method can effectively recover the relevant inputs with respect to all of the tasks.
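The joint-sparsity penalties named in the abstract are block norms over the feature-by-task coefficient matrix: for each input feature, take the L∞ (or L2) norm of its coefficients across tasks, then sum over features. A small illustrative computation follows; the coefficient matrix is made up for the example.

```python
import numpy as np

def l1_linf(W):
    """L1/L-infinity block norm: sum over rows (features) of the
    maximum absolute coefficient across columns (tasks)."""
    return np.abs(W).max(axis=1).sum()

def l1_l2(W):
    """L1/L2 block norm: sum over rows of the Euclidean norm of the
    coefficients across tasks."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

# Toy coefficient matrix: 3 features x 2 tasks.  The penalty on a
# feature vanishes only when its entire row is zero, so minimizing it
# drives features to be selected or dropped jointly across tasks.
W = np.array([[1.0, -2.0],
              [0.5,  0.0],
              [0.0,  0.0]])
```

Here `l1_linf(W)` is 2.0 + 0.5 + 0.0 = 2.5, and the all-zero third row contributes nothing under either norm, which is the mechanism behind shared feature selection.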

5 0.39753053 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

Author: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black

Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain.
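The learning scheme the abstract describes (factor-graph parameters treated as a linear value-function approximator and trained by temporal differences) reduces, in its generic form, to textbook TD(0) with linear features. The sketch below shows only that generic update, not the paper's full system; feature vectors, reward, and the step-size values are illustrative.

```python
import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, eta=0.05, gamma=0.9):
    """One TD(0) step with a linear value function V(s) = w . phi(s).

    In the paper's setting, phi would be factor-graph features of a
    configuration and the reward would credit jumps that move the
    sampler toward a better MAP solution.
    """
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)
    return w + eta * td_error * phi_s
```

Only the features touched by an action enter the update, which is what makes the neighborhood-local gradient computation mentioned above possible.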

6 0.37008098 196 nips-2009-Quantification and the language of thought

7 0.36427319 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

8 0.35050607 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

9 0.34675381 204 nips-2009-Replicated Softmax: an Undirected Topic Model

10 0.34662092 111 nips-2009-Hierarchical Modeling of Local Image Features through $L p$-Nested Symmetric Distributions

11 0.34597406 154 nips-2009-Modeling the spacing effect in sequential category learning

12 0.345862 46 nips-2009-Bilinear classifiers for visual recognition

13 0.34548709 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

14 0.34455428 112 nips-2009-Human Rademacher Complexity

15 0.34435412 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

16 0.34411898 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

17 0.34395295 113 nips-2009-Improving Existing Fault Recovery Policies

18 0.34374031 226 nips-2009-Spatial Normalized Gamma Processes

19 0.34320441 96 nips-2009-Filtering Abstract Senses From Image Search Results

20 0.34149382 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability