nips nips2007 nips2007-125 knowledge-graph by maker-knowledge-mining

125 nips-2007-Markov Chain Monte Carlo with People


Source: pdf

Author: Adam Sanborn, Thomas L. Griffiths

Abstract: Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners. Most applications of these models determine these distributions indirectly. We propose a method for directly determining the assumptions of human learners by sampling from subjective probability distributions. Using a correspondence between a model of human choice and Markov chain Monte Carlo (MCMC), we describe a method for sampling from the distributions over objects that people associate with different categories. In our task, subjects choose whether to accept or reject a proposed change to an object. The task is constructed so that these decisions follow an MCMC acceptance rule, defining a Markov chain for which the stationary distribution is the category distribution. We test this procedure for both artificial categories acquired in the laboratory, and natural categories acquired from experience. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners. [sent-5, score-0.457]

2 We propose a method for directly determining the assumptions of human learners by sampling from subjective probability distributions. [sent-7, score-0.428]

3 Using a correspondence between a model of human choice and Markov chain Monte Carlo (MCMC), we describe a method for sampling from the distributions over objects that people associate with different categories. [sent-8, score-0.749]

4 In our task, subjects choose whether to accept or reject a proposed change to an object. [sent-9, score-0.173]

5 The task is constructed so that these decisions follow an MCMC acceptance rule, defining a Markov chain for which the stationary distribution is the category distribution. [sent-10, score-0.766]

6 We test this procedure for both artificial categories acquired in the laboratory, and natural categories acquired from experience. [sent-11, score-0.294]

7 Subjective probability distributions are used to model the degrees of belief that learners assign to hypotheses in many domains, including categorization, decision making, and memory [1, 2, 3, 4]. [sent-13, score-0.194]

8 A common way to learn about a probability distribution is to draw samples from it. [sent-15, score-0.17]

9 In the machine learning and statistics literature, drawing samples from probability distributions is a major area of research, and is often done using Markov chain Monte Carlo (MCMC) algorithms. [sent-16, score-0.381]

10 In this paper, we describe a method for directly obtaining information about subjective probability distributions, by having people act as elements of an MCMC algorithm. [sent-17, score-0.405]

11 Our approach is to design a task that will allow us to sample from a particular subjective probability distribution. [sent-18, score-0.249]

12 Much research has been devoted to relating the magnitude of psychological responses to choice probabilities, resulting in mathematical models of these tasks. [sent-19, score-0.211]

13 We point out an equivalence between a model of human choice behavior and an MCMC acceptance function, and use this equivalence to develop a method for obtaining samples from a subjective distribution. [sent-20, score-0.753]

14 In Section 2, we describe MCMC in general and the Metropolis method and Barker acceptance function in particular. [sent-23, score-0.24]

15 In Section 4, we present an experiment showing that this method can be used to recover trained category distributions from human judgments. [sent-25, score-0.43]

16 Section 5 gives a demonstration of our MCMC method applied to recovering natural categories of animal shape. [sent-26, score-0.25]

17 1 2 Markov chain Monte Carlo Models of physical phenomena used by scientists are often expressed in terms of complex probability distributions over different events. [sent-28, score-0.291]

18 Generating samples from these distributions can be an efficient way to determine their properties, indicating which events are assigned high probabilities and providing a way to approximate various statistics of interest. [sent-29, score-0.217]

19 One of the most successful methods of this kind is Markov chain Monte Carlo. [sent-32, score-0.187]

20 An MCMC algorithm constructs a Markov chain that has the target distribution, from which we want to sample, as its stationary distribution. [sent-33, score-0.287]

21 This Markov chain can be initialized with any state, being guaranteed to converge to its stationary distribution after many iterations of stochastic transitions between states. [sent-34, score-0.297]

22 After convergence, the states visited by the Markov chain can be used similarly to samples from the target distribution (see [5] for details). [sent-35, score-0.392]

23 The canonical MCMC algorithm is the Metropolis method [6], in which transitions between states have two parts: a proposal distribution and an acceptance function. [sent-36, score-0.438]

24 Based on the current state, a candidate for the next state is sampled from the proposal distribution. [sent-37, score-0.166]

25 The acceptance function gives the probability of accepting this proposal. [sent-38, score-0.277]

26 If the proposal is rejected, then the current state is taken as the next state. [sent-39, score-0.166]

27 A variety of acceptance functions guarantee that the stationary distribution of the resulting Markov chain is the target distribution [7]. [sent-40, score-0.613]

28 3 An acceptance function from human behavior While our approach can be applied to any subjective probability distribution, our experiments focused on sampling from the distributions over objects associated with different categories. [sent-42, score-0.788]

29 The way people group objects into categories has been studied extensively, producing a number of formal models of human categorization [3, 4, 9, 10, 11], almost all of which can be interpreted as defining a category as a probability distribution over objects [4]. [sent-44, score-1.004]

30 In this section, we consider how to lead people to choose between two objects in a way that would correspond to a valid acceptance function for an MCMC algorithm with the distribution over objects associated with a category as its target distribution. [sent-45, score-0.909]

31 The choice between the objects is a choice between two hypotheses: The first hypothesis, h1 , is that x1 is drawn from the category distribution p(x|c) and x2 is drawn from g(x), an alternative distribution that governs the probability of other objects appearing on the screen. [sent-52, score-0.692]

32 The second hypothesis, h2 , is that x1 is from the alternative distribution and x2 is from the category distribution. [sent-53, score-0.238]

33 The second assumption is that the probabilities of the two stimuli under the alternative distribution are approximately equal, with g(x1 ) ≈ g(x2 ). [sent-58, score-0.166]

34 If people assume that the alternative distribution is uniform, then the probabilities of the two stimuli will be exactly equal. [sent-59, score-0.356]

35 2 From a task to an acceptance function The Bayesian analysis of the task described above results in a posterior probability of h1 (Equation 3) which has a similar form to the Barker acceptance function (Equation 1). [sent-63, score-0.585]

36 This equation has a long history as a model of human choice probabilities, where it is known as the Luce choice rule or the ratio rule [12, 13]. [sent-65, score-0.36]

37 This rule has been shown to provide a good fit to human data when people choose between two stimuli based on a particular property [14, 15, 16]. [sent-66, score-0.412]

38 It corresponds to a situation in which people choose alternatives based on their relative probabilities, a common behavior known as probability matching [17]. [sent-67, score-0.266]

39 The Luce choice rule has also been used to convert psychological response magnitudes into response probabilities in many models of cognition [11, 18, 19, 20, 21]. [sent-68, score-0.41]

40 3 A more flexible response rule Probability matching can be a good description of the data, but subjects have been shown to produce behavior that is more deterministic [17]. [sent-70, score-0.328]

41 This response rule can be derived by applying a soft threshold to the log odds of the two hypotheses (a sigmoid function with a gain of γ). [sent-72, score-0.168]

42 By equivalence to the Barker acceptance function, this response rule defines a Markov chain with stationary distribution π(x) ∝ p(x|c)γ . [sent-74, score-0.685]

43 (6) Thus, using the weaker assumptions of Equation 5 as a model of human behavior, we can estimate the category distribution p(x|c) up to a constant exponent. [sent-75, score-0.355]

44 4 Summary Based on the results in this section, we can define a simple method for drawing samples from a category distribution using MCMC. [sent-78, score-0.328]

45 A person chooses between the current state and the proposal to select the new state. [sent-80, score-0.166]

46 Assuming that people’s choice behavior follows the Luce choice rule, the stationary distribution of the Markov chain is the category distribution. [sent-81, score-0.633]

47 The states of the chain are samples from the category distribution, which provide information about the mental representation of that category. [sent-82, score-0.568]

48 A simple one-dimensional categorization task was used, with the height of schematic fish (see Figure 1) being the dimension along which category distributions were defined. [sent-84, score-0.417]

49 Subjects were trained on two categories of fish height – a uniform distribution and a Gaussian distribution – being told that they were learning to judge whether a fish came from the ocean (the uniform distribution) or a fish farm (the Gaussian distribution). [sent-85, score-0.66]

50 Once subjects were trained, we collected MCMC samples for the Gaussian distributions by asking subjects to judge which of two fish came from the fish farm. [sent-87, score-0.615]

51 1 Method Fifty subjects were recruited from the university community via a newspaper advertisement. [sent-89, score-0.173]

52 Data from one subject was discarded for not finishing the experiment, data from another was discarded because the chains reached a boundary, and the data of eight others were discarded because their chains did not cross (more detail below). [sent-90, score-0.657]

53 Each subject was trained to discriminate between two categories of fish: ocean fish and fish farm fish. [sent-95, score-0.462]

54 ” These instructions were meant to suggest that the ocean fish were drawn from a uniform distribution and the fish farm fish were drawn from a Gaussian distribution. [sent-98, score-0.319]

55 The uniform distribution was the same across training distributions and was bounded at 2. [sent-104, score-0.188]

56 1 cm long with heights drawn from the Gaussian and uniform distributions in training. [sent-110, score-0.301]

57 Instead, one fish was the state of a Markov chain and the other fish was the proposal. [sent-122, score-0.237]

58 The state and proposal were unlabeled and they were randomly assigned to either the left or right side of the screen. [sent-123, score-0.166]

59 Three MCMC chains were interleaved during the MCMC trials. [sent-124, score-0.222]

60 The start states of the chains were chosen to be 2. [sent-125, score-0.225]

61 Relative to the training distributions, the start states were overdispersed, facilitating assessment of Figure 1: Examples of the largest and smallest fish stimuli presented to subjects during training. [sent-129, score-0.321]

62 4 Fish Width (cm) 5 4 3 Subject 44 5 4 3 Subject 30 5 4 3 Subject 37 5 4 3 Subject 19 10 20 30 40 50 Trial Number 60 70 80 Training Kernel Gaussian Density Fit Figure 2: The four rows are subjects from each of the between-subject conditions. [sent-131, score-0.173]

63 The panels in the first column show the behavior of the three Markov chains per subject. [sent-132, score-0.225]

64 The experiment was broken up into blocks of training and MCMC trials, beginning with 120 training trials, followed by alternating blocks of 60 MCMC trials and 60 training trials. [sent-139, score-0.259]

65 Training and MCMC trials were interleaved to keep subjects from forgetting the training distributions. [sent-140, score-0.344]

66 2 Results Subjects were excluded if their chains did not converge to the stationary distribution or if the state of any chain reached the edge of the parameter range. [sent-143, score-0.533]

67 We used a heuristic for determining convergence: every chain had to cross another chain. [sent-144, score-0.213]

68 1 Figure 2 shows the chains from four subjects, one from each of the between-subject conditions. [sent-145, score-0.186]

69 Most subjects took approximately 20 trials to produce the first crossing in their chains, so these trials were discarded and the remaining 60 trials from each chain were pooled and used in further analyses. [sent-146, score-0.735]

70 The distributions estimated for the subjects shown in this figure match well with the training distribution. [sent-148, score-0.286]

71 As predicted, µ was higher for subjects trained on Gaussians with higher means, and σ was higher for subjects trained on Gaussians with higher standard deviations. [sent-151, score-0.436]

72 6 Mean of Estimates from MCMC Samples (cm) Figure 3: The bar plots show the mean of µ and σ across the MCMC samples produced by subjects in all four training conditions. [sent-172, score-0.309]

73 noise (increasing the effective variation in stimuli associated with a category) or choices being made in a way consistent with the exponentiated choice rule with γ < 1. [sent-175, score-0.218]

74 5 Investigating the structure of natural categories The previous experiment provided evidence that the assumptions underlying the MCMC method are approximately correct, as the samples recovered by the method matched the training distribution. [sent-176, score-0.341]

75 Now we will demonstrate this method in a much more interesting case: sampling from subjective probability distributions that have been built up from real-world experience. [sent-177, score-0.314]

76 The natural categories of the shapes of giraffes, horses, cats, and dogs were explored in a nine-dimensional stick figure space [26]. [sent-178, score-0.238]

77 For each animal, three Markov chains were started from different states. [sent-180, score-0.186]

78 Figure 4B shows the chains converging for the giraffe condition. [sent-182, score-0.186]

79 The different animal conditions converged to different areas of the parameter space (Figure 4C) and the means across samples produced stick figures that correspond well to the tested categories (Figure 4D). [sent-183, score-0.404]

80 6 Summary and conclusion We have developed a Markov chain Monte Carlo method for sampling from a subjective probability distribution. [sent-184, score-0.434]

81 This method allows a person to act as an element of an MCMC algorithm by constructing a task for which choice probabilities follow a valid acceptance function. [sent-185, score-0.385]

82 By choosing between the current state and a proposal, people produce a Markov chain with a stationary distribution matching their mental representation of a category. [sent-186, score-0.594]

83 The results of our experiment indicate that this method accurately uncovers differences in mental representations that result from training people on categories with different structures. [sent-187, score-0.472]

84 In addition, we explored the subjective probability distributions of natural animal shapes in a multidimensional parameter space. [sent-188, score-0.439]

85 Our method estimates the subjective probability distribution, while classification images estimate the decision boundary between two classes. [sent-190, score-0.215]

86 Both methods can contribute to the complete picture of how people make categorization decisions. [sent-191, score-0.283]

87 Typicality ratings are used to determine which objects are better examples of a category than other objects. [sent-193, score-0.299]

88 Our MCMC method yields the same information, but provides a way to efficiently do so when the category distribution is concentrated in a small region of a large parameter space. [sent-194, score-0.238]

89 Our MCMC method provides a way to explore the subjective probability distributions that people associate with categories. [sent-197, score-0.499]

90 Similar tasks could be used to investigate subjective probability distributions in other settings, providing a valuable tool for testing probabilistic models of cognition. [sent-198, score-0.282]

91 For instance, Gibbs sampling could 6 Figure 4: Task and results for an experiment exploring natural categories of animals using stick figure stimuli. [sent-200, score-0.323]

92 (A) Screen capture from the experiment, where people make a choice between the current state of the Markov chain and a proposed state. [sent-201, score-0.478]

93 (B) States of the Markov chain for the subject when estimating the distribution for giraffes. [sent-202, score-0.293]

94 The nine-dimensional space characterizing the stick figures is projected onto the two dimensions that best discriminate the different animal distributions using linear discriminant analysis. [sent-203, score-0.261]

95 Each chain is a different color and the start states of the chains are indicated by the filled circle. [sent-204, score-0.412]

96 The dotted lines are samples that were discarded to ensure that the Markov chains had converged, and the solid lines are the samples that were retained. [sent-205, score-0.44]

97 The samples capture the similarities and differences between the four categories of animals, and reveal the variation in the members of those categories. [sent-208, score-0.237]

98 7 be used to generate samples from a distribution, if a clever method for inducing people to sample from conditional distributions could be found. [sent-210, score-0.347]

99 Using people as the elements of a machine learning algorithm is a virtually unexplored area that should be exploited in order to more efficiently test hypotheses about the knowledge that guides human learning and inference. [sent-211, score-0.333]

100 Monte Carlo sampling methods using Markov chains and their applications. [sent-259, score-0.218]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sh', 0.463), ('mcmc', 0.361), ('acceptance', 0.24), ('category', 0.195), ('people', 0.19), ('chain', 0.187), ('chains', 0.186), ('subjective', 0.178), ('subjects', 0.173), ('categories', 0.147), ('cm', 0.122), ('barker', 0.12), ('proposal', 0.116), ('objects', 0.104), ('animal', 0.103), ('markov', 0.098), ('farm', 0.096), ('psychology', 0.096), ('categorization', 0.093), ('human', 0.091), ('samples', 0.09), ('trials', 0.089), ('monte', 0.084), ('luce', 0.084), ('ocean', 0.084), ('psychological', 0.077), ('carlo', 0.075), ('discarded', 0.074), ('rule', 0.068), ('stationary', 0.067), ('distributions', 0.067), ('stick', 0.064), ('subject', 0.063), ('fish', 0.063), ('stimuli', 0.063), ('probabilities', 0.06), ('cognition', 0.058), ('mental', 0.057), ('hypotheses', 0.052), ('choice', 0.051), ('came', 0.05), ('metropolis', 0.05), ('state', 0.05), ('ashby', 0.048), ('heights', 0.048), ('rosenbluth', 0.048), ('animals', 0.048), ('response', 0.048), ('training', 0.046), ('cognitive', 0.046), ('trained', 0.045), ('proposing', 0.044), ('distribution', 0.043), ('acoustical', 0.042), ('typicality', 0.042), ('hillsdale', 0.042), ('observers', 0.041), ('behavior', 0.039), ('states', 0.039), ('chater', 0.038), ('learners', 0.038), ('probability', 0.037), ('gaussian', 0.037), ('exponentiated', 0.036), ('interleaved', 0.036), ('trial', 0.035), ('task', 0.034), ('erlbaum', 0.034), ('asking', 0.034), ('crossing', 0.034), ('mm', 0.034), ('biometrics', 0.034), ('target', 0.033), ('told', 0.032), ('psychophysics', 0.032), ('equivalence', 0.032), ('drawn', 0.032), ('uniform', 0.032), ('sampling', 0.032), ('experiment', 0.032), ('equation', 0.031), ('america', 0.029), ('devoted', 0.029), ('variances', 0.029), ('judge', 0.028), ('height', 0.028), ('speech', 0.028), ('block', 0.028), ('hypothesis', 0.028), ('mathematical', 0.028), ('rational', 0.027), ('discriminate', 0.027), ('multidimensional', 0.027), ('stimulus', 0.027), ('black', 0.027), ('shapes', 0.027), ('associate', 0.027), ('determining', 0.026), ('responses', 0.026), ('assumptions', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 125 nips-2007-Markov Chain Monte Carlo with People

Author: Adam Sanborn, Thomas L. Griffiths

Abstract: Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners. Most applications of these models determine these distributions indirectly. We propose a method for directly determining the assumptions of human learners by sampling from subjective probability distributions. Using a correspondence between a model of human choice and Markov chain Monte Carlo (MCMC), we describe a method for sampling from the distributions over objects that people associate with different categories. In our task, subjects choose whether to accept or reject a proposed change to an object. The task is constructed so that these decisions follow an MCMC acceptance rule, defining a Markov chain for which the stationary distribution is the category distribution. We test this procedure for both artificial categories acquired in the laboratory, and natural categories acquired from experience. 1

2 0.16386756 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

Author: Matthew Hoffman, Arnaud Doucet, Nando D. Freitas, Ajay Jasra

Abstract: A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.

3 0.10699984 114 nips-2007-Learning and using relational theories

Author: Charles Kemp, Noah Goodman, Joshua B. Tenenbaum

Abstract: Much of human knowledge is organized into sophisticated systems that are often called intuitive theories. We propose that intuitive theories are mentally represented in a logical language, and that the subjective complexity of a theory is determined by the length of its representation in this language. This complexity measure helps to explain how theories are learned from relational data, and how they support inductive inferences about unobserved relations. We describe two experiments that test our approach, and show that it provides a better account of human learning and reasoning than an approach developed by Goodman [1]. What is a theory, and what makes one theory better than another? Questions like these are of obvious interest to philosophers of science but are also discussed by psychologists, who have argued that everyday knowledge is organized into rich and complex systems that are similar in many respects to scientific theories. Even young children, for instance, have systematic beliefs about domains including folk physics, folk biology, and folk psychology [2]. Intuitive theories like these play many of the same roles as scientific theories: in particular, both kinds of theories are used to explain and encode observations of the world, and to predict future observations. This paper explores the nature, use and acquisition of simple theories. Consider, for instance, an anthropologist who has just begun to study the social structure of a remote tribe, and observes that certain words are used to indicate relationships between selected pairs of individuals. Suppose that term T1(·, ·) can be glossed as ancestor(·, ·), and that T2(·, ·) can be glossed as friend(·, ·). The anthropologist might discover that the first term is transitive, and that the second term is symmetric with a few exceptions. Suppose that term T3(·, ·) can be glossed as defers to(·, ·), and that the tribe divides into two castes such that members of the second caste defer to members of the first caste. In this case the anthropologist might discover two latent concepts (caste 1(·) and caste 2(·)) along with the relationship between these concepts. As these examples suggest, a theory can be defined as a system of laws and concepts that specify the relationships between the elements in some domain [2]. We will consider how these theories are learned, how they are used to encode relational data, and how they support predictions about unobserved relations. Our approach to all three problems relies on the notion of subjective complexity. We propose that theory learners prefer simple theories, that people remember relational data in terms of the simplest underlying theory, and that people extend a partially observed data set according to the simplest theory that is consistent with their observations. There is no guarantee that a single measure of subjective complexity can do all of the work that we require [3]. This paper, however, explores the strong hypothesis that a single measure will suffice. Our formal treatment of subjective complexity begins with the question of how theories are mentally represented. We suggest that theories are represented in some logical language, and propose a specific first-order language that serves as a hypothesis about the “language of thought.” We then pursue the idea that the subjective complexity of a theory corresponds to the length of its representation in this language. Our approach therefore builds on the work of Feldman [4], and is related to other psychological applications of the notion of Kolmogorov complexity [5]. The complexity measure we describe can be used to define a probability distribution over a space of theories, and we develop a model of theory acquisition by using this distribution as the prior for a Bayesian learner. We also 1 (a) Star 11 (b) Bipartite (c) Exception 22 33 44 55 66 77 88 16 26 36 46 56 21 31 41 51 61 71 81 11 17 27 37 47 57 18 28 38 48 58 R(X, X). T(6). T(7). T(8). R(X, Y) ← ¯(X), T(Y). T R(X, 1). (d) Symmetric (e) Transitive 11 22 33 44 55 66 77 13 31 12 21 24 42 56 65 26 36 46 56 17 27 37 47 57 18 28 38 48 58 T(6). T(7). T(8). R(X, Y) ← ¯(X), T(Y). T ¯ R(1, 1). R(1, 6). (f) Random 12 21 13 23 14 24 34 13 32 14 24 34 15 25 35 45 16 26 36 46 56 51 52 35 54 61 26 63 46 56 R(1, 2). R(1, 3). R(2, 4). R(5, 6). R(1, 2). R(2, 3). R(3, 4). R(5, X). R(X, 4). R(X, Y) ← R(Y, X). R(X, X). R(4, 5). R(5, 6). R(X, Z) ← R(X, Y), R(Y, Z). R(2, 1). R(1, 3). R(6, 1). R(3, 2). R(2, 6). R(3, 5). R(6, 3). R(4, 6). ¯ ¯ ¯ R(X, X). R(6, 4). R(5, 3). Figure 1: Six possible extensions for a binary predicate R(·, ·). In each case, the objects in the domain are represented as digits, and a pair such as 16 indicates that R(1, 6) is true. Below each set of pairs, the simplest theory according to our complexity measure is shown. show how the same Bayesian approach helps to explain how theories support inductive generalization: given a set of observations, future observations (e.g. whether one individual defers to another) can be predicted using the posterior distribution over the space of theories. We test our approach by developing two experiments where people learn and make predictions about binary and ternary relations. As far as we know, the approach of Goodman [1] is the only other measure of theory complexity that has previously been tested as a psychological model [6]. We show that our experiments support our approach and raise challenges for this alternative model. 1 Theory complexity: a representation length approach Intuitive theories correspond to mental representations of some sort, and our first task is to characterize the elements used to build these representations. We explore the idea that a theory is a system of statements in a logical language, and six examples are shown in Fig. 1. The theory in Fig. 1b is related to the defers to(·, ·) example already described. Here we are interested in a domain including 9 elements, and a two place predicate R(·, ·) that is true of all and only the 15 pairs shown. R is defined using a unary predicate T which is true of only three elements: 6, 7, and 8. The theory includes a clause which states that R(X, Y) is true for all pairs XY such that T(X) is false and T(Y) is true. The theory in Fig. 1c is very similar, but includes an additional clause which specifies that R(1, 1) is true, and an exception which specifies that R(1, 6) is false. Formally, each theory we consider is a collection of function-free definite clauses. All variables are universally quantified: for instance, the clause R(X, Z) ← R(X, Y), R(Y, Z) is equivalent to the logical formula ∀x ∀y ∀z (R(x, z) ← R(x, y) ∧ R(y, z)). For readability, the theories in Fig. 1 include parentheses and arrows, but note that these symbols are unnecessary and can be removed. Our proposed language includes only predicate symbols, variable symbols, constant symbols, and a period that indicates when one clause finishes and another begins. Each theory in Fig. 1 specifies the extension of one or more predicates. The extension of predicate P is defined in terms of predicate P+ (which captures the basic rules that lead to membership in P) and predicate P− (which captures exceptions to these rules). The resulting extension of P is defined 2 as P+ \ P− , or the set difference of P+ and P− .1 Once P has been defined, later clauses in the theory may refer to P or its negation ¯. To ensure that our semantics is well-defined, the predicates P in any valid theory must permit an ordering so that the definition of any predicate does not refer to predicates that follow it in the order. Formally, the definition of each predicate P+ or P− can refer only to itself (recursive definitions are allowed) and to any predicate M or ¯ where M < P. M Once we have committed to a specific language, the subjective complexity of a theory is assumed to correspond to the number of symbols in its representation. We have chosen a language where there is one symbol for each position in a theory where a predicate, variable or constant appears, and one symbol to indicate when each clause ends. Given this language, the subjective complexity c(T ) of theory T is equal to the sum of the number of clauses in the theory and the number of positions in the theory where a predicate, variable or constant appears: c(T ) = #clauses(T ) + #pred slots(T ) + #var slots(T ) + #const slots(T ). (1) For instance, the clause R(X, Z) ← R(X, Y), R(Y, Z). contributes ten symbols towards the complexity of a theory (three predicate symbols, six variable symbols, and one period). Other languages might be considered: for instance, we could use a language which uses five symbols (e.g. five bits) to represent each predicate, variable and constant, and one symbol (e.g. one bit) to indicate the end of a clause. Our approach to subjective complexity depends critically on the representation language, but once a language has been chosen the complexity measure is uniquely specified. Although our approach is closely related to the notion of Kolmogorov complexity and to Minimum Message Length (MML) and Minimum Description Length (MDL) approaches, we refer to it as a Representation Length (RL) approach. A RL approach includes a commitment to a specific language that is proposed as a psychological hypothesis, but these other approaches aspire towards results that do not depend on the language chosen.2 It is sometimes suggested that the notion of Kolmogorov complexity provides a more suitable framework for psychological research than the RL approach, precisely because it allows for results that do not depend on a specific description language [8]. We subscribe to the opposite view. Mental representations presumably rely on some particular language, and identifying this language is a central challenge for psychological research. The language we described should be considered as a tentative approximation of the language of thought. Other languages can and should be explored, but our language has several appealing properties. Feldman [4] has argued that definite clauses are psychologically natural, and working with these representations allows our approach to account for several classic results from the concept learning literature. For instance, our language leads to the prediction that conjunctive concepts are easier to learn than disjunctive concepts [9].3 Working with definite clauses also ensures that each of our theories has a unique minimal model, which means that the extension of a theory can be defined in a particularly simple way. Finally, human learners deal gracefully with noise and exceptions, and our language provides a simple way to handle exceptions. Any concrete proposal about the language of thought should make predictions about memory, learning and reasoning. Suppose that data set D lists the extensions of one or more predicates, and that a theory is a “candidate theory” for D if it correctly defines the extensions of all predicates in D. Note that a candidate theory may well include latent predicates—predicates that do not appear in D, but are useful for defining the predicates that have been observed. We will assume that humans encode D in terms of the simplest candidate theory for D, and that the difficulty of memorizing D is determined by the subjective complexity of this theory. Our approach can and should be tested against classic results from the memory literature. Unlike some other approaches to complexity [10], for instance, our model predicts that a sequence of k items is about equally easy to remember regardless of whether the items are drawn from a set of size 2, a set of size 10, or a set of size 1000 [11]. 1 The extension of P+ is the smallest set that satisfies all of the clauses that define P+ , and the extension of P is defined similarly. To simplify our notation, Fig. 1 uses P to refer to both P and P+ , and ¯ to refer to ¯ and P P P− . Any instance of P that appears in a clause defining P is really an instance of P+ , and any instance of ¯ that P appears in a clause defining ¯ is really an instance of P− . P 2 MDL approaches also commit to a specific language, but this language is often intended to be as general as possible. See, for instance, the discussion of universal codes in Gr¨ nwald et al. [7]. u 3 A conjunctive concept C(·) can be defined using a single clause: C(X) ← A(X), B(X). The shortest definition of a disjunctive concept requires two clauses: D(X) ← A(X). D(X) ← B(X). − 3 To develop a model of inductive learning and reasoning, we take a Bayesian approach, and use our complexity measure to define a prior distribution over a hypothesis space of theories: P (T ) ∝ 2−c(T ) .4 Given this prior distribution, we can use Bayesian inference to make predictions about unobserved relations and to discover the theory T that best accounts for the observations in data set D [12, 13]. Suppose that we have a likelihood function P (D|T ) which specifies how the examples in D were generated from some underlying theory T . The best explanation for the data D is the theory that maximizes the posterior distribution P (T |D) ∝ P (D|T )P (T ). If we need to predict whether ground term g is likely to be true, 5 we can sum over the space of theories: P (g|D) = P (g|T )P (T |D) = T 1 P (D) P (D|T )P (T ) (2) T :g∈T where the final sum is over all theories T that make ground term g true. 1.1 Related work The theories we consider are closely related to logic programs, and methods for Inductive Logic Programming (ILP) explore how these programs can be learned from examples [14]. ILP algorithms are often inspired by the idea of searching for the shortest theory that accounts for the available data, and ILP is occasionally cast as the problem of minimizing an explicit MDL criterion [10]. Although ILP algorithms are rarely considered as cognitive models, the RL approach has a long psychological history, and is proposed by Chomsky [15] and Leeuwenberg [16] among others. Formal measures of complexity have been developed in many fields [17], and there is at least one other psychological account of theory complexity. Goodman [1] developed a complexity measure that was originally a philosophical proposal about scientific theories, but was later tested as a model of subjective complexity [6]. A detailed description of this measure is not possible here, but we attempt to give a flavor of the approach. Suppose that a basis is a set of predicates. The starting point for Goodman’s model is the intuition that basis B1 is at least as complex as basis B2 if B1 can be used to define B2. Goodman argues that this intuition is flawed, but his model is founded on a refinement of this intuition. For instance, since the binary predicate in Fig. 1b can be defined in terms of two unary predicates, Goodman’s approach requires that the complexity of the binary predicate is no more than the sum of the complexities of the two unary predicates. We will use Goodman’s model as a baseline for evaluating our own approach, and a comparison between these two models should be informed by both theoretical and empirical considerations. On the theoretical side, our approach relies on a simple principle for deciding which structural properties are relevant to the measurement of complexity: the relevant properties are those with short logical representations. Goodman’s approach incorporates no such principle, and he proposes somewhat arbitrarily that reflexivity and symmetry are among the relevant structural properties but that transitivity is not. A second reason for preferring our model is that it makes contact with a general principle—the idea that simplicity is related to representation length—that has found many applications across psychology, machine learning, and philosophy. 2 Experimental results We designed two experiments to explore settings where people learn, remember, and make inductive inferences about relational data. Although theories often consist of systems of many interlocking relations, we keep our experiments simple by asking subjects to learn and reason about a single relation at a time. Despite this restriction, our experiments still make contact with several issues raised by systems of relations. As the defers to(·, ·) example suggests, a single relation may be best explained as the observable tip of a system involving several latent predicates (e.g. caste 1(·) and caste 2(·)). 4 To ensure that this distribution can be normalized, we assume that there is some upper bound on the number of predicate symbols, variable symbols, and constants, and on the length of the theories we will consider. There will therefore be a finite number of possible theories, and our prior will be a valid probability distribution. 5 A ground term is a term such as R(8, 9) that does not include any variables. 4 Learning time Complexity (RL) Complexity (Human) 6 300 Complexity (Goodman) 4 20 0 0 0 star bprt excp sym trans rand 2 0 star bprt excp sym trans rand 2 star bprt excp sym trans rand 100 200 star bprt excp sym trans rand 4 40 Figure 2: (a) Average time in seconds to learn the six sets in Fig. 1. (b) Average ratings of set complexity. (c) Complexity scores according to our representation length (RL) model. (d) Complexity scores according to Goodman’s model. 2.1 Experiment 1: memory and induction In our first experiment, we studied the subjective complexity of six binary relations that display a range of structural properties, including reflexivity, symmetry, and transitivity. Materials and Methods. 18 adults participated in this experiment. Subjects were required to learn the 6 sets shown in Fig. 1, and to make inductive inferences about each set. Although Fig. 1 shows pairs of digits, the experiment used letter pairs, and the letters for each condition and the order in which these conditions were presented were randomized across subjects. The pairs for each condition were initially laid out randomly on screen, and subjects could drag them around and organize them to help them understand the structure of the set. At any stage, subjects could enter a test phase where they were asked to list the 15 pairs belonging to the current set. Subjects who made an error on the test were returned to the learning phase. After 9 minutes had elapsed, subjects were allowed to pass the test regardless of how many errors they made. After passing the test, subjects were asked to rate the complexity of the set compared to other sets with 15 pairs. Ratings were provided on a 7 point scale. Subjects were then asked to imagine that a new letter (e.g. letter 9) had belonged to the current alphabet, and were given two inductive tasks. First they were asked to enter between 1 and 10 novel pairs that they might have expected to see (each novel pair was required to include the new letter). Next they were told about a novel pair that belonged to the set (e.g. pair 91), and were again asked to enter up to 10 additional pairs that they might have expected to see. Results. The average time needed to learn each set is shown in Fig. 2a, and ratings of set complexity are shown in Fig. 2b. It is encouraging that these measures yield converging results, but they may be confounded since subjects rated the complexity of a set immediately after learning it. The complexities plotted in Fig. 2c are the complexities of the theories shown in Fig. 1, which we believe to be the simplest theories according to our complexity measure. The final plot in Fig. 2 shows complexities according to Goodman’s model, which assigns each binary relation an integer between 0 and 4. There are several differences between these models: for instance, Goodman’s account incorrectly predicts that the exception case is the hardest of the six, but our model acknowledges that a simple theory remains simple if a handful of exceptions are added. Goodman’s account also predicts that transitivity is not an important structural regularity, but our model correctly predicts that the transitive set is simpler than the same set with some of the pairs reversed (the random set). Results for the inductive task are shown in Fig. 3. The first two columns show the number of subjects who listed each novel pair. The remaining two columns show the probability of set membership predicted by our model. To generate these predictions, we applied Equation 2 and summed over a set of theories created by systematically extending the theories shown in Fig. 1. Each extended theory includes up to one additional clause for each predicate in the base theory, and each additional clause includes at most two predicate slots. For instance, each extended theory for the bipartite case is created by choosing whether or not to add the clause T(9), and adding up to one clause for predicate R.6 For the first inductive task, the likelihood term P (D|T ) (see Equation 2) is set to 0 for all theories that are not consistent with the pairs observed during training, and to a constant for all remaining theories. For the second task we assumed in addition that the novel pair observed is 6 R(9, X), ¯(2, 9), and R(X, 9) ← R(X, 2) are three possible additions. R 5 18 9 9 0.5 trans symm excep bipart 0 91 random r=0.99 1 star 18 99 19 0 91 89 99 19 89 0.5 0 91 18 18 1 9 9 99 19 r=0.96 89 0.5 0 91 99 19 0 91 89 99 19 89 18 9 1 99 19 89 r=0.98 1 9 0.5 0 91 99 19 0 91 89 99 19 89 18 9 99 19 89 0.5 0 81 88 18 0 78 81 88 18 78 0 18 18 9 0 0 71 77 17 67 71 77 17 67 18 18 81 9 88 18 r=0.62 78 71 77 17 67 Human (no examples) 0 71 77 17 67 Human (1 example) 0 0 91 99 19 89 r=0.99 0 81 88 18 78 71 77 17 67 1 71 77 17 67 r=0.38 0.5 0 89 r=0.93 0.5 1 9 99 19 r=0.99 1 0.5 0 89 r=0.99 0.5 1 9 0 91 1 r=0.88 1 9 99 19 0.5 0 91 18 0 91 0.5 0 91 18 r=0.99 1 0 r=0.74 1 0.5 71 77 17 67 RL (no examples) 0 71 77 17 67 RL (one example) Figure 3: Data and model predictions for the induction task in Experiment 1. Columns 1 and 3 show predictions before any pairs involving the new letter are observed. Columns 2 and 4 show predictions after a single novel pair (marked with a gray bar) is observed to belong to the set. The model plots for each condition include correlations with the human data. sampled at random from all pairs involving the new letter.7 All model predictions were computed using Mace4 [18] to generate the extension of each theory considered. The supporting material includes predictions for a model based on the Goodman complexity measure and an exemplar model which assumes that the new letter will be just like one of the old letters.8 The exemplar model outperforms our model in the random condition, and makes accurate predictions about three other conditions. Overall, however, our model performs better than the two baselines. Here we focus on two important predictions that are not well handled by the exemplar model. In the symmetry condition, almost all subjects predict that 78 belongs to the set after learning that 87 belongs to the set, suggesting that they have learned an abstract rule. In the transitive condition, most subjects predict that pairs 72 through 76 belong to the set after learning that 71 belongs to the set. Our model accounts for this result, but the exemplar model has no basis for making predictions about letter 7, since this letter is now known to be unlike any of the others. 2.2 Experiment 2: learning from positive examples During the learning phase of our first experiment, subjects learned a theory based on positive examples (the theory included all pairs they had seen) and negative examples (the theory ruled out all pairs they had not seen). Often, however, humans learn theories based on positive examples alone. Suppose, for instance, that our anthropologist has spent only a few hours with a new tribe. She may have observed several pairs who are obviously friends, but should realize that many other pairs of friends have not yet interacted in her presence. 7 For the second task, P (D|T ) is set to 0 for theories that are inconsistent with the training pairs and theories 1 which do not include the observed novel pair. For all remaining theories, P (D|T ) is set to n , where n is the total number of novel pairs that are consistent with T . 8 Supporting material is available at www.charleskemp.com 6 1 221 331 441 551 c) 7 1 R(X, X, Y). 221 443 552 663 d) 7 1 R(X, Y, Z). 231 456 615 344 e) 7 1 −10 −5 −0.1 −10 −20 −20 −10 −0.2 −20 777 771 778 789 237 777 771 778 789 237 −10 231 234 235 236 0 777 771 778 789 237 0 777 771 778 789 237 0 777 771 778 789 237 0 RL 0 R(2, 3, X). 777 771 778 789 237 7 R(X, X, 1). 777 771 778 789 237 b) 777 771 778 789 237 1 111 222 333 444 777 771 778 789 237 7 R(X, X, X). 777 771 778 789 237 Human a) Figure 4: Data and model predictions for Experiment 2. The four triples observed for each set are shown at the top of the figure. The first row of plots shows average ratings on a scale from 1 (very unlikely to belong to the set) to 7 (very likely). Model predictions are plotted as log probabilities. Our framework can handle cases like these if we assume that the data D in Equation 2 are sampled from the ground terms that are true according to the underlying theory. We follow [10] and [13] and use a distribution P (D|T ) which assumes that the examples in D are randomly sampled with replacement from the ground terms that are true. This sampling assumption encourages our model to identify the theory with the smallest extension that is compatible with all of the training examples. We tested this approach by designing an experiment where learners were given sets of examples that were compatible with several underlying theories. Materials and Methods. 15 adults participated in this experiment immediately after taking Experiment 1. In each of five conditions, subjects were told about a set of triples built from an alphabet of 9 letters. They were shown four triples that belonged to the set (Fig. 4), and told that the set might include triples that they had not seen. Subjects then gave ratings on a seven point scale to indicate whether five additional triples (see Fig. 4) were likely to belong to the set. Results. Average ratings and model predictions are shown in Fig. 4. Model predictions for each condition were computed using Equation 2 and summing over a space of theories that included the five theories shown at the top of Fig. 4, variants of these five theories which stated that certain pairs of slots could not be occupied by the same constant,9 and theories that included no variables but merely enumerated up to 5 triples.10 Although there are general theories like R(X, Y, Z) that are compatible with the triples observed in all five conditions, Fig. 4 shows that people were sensitive to different regularities in each case.11 We focus on one condition (Fig. 4b) that exposes the strengths and weaknesses of our model. According to our model, the two most probable theories given the triples for this condition are R(X, X, 1) and the closely related variant that rules out R(1, 1, 1). The next most probable theory is R(X, X, Y). These predictions are consistent with people’s judgments that 771 is very likely to belong to the set, and that 778 is the next most likely option. Unlike our model, however, people consider 777 to be substantially less likely than 778 to belong to the set. This result may suggest that the variant of R(X, X, Y) that rules out R(X, X, X) deserves a higher prior probability than our model recognizes. To better account for cases like this, it may be worth considering languages where any two variables that belong to the same clause but have different names must refer to different entities. 3 Discussion and Conclusion There are many psychological models of concept learning [4, 12, 13], but few that use representations rich enough to capture the content of intuitive theories. We suggested that intuitive theories are mentally represented in a first-order logical language, and proposed a specific hypothesis about 9 ¯ One such theory includes two clauses: R(X, X, Y). R(X, X, X). One such theory is the following list of clauses: R(2, 2, 1). R(3, 3, 1). R(4, 4, 1). R(5, 5, 1). R(7, 7, 7). 11 Similar results have been found with 9-month old infants. Cases like Figs. 4b and 4c have been tested in an infant language-learning study where the stimuli were three-syllable strings [19]. 9-month old infants exposed to strings like the four in Fig. 4c generalized to other strings consistent with the theory R(X, X, Y), but infants in the condition corresponding to Fig. 4b generalized only to strings consistent with the theory R(X, X, 1). 10 7 this “language of thought.” We assumed that the subjective complexity of a theory depends on the length of its representation in this language, and described experiments which suggest that the resulting complexity measure helps to explain how theories are learned and used for inductive inference. Our experiments deliberately used stimuli that minimize the influence of prior knowledge. Theories, however, are cumulative, and the theory that seems simplest to a learner will often depend on her background knowledge. Our approach provides a natural place for background knowledge to be inserted. A learner can be supplied with a stock of background predicates, and the shortest representation for a data set will depend on which background predicates are available. Since different sets of predicates will lead to different predictions about subjective complexity, empirical results can help to determine the background knowledge that people bring to a given class of problems. Future work should aim to refine the representation language and complexity measure we proposed. We expect that something like our approach will be suitable for modeling a broad class of intuitive theories, but the specific framework presented here can almost certainly be improved. Future work should also consider different strategies for searching the space of theories. Some of the strategies developed in the ILP literature should be relevant [14], but a detailed investigation of search algorithms seems premature until our approach has held up to additional empirical tests. It is comparatively easy to establish whether the theories that are simple according to our approach are also considered simple by people, and our experiments have made a start in this direction. It is much harder to establish that our approach captures most of the theories that are subjectively simple, and more exhaustive experiments are needed before this conclusion can be drawn. Boolean concept learning has been studied for more than fifty years [4, 9], and many psychologists have made empirical and theoretical contributions to this field. An even greater effort will be needed to crack the problem of theory learning, since the space of intuitive theories is much richer than the space of Boolean concepts. The difficulty of this problem should not be underestimated, but computational approaches can contribute part of the solution. Acknowledgments Supported by the William Asbjornsen Albert memorial fellowship (CK), the James S. McDonnell Foundation Causal Learning Collaborative Initiative (NDG, JBT) and the Paul E. Newton chair (JBT). References [1] N. Goodman. The structure of appearance. 2nd edition, 1961. [2] S. Carey. Conceptual change in childhood. MIT Press, Cambridge, MA, 1985. [3] H. A. Simon. Complexity and the representation of patterned sequences of symbols. Psychological Review, 79:369–382, 1972. [4] J. Feldman. An algebra of human concept learning. JMP, 50:339–368, 2006. [5] N. Chater and P. Vitanyi. Simplicity: a unifying principle in cognitive science. TICS, 7:19–22, 2003. [6] J. T. Krueger. A theory of structural simplicity and its relevance to aspects of memory, perception, and conceptual naturalness. PhD thesis, University of Pennsylvania, 1979. [7] P. Gr¨ nwald, I. J. Myung, and M. Pitt, editors. Advances in Minimum Description Length: Theory and u Applications. 2005. [8] N. Chater. Reconciling simplicity and likelihood principles in perceptual organization. Psychological Review, 103:566–581, 1996. [9] J. A. Bruner, J. S. Goodnow, and G. J. Austin. A study of thinking. Wiley, 1956. [10] D. Conklin and I. H. Witten. Complexity-based induction. Machine Learning, 16(3):203–225, 1994. [11] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(1):81–97, 1956. [12] N. D. Goodman, T. L. Griffiths, J. Feldman, and J. B. Tenenbaum. A rational analysis of rule-based concept learning. In CogSci, 2007. [13] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. BBS, 24:629–641, 2001. [14] S. Muggleton and L. De Raedt. Inductive logic programming: theory and methods. Journal of Logic Programming, 19-20:629–679, 1994. [15] N. Chomsky. The logical structure of linguistic theory. University of Chicago Press, Chicago, 1975. [16] E. L. J. Leeuwenberg. A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84(3):307–349, 1971. [17] B. Edmonds. Syntactic measures of complexity. PhD thesis, University of Manchester, 1999. [18] W. McCune. Mace4 reference manual and guide. Technical Report ANL/MCS-TM-264, Argonne National Laboratory, 2003. [19] L. Gerken. Decisions, decisions: infant language learning when multiple generalizations are possible. Cognition, 98(3):67–74, 2006. 8

4 0.1067856 203 nips-2007-The rat as particle filter

Author: Aaron C. Courville, Nathaniel D. Daw

Abstract: Although theorists have interpreted classical conditioning as a laboratory model of Bayesian belief updating, a recent reanalysis showed that the key features that theoretical models capture about learning are artifacts of averaging over subjects. Rather than learning smoothly to asymptote (reflecting, according to Bayesian models, the gradual tradeoff from prior to posterior as data accumulate), subjects learn suddenly and their predictions fluctuate perpetually. We suggest that abrupt and unstable learning can be modeled by assuming subjects are conducting inference using sequential Monte Carlo sampling with a small number of samples — one, in our simulations. Ensemble behavior resembles exact Bayesian models since, as in particle filters, it averages over many samples. Further, the model is capable of exhibiting sophisticated behaviors like retrospective revaluation at the ensemble level, even given minimally sophisticated individuals that do not track uncertainty in their beliefs over trials. 1

5 0.10665813 3 nips-2007-A Bayesian Model of Conditioned Perception

Author: Alan Stocker, Eero P. Simoncelli

Abstract: unkown-abstract

6 0.098545805 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

7 0.094071202 213 nips-2007-Variational Inference for Diffusion Processes

8 0.08520972 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

9 0.082941987 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

10 0.080931552 143 nips-2007-Object Recognition by Scene Alignment

11 0.079132795 198 nips-2007-The Noisy-Logical Distribution and its Application to Causal Inference

12 0.069656372 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

13 0.061952781 7 nips-2007-A Kernel Statistical Test of Independence

14 0.060127176 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

15 0.058179747 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

16 0.057853375 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

17 0.057408869 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI

18 0.052562367 97 nips-2007-Hidden Common Cause Relations in Relational Learning

19 0.050822172 145 nips-2007-On Sparsity and Overcompleteness in Image Models

20 0.050506823 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.185), (1, -0.007), (2, 0.037), (3, -0.08), (4, -0.037), (5, 0.132), (6, 0.004), (7, 0.049), (8, -0.097), (9, -0.116), (10, -0.036), (11, -0.06), (12, -0.027), (13, -0.025), (14, 0.027), (15, -0.018), (16, -0.021), (17, 0.044), (18, 0.094), (19, -0.095), (20, 0.133), (21, 0.099), (22, -0.065), (23, -0.036), (24, -0.008), (25, -0.01), (26, -0.098), (27, -0.023), (28, 0.114), (29, -0.001), (30, 0.007), (31, -0.059), (32, 0.012), (33, -0.041), (34, 0.187), (35, 0.082), (36, 0.021), (37, 0.13), (38, -0.138), (39, -0.131), (40, -0.026), (41, 0.141), (42, -0.011), (43, -0.091), (44, -0.123), (45, -0.049), (46, -0.046), (47, 0.206), (48, 0.023), (49, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9661172 125 nips-2007-Markov Chain Monte Carlo with People

Author: Adam Sanborn, Thomas L. Griffiths

Abstract: Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners. Most applications of these models determine these distributions indirectly. We propose a method for directly determining the assumptions of human learners by sampling from subjective probability distributions. Using a correspondence between a model of human choice and Markov chain Monte Carlo (MCMC), we describe a method for sampling from the distributions over objects that people associate with different categories. In our task, subjects choose whether to accept or reject a proposed change to an object. The task is constructed so that these decisions follow an MCMC acceptance rule, defining a Markov chain for which the stationary distribution is the category distribution. We test this procedure for both artificial categories acquired in the laboratory, and natural categories acquired from experience. 1

2 0.68632585 3 nips-2007-A Bayesian Model of Conditioned Perception

Author: Alan Stocker, Eero P. Simoncelli

Abstract: unkown-abstract

3 0.60652834 203 nips-2007-The rat as particle filter

Author: Aaron C. Courville, Nathaniel D. Daw

Abstract: Although theorists have interpreted classical conditioning as a laboratory model of Bayesian belief updating, a recent reanalysis showed that the key features that theoretical models capture about learning are artifacts of averaging over subjects. Rather than learning smoothly to asymptote (reflecting, according to Bayesian models, the gradual tradeoff from prior to posterior as data accumulate), subjects learn suddenly and their predictions fluctuate perpetually. We suggest that abrupt and unstable learning can be modeled by assuming subjects are conducting inference using sequential Monte Carlo sampling with a small number of samples — one, in our simulations. Ensemble behavior resembles exact Bayesian models since, as in particle filters, it averages over many samples. Further, the model is capable of exhibiting sophisticated behaviors like retrospective revaluation at the ensemble level, even given minimally sophisticated individuals that do not track uncertainty in their beliefs over trials. 1

4 0.57549536 114 nips-2007-Learning and using relational theories

Author: Charles Kemp, Noah Goodman, Joshua B. Tenenbaum

Abstract: Much of human knowledge is organized into sophisticated systems that are often called intuitive theories. We propose that intuitive theories are mentally represented in a logical language, and that the subjective complexity of a theory is determined by the length of its representation in this language. This complexity measure helps to explain how theories are learned from relational data, and how they support inductive inferences about unobserved relations. We describe two experiments that test our approach, and show that it provides a better account of human learning and reasoning than an approach developed by Goodman [1]. What is a theory, and what makes one theory better than another? Questions like these are of obvious interest to philosophers of science but are also discussed by psychologists, who have argued that everyday knowledge is organized into rich and complex systems that are similar in many respects to scientific theories. Even young children, for instance, have systematic beliefs about domains including folk physics, folk biology, and folk psychology [2]. Intuitive theories like these play many of the same roles as scientific theories: in particular, both kinds of theories are used to explain and encode observations of the world, and to predict future observations. This paper explores the nature, use and acquisition of simple theories. Consider, for instance, an anthropologist who has just begun to study the social structure of a remote tribe, and observes that certain words are used to indicate relationships between selected pairs of individuals. Suppose that term T1(·, ·) can be glossed as ancestor(·, ·), and that T2(·, ·) can be glossed as friend(·, ·). The anthropologist might discover that the first term is transitive, and that the second term is symmetric with a few exceptions. Suppose that term T3(·, ·) can be glossed as defers to(·, ·), and that the tribe divides into two castes such that members of the second caste defer to members of the first caste. In this case the anthropologist might discover two latent concepts (caste 1(·) and caste 2(·)) along with the relationship between these concepts. As these examples suggest, a theory can be defined as a system of laws and concepts that specify the relationships between the elements in some domain [2]. We will consider how these theories are learned, how they are used to encode relational data, and how they support predictions about unobserved relations. Our approach to all three problems relies on the notion of subjective complexity. We propose that theory learners prefer simple theories, that people remember relational data in terms of the simplest underlying theory, and that people extend a partially observed data set according to the simplest theory that is consistent with their observations. There is no guarantee that a single measure of subjective complexity can do all of the work that we require [3]. This paper, however, explores the strong hypothesis that a single measure will suffice. Our formal treatment of subjective complexity begins with the question of how theories are mentally represented. We suggest that theories are represented in some logical language, and propose a specific first-order language that serves as a hypothesis about the “language of thought.” We then pursue the idea that the subjective complexity of a theory corresponds to the length of its representation in this language. Our approach therefore builds on the work of Feldman [4], and is related to other psychological applications of the notion of Kolmogorov complexity [5]. The complexity measure we describe can be used to define a probability distribution over a space of theories, and we develop a model of theory acquisition by using this distribution as the prior for a Bayesian learner. We also 1 (a) Star 11 (b) Bipartite (c) Exception 22 33 44 55 66 77 88 16 26 36 46 56 21 31 41 51 61 71 81 11 17 27 37 47 57 18 28 38 48 58 R(X, X). T(6). T(7). T(8). R(X, Y) ← ¯(X), T(Y). T R(X, 1). (d) Symmetric (e) Transitive 11 22 33 44 55 66 77 13 31 12 21 24 42 56 65 26 36 46 56 17 27 37 47 57 18 28 38 48 58 T(6). T(7). T(8). R(X, Y) ← ¯(X), T(Y). T ¯ R(1, 1). R(1, 6). (f) Random 12 21 13 23 14 24 34 13 32 14 24 34 15 25 35 45 16 26 36 46 56 51 52 35 54 61 26 63 46 56 R(1, 2). R(1, 3). R(2, 4). R(5, 6). R(1, 2). R(2, 3). R(3, 4). R(5, X). R(X, 4). R(X, Y) ← R(Y, X). R(X, X). R(4, 5). R(5, 6). R(X, Z) ← R(X, Y), R(Y, Z). R(2, 1). R(1, 3). R(6, 1). R(3, 2). R(2, 6). R(3, 5). R(6, 3). R(4, 6). ¯ ¯ ¯ R(X, X). R(6, 4). R(5, 3). Figure 1: Six possible extensions for a binary predicate R(·, ·). In each case, the objects in the domain are represented as digits, and a pair such as 16 indicates that R(1, 6) is true. Below each set of pairs, the simplest theory according to our complexity measure is shown. show how the same Bayesian approach helps to explain how theories support inductive generalization: given a set of observations, future observations (e.g. whether one individual defers to another) can be predicted using the posterior distribution over the space of theories. We test our approach by developing two experiments where people learn and make predictions about binary and ternary relations. As far as we know, the approach of Goodman [1] is the only other measure of theory complexity that has previously been tested as a psychological model [6]. We show that our experiments support our approach and raise challenges for this alternative model. 1 Theory complexity: a representation length approach Intuitive theories correspond to mental representations of some sort, and our first task is to characterize the elements used to build these representations. We explore the idea that a theory is a system of statements in a logical language, and six examples are shown in Fig. 1. The theory in Fig. 1b is related to the defers to(·, ·) example already described. Here we are interested in a domain including 9 elements, and a two place predicate R(·, ·) that is true of all and only the 15 pairs shown. R is defined using a unary predicate T which is true of only three elements: 6, 7, and 8. The theory includes a clause which states that R(X, Y) is true for all pairs XY such that T(X) is false and T(Y) is true. The theory in Fig. 1c is very similar, but includes an additional clause which specifies that R(1, 1) is true, and an exception which specifies that R(1, 6) is false. Formally, each theory we consider is a collection of function-free definite clauses. All variables are universally quantified: for instance, the clause R(X, Z) ← R(X, Y), R(Y, Z) is equivalent to the logical formula ∀x ∀y ∀z (R(x, z) ← R(x, y) ∧ R(y, z)). For readability, the theories in Fig. 1 include parentheses and arrows, but note that these symbols are unnecessary and can be removed. Our proposed language includes only predicate symbols, variable symbols, constant symbols, and a period that indicates when one clause finishes and another begins. Each theory in Fig. 1 specifies the extension of one or more predicates. The extension of predicate P is defined in terms of predicate P+ (which captures the basic rules that lead to membership in P) and predicate P− (which captures exceptions to these rules). The resulting extension of P is defined 2 as P+ \ P− , or the set difference of P+ and P− .1 Once P has been defined, later clauses in the theory may refer to P or its negation ¯. To ensure that our semantics is well-defined, the predicates P in any valid theory must permit an ordering so that the definition of any predicate does not refer to predicates that follow it in the order. Formally, the definition of each predicate P+ or P− can refer only to itself (recursive definitions are allowed) and to any predicate M or ¯ where M < P. M Once we have committed to a specific language, the subjective complexity of a theory is assumed to correspond to the number of symbols in its representation. We have chosen a language where there is one symbol for each position in a theory where a predicate, variable or constant appears, and one symbol to indicate when each clause ends. Given this language, the subjective complexity c(T ) of theory T is equal to the sum of the number of clauses in the theory and the number of positions in the theory where a predicate, variable or constant appears: c(T ) = #clauses(T ) + #pred slots(T ) + #var slots(T ) + #const slots(T ). (1) For instance, the clause R(X, Z) ← R(X, Y), R(Y, Z). contributes ten symbols towards the complexity of a theory (three predicate symbols, six variable symbols, and one period). Other languages might be considered: for instance, we could use a language which uses five symbols (e.g. five bits) to represent each predicate, variable and constant, and one symbol (e.g. one bit) to indicate the end of a clause. Our approach to subjective complexity depends critically on the representation language, but once a language has been chosen the complexity measure is uniquely specified. Although our approach is closely related to the notion of Kolmogorov complexity and to Minimum Message Length (MML) and Minimum Description Length (MDL) approaches, we refer to it as a Representation Length (RL) approach. A RL approach includes a commitment to a specific language that is proposed as a psychological hypothesis, but these other approaches aspire towards results that do not depend on the language chosen.2 It is sometimes suggested that the notion of Kolmogorov complexity provides a more suitable framework for psychological research than the RL approach, precisely because it allows for results that do not depend on a specific description language [8]. We subscribe to the opposite view. Mental representations presumably rely on some particular language, and identifying this language is a central challenge for psychological research. The language we described should be considered as a tentative approximation of the language of thought. Other languages can and should be explored, but our language has several appealing properties. Feldman [4] has argued that definite clauses are psychologically natural, and working with these representations allows our approach to account for several classic results from the concept learning literature. For instance, our language leads to the prediction that conjunctive concepts are easier to learn than disjunctive concepts [9].3 Working with definite clauses also ensures that each of our theories has a unique minimal model, which means that the extension of a theory can be defined in a particularly simple way. Finally, human learners deal gracefully with noise and exceptions, and our language provides a simple way to handle exceptions. Any concrete proposal about the language of thought should make predictions about memory, learning and reasoning. Suppose that data set D lists the extensions of one or more predicates, and that a theory is a “candidate theory” for D if it correctly defines the extensions of all predicates in D. Note that a candidate theory may well include latent predicates—predicates that do not appear in D, but are useful for defining the predicates that have been observed. We will assume that humans encode D in terms of the simplest candidate theory for D, and that the difficulty of memorizing D is determined by the subjective complexity of this theory. Our approach can and should be tested against classic results from the memory literature. Unlike some other approaches to complexity [10], for instance, our model predicts that a sequence of k items is about equally easy to remember regardless of whether the items are drawn from a set of size 2, a set of size 10, or a set of size 1000 [11]. 1 The extension of P+ is the smallest set that satisfies all of the clauses that define P+ , and the extension of P is defined similarly. To simplify our notation, Fig. 1 uses P to refer to both P and P+ , and ¯ to refer to ¯ and P P P− . Any instance of P that appears in a clause defining P is really an instance of P+ , and any instance of ¯ that P appears in a clause defining ¯ is really an instance of P− . P 2 MDL approaches also commit to a specific language, but this language is often intended to be as general as possible. See, for instance, the discussion of universal codes in Gr¨ nwald et al. [7]. u 3 A conjunctive concept C(·) can be defined using a single clause: C(X) ← A(X), B(X). The shortest definition of a disjunctive concept requires two clauses: D(X) ← A(X). D(X) ← B(X). − 3 To develop a model of inductive learning and reasoning, we take a Bayesian approach, and use our complexity measure to define a prior distribution over a hypothesis space of theories: P (T ) ∝ 2−c(T ) .4 Given this prior distribution, we can use Bayesian inference to make predictions about unobserved relations and to discover the theory T that best accounts for the observations in data set D [12, 13]. Suppose that we have a likelihood function P (D|T ) which specifies how the examples in D were generated from some underlying theory T . The best explanation for the data D is the theory that maximizes the posterior distribution P (T |D) ∝ P (D|T )P (T ). If we need to predict whether ground term g is likely to be true, 5 we can sum over the space of theories: P (g|D) = P (g|T )P (T |D) = T 1 P (D) P (D|T )P (T ) (2) T :g∈T where the final sum is over all theories T that make ground term g true. 1.1 Related work The theories we consider are closely related to logic programs, and methods for Inductive Logic Programming (ILP) explore how these programs can be learned from examples [14]. ILP algorithms are often inspired by the idea of searching for the shortest theory that accounts for the available data, and ILP is occasionally cast as the problem of minimizing an explicit MDL criterion [10]. Although ILP algorithms are rarely considered as cognitive models, the RL approach has a long psychological history, and is proposed by Chomsky [15] and Leeuwenberg [16] among others. Formal measures of complexity have been developed in many fields [17], and there is at least one other psychological account of theory complexity. Goodman [1] developed a complexity measure that was originally a philosophical proposal about scientific theories, but was later tested as a model of subjective complexity [6]. A detailed description of this measure is not possible here, but we attempt to give a flavor of the approach. Suppose that a basis is a set of predicates. The starting point for Goodman’s model is the intuition that basis B1 is at least as complex as basis B2 if B1 can be used to define B2. Goodman argues that this intuition is flawed, but his model is founded on a refinement of this intuition. For instance, since the binary predicate in Fig. 1b can be defined in terms of two unary predicates, Goodman’s approach requires that the complexity of the binary predicate is no more than the sum of the complexities of the two unary predicates. We will use Goodman’s model as a baseline for evaluating our own approach, and a comparison between these two models should be informed by both theoretical and empirical considerations. On the theoretical side, our approach relies on a simple principle for deciding which structural properties are relevant to the measurement of complexity: the relevant properties are those with short logical representations. Goodman’s approach incorporates no such principle, and he proposes somewhat arbitrarily that reflexivity and symmetry are among the relevant structural properties but that transitivity is not. A second reason for preferring our model is that it makes contact with a general principle—the idea that simplicity is related to representation length—that has found many applications across psychology, machine learning, and philosophy. 2 Experimental results We designed two experiments to explore settings where people learn, remember, and make inductive inferences about relational data. Although theories often consist of systems of many interlocking relations, we keep our experiments simple by asking subjects to learn and reason about a single relation at a time. Despite this restriction, our experiments still make contact with several issues raised by systems of relations. As the defers to(·, ·) example suggests, a single relation may be best explained as the observable tip of a system involving several latent predicates (e.g. caste 1(·) and caste 2(·)). 4 To ensure that this distribution can be normalized, we assume that there is some upper bound on the number of predicate symbols, variable symbols, and constants, and on the length of the theories we will consider. There will therefore be a finite number of possible theories, and our prior will be a valid probability distribution. 5 A ground term is a term such as R(8, 9) that does not include any variables. 4 Learning time Complexity (RL) Complexity (Human) 6 300 Complexity (Goodman) 4 20 0 0 0 star bprt excp sym trans rand 2 0 star bprt excp sym trans rand 2 star bprt excp sym trans rand 100 200 star bprt excp sym trans rand 4 40 Figure 2: (a) Average time in seconds to learn the six sets in Fig. 1. (b) Average ratings of set complexity. (c) Complexity scores according to our representation length (RL) model. (d) Complexity scores according to Goodman’s model. 2.1 Experiment 1: memory and induction In our first experiment, we studied the subjective complexity of six binary relations that display a range of structural properties, including reflexivity, symmetry, and transitivity. Materials and Methods. 18 adults participated in this experiment. Subjects were required to learn the 6 sets shown in Fig. 1, and to make inductive inferences about each set. Although Fig. 1 shows pairs of digits, the experiment used letter pairs, and the letters for each condition and the order in which these conditions were presented were randomized across subjects. The pairs for each condition were initially laid out randomly on screen, and subjects could drag them around and organize them to help them understand the structure of the set. At any stage, subjects could enter a test phase where they were asked to list the 15 pairs belonging to the current set. Subjects who made an error on the test were returned to the learning phase. After 9 minutes had elapsed, subjects were allowed to pass the test regardless of how many errors they made. After passing the test, subjects were asked to rate the complexity of the set compared to other sets with 15 pairs. Ratings were provided on a 7 point scale. Subjects were then asked to imagine that a new letter (e.g. letter 9) had belonged to the current alphabet, and were given two inductive tasks. First they were asked to enter between 1 and 10 novel pairs that they might have expected to see (each novel pair was required to include the new letter). Next they were told about a novel pair that belonged to the set (e.g. pair 91), and were again asked to enter up to 10 additional pairs that they might have expected to see. Results. The average time needed to learn each set is shown in Fig. 2a, and ratings of set complexity are shown in Fig. 2b. It is encouraging that these measures yield converging results, but they may be confounded since subjects rated the complexity of a set immediately after learning it. The complexities plotted in Fig. 2c are the complexities of the theories shown in Fig. 1, which we believe to be the simplest theories according to our complexity measure. The final plot in Fig. 2 shows complexities according to Goodman’s model, which assigns each binary relation an integer between 0 and 4. There are several differences between these models: for instance, Goodman’s account incorrectly predicts that the exception case is the hardest of the six, but our model acknowledges that a simple theory remains simple if a handful of exceptions are added. Goodman’s account also predicts that transitivity is not an important structural regularity, but our model correctly predicts that the transitive set is simpler than the same set with some of the pairs reversed (the random set). Results for the inductive task are shown in Fig. 3. The first two columns show the number of subjects who listed each novel pair. The remaining two columns show the probability of set membership predicted by our model. To generate these predictions, we applied Equation 2 and summed over a set of theories created by systematically extending the theories shown in Fig. 1. Each extended theory includes up to one additional clause for each predicate in the base theory, and each additional clause includes at most two predicate slots. For instance, each extended theory for the bipartite case is created by choosing whether or not to add the clause T(9), and adding up to one clause for predicate R.6 For the first inductive task, the likelihood term P (D|T ) (see Equation 2) is set to 0 for all theories that are not consistent with the pairs observed during training, and to a constant for all remaining theories. For the second task we assumed in addition that the novel pair observed is 6 R(9, X), ¯(2, 9), and R(X, 9) ← R(X, 2) are three possible additions. R 5 18 9 9 0.5 trans symm excep bipart 0 91 random r=0.99 1 star 18 99 19 0 91 89 99 19 89 0.5 0 91 18 18 1 9 9 99 19 r=0.96 89 0.5 0 91 99 19 0 91 89 99 19 89 18 9 1 99 19 89 r=0.98 1 9 0.5 0 91 99 19 0 91 89 99 19 89 18 9 99 19 89 0.5 0 81 88 18 0 78 81 88 18 78 0 18 18 9 0 0 71 77 17 67 71 77 17 67 18 18 81 9 88 18 r=0.62 78 71 77 17 67 Human (no examples) 0 71 77 17 67 Human (1 example) 0 0 91 99 19 89 r=0.99 0 81 88 18 78 71 77 17 67 1 71 77 17 67 r=0.38 0.5 0 89 r=0.93 0.5 1 9 99 19 r=0.99 1 0.5 0 89 r=0.99 0.5 1 9 0 91 1 r=0.88 1 9 99 19 0.5 0 91 18 0 91 0.5 0 91 18 r=0.99 1 0 r=0.74 1 0.5 71 77 17 67 RL (no examples) 0 71 77 17 67 RL (one example) Figure 3: Data and model predictions for the induction task in Experiment 1. Columns 1 and 3 show predictions before any pairs involving the new letter are observed. Columns 2 and 4 show predictions after a single novel pair (marked with a gray bar) is observed to belong to the set. The model plots for each condition include correlations with the human data. sampled at random from all pairs involving the new letter.7 All model predictions were computed using Mace4 [18] to generate the extension of each theory considered. The supporting material includes predictions for a model based on the Goodman complexity measure and an exemplar model which assumes that the new letter will be just like one of the old letters.8 The exemplar model outperforms our model in the random condition, and makes accurate predictions about three other conditions. Overall, however, our model performs better than the two baselines. Here we focus on two important predictions that are not well handled by the exemplar model. In the symmetry condition, almost all subjects predict that 78 belongs to the set after learning that 87 belongs to the set, suggesting that they have learned an abstract rule. In the transitive condition, most subjects predict that pairs 72 through 76 belong to the set after learning that 71 belongs to the set. Our model accounts for this result, but the exemplar model has no basis for making predictions about letter 7, since this letter is now known to be unlike any of the others. 2.2 Experiment 2: learning from positive examples During the learning phase of our first experiment, subjects learned a theory based on positive examples (the theory included all pairs they had seen) and negative examples (the theory ruled out all pairs they had not seen). Often, however, humans learn theories based on positive examples alone. Suppose, for instance, that our anthropologist has spent only a few hours with a new tribe. She may have observed several pairs who are obviously friends, but should realize that many other pairs of friends have not yet interacted in her presence. 7 For the second task, P (D|T ) is set to 0 for theories that are inconsistent with the training pairs and theories 1 which do not include the observed novel pair. For all remaining theories, P (D|T ) is set to n , where n is the total number of novel pairs that are consistent with T . 8 Supporting material is available at www.charleskemp.com 6 1 221 331 441 551 c) 7 1 R(X, X, Y). 221 443 552 663 d) 7 1 R(X, Y, Z). 231 456 615 344 e) 7 1 −10 −5 −0.1 −10 −20 −20 −10 −0.2 −20 777 771 778 789 237 777 771 778 789 237 −10 231 234 235 236 0 777 771 778 789 237 0 777 771 778 789 237 0 777 771 778 789 237 0 RL 0 R(2, 3, X). 777 771 778 789 237 7 R(X, X, 1). 777 771 778 789 237 b) 777 771 778 789 237 1 111 222 333 444 777 771 778 789 237 7 R(X, X, X). 777 771 778 789 237 Human a) Figure 4: Data and model predictions for Experiment 2. The four triples observed for each set are shown at the top of the figure. The first row of plots shows average ratings on a scale from 1 (very unlikely to belong to the set) to 7 (very likely). Model predictions are plotted as log probabilities. Our framework can handle cases like these if we assume that the data D in Equation 2 are sampled from the ground terms that are true according to the underlying theory. We follow [10] and [13] and use a distribution P (D|T ) which assumes that the examples in D are randomly sampled with replacement from the ground terms that are true. This sampling assumption encourages our model to identify the theory with the smallest extension that is compatible with all of the training examples. We tested this approach by designing an experiment where learners were given sets of examples that were compatible with several underlying theories. Materials and Methods. 15 adults participated in this experiment immediately after taking Experiment 1. In each of five conditions, subjects were told about a set of triples built from an alphabet of 9 letters. They were shown four triples that belonged to the set (Fig. 4), and told that the set might include triples that they had not seen. Subjects then gave ratings on a seven point scale to indicate whether five additional triples (see Fig. 4) were likely to belong to the set. Results. Average ratings and model predictions are shown in Fig. 4. Model predictions for each condition were computed using Equation 2 and summing over a space of theories that included the five theories shown at the top of Fig. 4, variants of these five theories which stated that certain pairs of slots could not be occupied by the same constant,9 and theories that included no variables but merely enumerated up to 5 triples.10 Although there are general theories like R(X, Y, Z) that are compatible with the triples observed in all five conditions, Fig. 4 shows that people were sensitive to different regularities in each case.11 We focus on one condition (Fig. 4b) that exposes the strengths and weaknesses of our model. According to our model, the two most probable theories given the triples for this condition are R(X, X, 1) and the closely related variant that rules out R(1, 1, 1). The next most probable theory is R(X, X, Y). These predictions are consistent with people’s judgments that 771 is very likely to belong to the set, and that 778 is the next most likely option. Unlike our model, however, people consider 777 to be substantially less likely than 778 to belong to the set. This result may suggest that the variant of R(X, X, Y) that rules out R(X, X, X) deserves a higher prior probability than our model recognizes. To better account for cases like this, it may be worth considering languages where any two variables that belong to the same clause but have different names must refer to different entities. 3 Discussion and Conclusion There are many psychological models of concept learning [4, 12, 13], but few that use representations rich enough to capture the content of intuitive theories. We suggested that intuitive theories are mentally represented in a first-order logical language, and proposed a specific hypothesis about 9 ¯ One such theory includes two clauses: R(X, X, Y). R(X, X, X). One such theory is the following list of clauses: R(2, 2, 1). R(3, 3, 1). R(4, 4, 1). R(5, 5, 1). R(7, 7, 7). 11 Similar results have been found with 9-month old infants. Cases like Figs. 4b and 4c have been tested in an infant language-learning study where the stimuli were three-syllable strings [19]. 9-month old infants exposed to strings like the four in Fig. 4c generalized to other strings consistent with the theory R(X, X, Y), but infants in the condition corresponding to Fig. 4b generalized only to strings consistent with the theory R(X, X, 1). 10 7 this “language of thought.” We assumed that the subjective complexity of a theory depends on the length of its representation in this language, and described experiments which suggest that the resulting complexity measure helps to explain how theories are learned and used for inductive inference. Our experiments deliberately used stimuli that minimize the influence of prior knowledge. Theories, however, are cumulative, and the theory that seems simplest to a learner will often depend on her background knowledge. Our approach provides a natural place for background knowledge to be inserted. A learner can be supplied with a stock of background predicates, and the shortest representation for a data set will depend on which background predicates are available. Since different sets of predicates will lead to different predictions about subjective complexity, empirical results can help to determine the background knowledge that people bring to a given class of problems. Future work should aim to refine the representation language and complexity measure we proposed. We expect that something like our approach will be suitable for modeling a broad class of intuitive theories, but the specific framework presented here can almost certainly be improved. Future work should also consider different strategies for searching the space of theories. Some of the strategies developed in the ILP literature should be relevant [14], but a detailed investigation of search algorithms seems premature until our approach has held up to additional empirical tests. It is comparatively easy to establish whether the theories that are simple according to our approach are also considered simple by people, and our experiments have made a start in this direction. It is much harder to establish that our approach captures most of the theories that are subjectively simple, and more exhaustive experiments are needed before this conclusion can be drawn. Boolean concept learning has been studied for more than fifty years [4, 9], and many psychologists have made empirical and theoretical contributions to this field. An even greater effort will be needed to crack the problem of theory learning, since the space of intuitive theories is much richer than the space of Boolean concepts. The difficulty of this problem should not be underestimated, but computational approaches can contribute part of the solution. Acknowledgments Supported by the William Asbjornsen Albert memorial fellowship (CK), the James S. McDonnell Foundation Causal Learning Collaborative Initiative (NDG, JBT) and the Paul E. Newton chair (JBT). References [1] N. Goodman. The structure of appearance. 2nd edition, 1961. [2] S. Carey. Conceptual change in childhood. MIT Press, Cambridge, MA, 1985. [3] H. A. Simon. Complexity and the representation of patterned sequences of symbols. Psychological Review, 79:369–382, 1972. [4] J. Feldman. An algebra of human concept learning. JMP, 50:339–368, 2006. [5] N. Chater and P. Vitanyi. Simplicity: a unifying principle in cognitive science. TICS, 7:19–22, 2003. [6] J. T. Krueger. A theory of structural simplicity and its relevance to aspects of memory, perception, and conceptual naturalness. PhD thesis, University of Pennsylvania, 1979. [7] P. Gr¨ nwald, I. J. Myung, and M. Pitt, editors. Advances in Minimum Description Length: Theory and u Applications. 2005. [8] N. Chater. Reconciling simplicity and likelihood principles in perceptual organization. Psychological Review, 103:566–581, 1996. [9] J. A. Bruner, J. S. Goodnow, and G. J. Austin. A study of thinking. Wiley, 1956. [10] D. Conklin and I. H. Witten. Complexity-based induction. Machine Learning, 16(3):203–225, 1994. [11] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(1):81–97, 1956. [12] N. D. Goodman, T. L. Griffiths, J. Feldman, and J. B. Tenenbaum. A rational analysis of rule-based concept learning. In CogSci, 2007. [13] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. BBS, 24:629–641, 2001. [14] S. Muggleton and L. De Raedt. Inductive logic programming: theory and methods. Journal of Logic Programming, 19-20:629–679, 1994. [15] N. Chomsky. The logical structure of linguistic theory. University of Chicago Press, Chicago, 1975. [16] E. L. J. Leeuwenberg. A perceptual coding language for visual and auditory patterns. American Journal of Psychology, 84(3):307–349, 1971. [17] B. Edmonds. Syntactic measures of complexity. PhD thesis, University of Manchester, 1999. [18] W. McCune. Mace4 reference manual and guide. Technical Report ANL/MCS-TM-264, Argonne National Laboratory, 2003. [19] L. Gerken. Decisions, decisions: infant language learning when multiple generalizations are possible. Cognition, 98(3):67–74, 2006. 8

5 0.56289864 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

Author: Matthew Hoffman, Arnaud Doucet, Nando D. Freitas, Ajay Jasra

Abstract: A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.

6 0.52170342 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

7 0.44927639 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

8 0.44694778 198 nips-2007-The Noisy-Logical Distribution and its Application to Causal Inference

9 0.43031931 19 nips-2007-Active Preference Learning with Discrete Choice Data

10 0.42043278 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

11 0.41539139 150 nips-2007-Optimal models of sound localization by barn owls

12 0.4062272 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

13 0.37941167 196 nips-2007-The Infinite Gamma-Poisson Feature Model

14 0.36242577 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data

15 0.34796453 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

16 0.33097816 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

17 0.32896006 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

18 0.32406992 97 nips-2007-Hidden Common Cause Relations in Relational Learning

19 0.32287604 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model

20 0.32047316 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.03), (13, 0.057), (16, 0.013), (18, 0.108), (19, 0.018), (21, 0.074), (25, 0.2), (26, 0.014), (31, 0.024), (34, 0.019), (35, 0.014), (47, 0.12), (49, 0.016), (83, 0.109), (85, 0.033), (87, 0.028), (90, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80285251 125 nips-2007-Markov Chain Monte Carlo with People

Author: Adam Sanborn, Thomas L. Griffiths

Abstract: Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners. Most applications of these models determine these distributions indirectly. We propose a method for directly determining the assumptions of human learners by sampling from subjective probability distributions. Using a correspondence between a model of human choice and Markov chain Monte Carlo (MCMC), we describe a method for sampling from the distributions over objects that people associate with different categories. In our task, subjects choose whether to accept or reject a proposed change to an object. The task is constructed so that these decisions follow an MCMC acceptance rule, defining a Markov chain for which the stationary distribution is the category distribution. We test this procedure for both artificial categories acquired in the laboratory, and natural categories acquired from experience. 1

2 0.71244359 213 nips-2007-Variational Inference for Diffusion Processes

Author: Cédric Archambeau, Manfred Opper, Yuan Shen, Dan Cornford, John S. Shawe-taylor

Abstract: Diffusion processes are a family of continuous-time continuous-state stochastic processes that are in general only partially observed. The joint estimation of the forcing parameters and the system noise (volatility) in these dynamical systems is a crucial, but non-trivial task, especially when the system is nonlinear and multimodal. We propose a variational treatment of diffusion processes, which allows us to compute type II maximum likelihood estimates of the parameters by simple gradient techniques and which is computationally less demanding than most MCMC approaches. We also show how a cheap estimate of the posterior over the parameters can be constructed based on the variational free energy. 1

3 0.71241528 3 nips-2007-A Bayesian Model of Conditioned Perception

Author: Alan Stocker, Eero P. Simoncelli

Abstract: unkown-abstract

4 0.69522566 203 nips-2007-The rat as particle filter

Author: Aaron C. Courville, Nathaniel D. Daw

Abstract: Although theorists have interpreted classical conditioning as a laboratory model of Bayesian belief updating, a recent reanalysis showed that the key features that theoretical models capture about learning are artifacts of averaging over subjects. Rather than learning smoothly to asymptote (reflecting, according to Bayesian models, the gradual tradeoff from prior to posterior as data accumulate), subjects learn suddenly and their predictions fluctuate perpetually. We suggest that abrupt and unstable learning can be modeled by assuming subjects are conducting inference using sequential Monte Carlo sampling with a small number of samples — one, in our simulations. Ensemble behavior resembles exact Bayesian models since, as in particle filters, it averages over many samples. Further, the model is capable of exhibiting sophisticated behaviors like retrospective revaluation at the ensemble level, even given minimally sophisticated individuals that do not track uncertainty in their beliefs over trials. 1

5 0.6914857 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

Author: Matthew Hoffman, Arnaud Doucet, Nando D. Freitas, Ajay Jasra

Abstract: A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.

6 0.67496669 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes

7 0.6656847 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

8 0.66069508 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model

9 0.6551373 100 nips-2007-Hippocampal Contributions to Control: The Third Way

10 0.65378577 86 nips-2007-Exponential Family Predictive Representations of State

11 0.65183371 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

12 0.64776313 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

13 0.6464898 47 nips-2007-Collapsed Variational Inference for HDP

14 0.64264917 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging

15 0.64263517 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI

16 0.6424045 63 nips-2007-Convex Relaxations of Latent Variable Training

17 0.63910162 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

18 0.63718945 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation

19 0.63694251 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

20 0.63685358 43 nips-2007-Catching Change-points with Lasso