NIPS 2003, Paper 188
Title: Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Authors: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets.
1 Introduction

The advent of functional Magnetic Resonance Imaging (fMRI) has made it possible to safely, non-invasively observe correlates of neural activity across the entire human brain at high spatial resolution. A typical fMRI session can produce a three-dimensional image of brain activation once per second, with a spatial resolution of a few millimeters, yielding tens of millions of individual fMRI observations over the course of a twenty-minute session. This fMRI technology holds the potential to revolutionize studies of human cognitive processing, provided we can develop appropriate data analysis methods.

Researchers have now employed fMRI to conduct hundreds of studies that identify which regions of the brain are activated on average when a human performs a particular cognitive task. Typical research publications describe summary statistics of brain activity in various locations, calculated by averaging together fMRI observations collected over multiple time intervals during which the subject responds to repeated stimuli of a particular type. Our interest here is in a different problem: training classifiers to automatically decode the subject’s cognitive state at a single instant or interval in time. If we can reliably train such classifiers, we may be able to use them as “virtual sensors” of hidden cognitive states, to observe previously hidden cognitive processes in the brain.

Whereas the earlier work of Mitchell et al. focussed primarily on training a different classifier for each human subject, our focus in this paper is on training a single classifier that can be used across multiple human subjects, including humans not involved in the training process. This is challenging because different brains have substantially different sizes and shapes, and because different people may generate different brain activation given the same cognitive state. Below we briefly survey related work, describe a range of machine learning approaches to this problem, and present experimental results showing statistically significant cross-subject classifier accuracies for two different fMRI studies.
2 Related Work

Mitchell et al. [6,7,9] describe methods for training classifiers of cognitive states, focussing primarily on training subject-specific classifiers. More specifically, they train classifiers that distinguish among a set of predefined cognitive states, based on a single fMRI image or a fixed window of fMRI images collected relative to the presentation of a particular stimulus. They used several different classifiers, and report that dimensionality reduction methods are essential given the high-dimensional, sparse training data. Wagner et al. [11] report that they have been able to predict whether a verbal experience will be remembered later, based on the magnitude of activity within certain parts of left prefrontal and temporal cortices during that experience. Haxby et al. [2] show that different patterns of fMRI activity are generated when a subject views a photograph of a face versus a house, etc. Work on brain-computer interfaces (e.g., [8]) also seeks to decode observed brain activity (often EEG or direct neural recordings, rather than fMRI), typically for the purpose of controlling external devices.
The learning task is to train classifiers of the form

    ⟨I1, …, In⟩ → CognitiveState

where I1, …, In is a sequence of n fMRI images collected during a contiguous time interval and CognitiveState is the set of cognitive states to be discriminated. We explore a number of classifier training methods, including:
• Gaussian Naive Bayes (GNB)
• Support Vector Machines (SVM)
Classifiers were evaluated using a “leave one subject out” cross-validation procedure, in which each of the m human subjects was used as a test subject while training on the remaining m−1 subjects, and the mean accuracy over these held-out subjects was calculated.
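The GNB classifier and the leave-one-subject-out evaluation can be illustrated with a minimal numpy sketch. This is not the authors' code: the data are synthetic stand-ins for extracted fMRI features, and all function names and parameters (feature count, subject count, class shift) are our own assumptions.

```python
import numpy as np

def gnb_fit(X, y):
    """Gaussian Naive Bayes: per-class feature means/variances plus class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in classes])  # variance smoothing
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, vars_, priors

def gnb_predict(model, X):
    classes, means, vars_, priors = model
    # log P(c) + sum_j log N(x_j; mu_cj, var_cj), evaluated for every class at once
    ll = -0.5 * (((X[:, None, :] - means) ** 2) / vars_ + np.log(2 * np.pi * vars_)).sum(axis=2)
    return classes[np.argmax(ll + np.log(priors), axis=1)]

def leave_one_subject_out(X, y, subject):
    """Train on m-1 subjects, test on the held-out subject; return mean held-out accuracy."""
    accs = []
    for s in np.unique(subject):
        train, test = subject != s, subject == s
        model = gnb_fit(X[train], y[train])
        accs.append(np.mean(gnb_predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Synthetic stand-in: 6 subjects, 40 trials each, 20 features,
# with a class-dependent mean shift shared across subjects.
rng = np.random.default_rng(0)
subject = np.repeat(np.arange(6), 40)
y = np.tile(np.array([0, 1]).repeat(20), 6)
X = rng.normal(size=(240, 20)) + y[:, None] * 1.0
acc = leave_one_subject_out(X, y, subject)
```

Because the synthetic class signal is shared across "subjects", the held-out accuracy is high; real cross-subject fMRI data is far noisier.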
We explored a variety of approaches to reducing the dimensionality of the input feature vector, including methods that select a subset of available features, methods that replace multiple feature values by their mean, and methods that combine both. In the latter two cases, we take means over values found within anatomically defined brain regions (e.g., dorsolateral prefrontal cortex), which are referred to as Regions of Interest, or ROIs. We considered the following feature extraction methods:
• Average. For each ROI, calculate the mean activity over all voxels in the ROI.
• ActiveAvg(n). For each ROI, select the n most active voxels, then calculate the mean of their values. Here the “most active” voxels are those whose activity while performing the task varies the most from their activity when the subject is at rest (see [7] for details).
• Active(n). Select the n most active voxels over the entire brain.
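The three feature extraction methods can be sketched in numpy as follows. The "activity vs. rest" score below is a crude proxy (the paper's precise definition is in [7]), and the toy data, ROI layout, and function names are our own assumptions.

```python
import numpy as np

def average(voxels, roi_labels):
    """Average: mean activity over all voxels in each ROI."""
    rois = np.unique(roi_labels)
    return np.array([voxels[:, roi_labels == r].mean(axis=1) for r in rois]).T

def active_avg(voxels, roi_labels, rest, n):
    """ActiveAvg(n): per ROI, mean of the n voxels deviating most from rest activity."""
    rois = np.unique(roi_labels)
    score = np.abs(voxels.mean(axis=0) - rest.mean(axis=0))  # crude task-vs-rest proxy
    feats = []
    for r in rois:
        idx = np.where(roi_labels == r)[0]
        top = idx[np.argsort(score[idx])[-n:]]        # n most "active" voxels in this ROI
        feats.append(voxels[:, top].mean(axis=1))
    return np.array(feats).T

def active(voxels, rest, n):
    """Active(n): keep the n most active voxels over the entire brain, individually."""
    score = np.abs(voxels.mean(axis=0) - rest.mean(axis=0))
    top = np.argsort(score)[-n:]
    return voxels[:, top]

# Toy data: 10 trials x 100 voxels, 4 ROIs of 25 voxels each, plus matched rest scans.
rng = np.random.default_rng(1)
voxels = rng.normal(size=(10, 100))
rest = rng.normal(size=(10, 100))
roi_labels = np.repeat(np.arange(4), 25)
f_avg = average(voxels, roi_labels)                # one feature per ROI
f_aavg = active_avg(voxels, roi_labels, rest, 5)   # one feature per ROI
f_act = active(voxels, rest, 20)                   # 20 voxel-level features
```

Note how Average and ActiveAvg(n) collapse each ROI to a single feature per timepoint, while Active(n) retains individual voxels.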
3 Registering Data from Multiple Subjects

Given the different sizes and shapes of different brains, it is not possible to directly map the voxels in one brain to those in another. We considered two different methods for producing representations of fMRI data for use across multiple subjects:
• ROI Mapping. Abstract the voxel data in each brain using the Average or ActiveAvg(n) feature extraction method described above. Because each brain contains the same set of anatomically defined ROIs, we can use the resulting representation of average activity per ROI as a canonical representation across subjects.
• Talairach coordinates. The coordinate system of each brain is transformed (geometrically morphed) into the coordinate system of a standard brain (known as the Talairach-Tournoux coordinate system [10]). After this transformation, each brain has the same shape and size, though the transformation is usually imperfect.

ROI Mapping results in just one feature per ROI (we work with at most 35 ROIs per brain) at each timepoint, whereas Talairach coordinates retain the voxel-level resolution (on the order of 15,000 voxels per brain). ROI Mapping reduces noise by averaging voxel activations, whereas the Talairach transformation effectively introduces new noise due to imperfections in the morphing. Notice that both of these transformations require background knowledge about brain anatomy in order to identify anatomical landmarks or ROIs.
4 Case Studies

This section describes two fMRI case studies used for training classifiers (detailed in [7]).

4.1 Sentence versus Picture Study

In this fMRI study [3], thirteen normal subjects performed a sequence of trials. During each trial they were first shown a sentence and a simple picture, then asked whether the sentence correctly described the picture. We used this data set to explore the feasibility of training classifiers to distinguish whether the subject is examining a sentence or a picture during a particular time interval. In half of the trials the picture was presented first, followed by the sentence; we refer to these trials as the PS data set. In the remaining trials the sentence was presented first, followed by the picture; we call these the SP data set. The learning task we consider here is to train a classifier to determine, given a particular 16-image interval of fMRI data, whether the subject was viewing a sentence or a picture during this interval.
4.2 Syntactic Ambiguity Study

In this fMRI study [4], subjects were presented with ambiguous and unambiguous sentences, and were asked to respond to a yes-no question about the content of each sentence. The questions were designed to ensure that the subject was in fact processing the sentence. Five normal subjects participated in this study, which we will refer to as the SA data set. We are interested here in learning a classifier that takes as input an interval of fMRI activity and determines whether the subject was reading an unambiguous or ambiguous sentence. An example ambiguous sentence is “The experienced soldiers warned about the dangers conducted the midnight raid.” An example unambiguous sentence is “The experienced soldiers spoke about the dangers before the midnight raid.” The classifier input is a window of fMRI images, where I1 is the image captured at the time when the sentence is first presented to the subject.
5 Experimental Results

The primary goal of this work is to determine whether and how it is possible to train classifiers of cognitive states across multiple human subjects. We experimented using data from the two case studies described above, measuring the accuracy of classifiers trained for single subjects, as well as those trained for multiple subjects. Note that we might expect the multiple-subject classification accuracies to be lower due to differences among subjects, or higher due to the larger number of training examples available. Table 1 displays the lowest accuracies that are statistically significant at the 95% confidence level, where the expected accuracy due to chance is 0.5. We do not report a confidence interval individually for each accuracy because they are very similar.

Table 1: The lowest accuracies that are significantly better than chance at the 95% level. [Table entries not preserved in this version.]
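The notion of a "lowest significant accuracy" can be approximated with a one-sided binomial test against 50% chance. This is a sketch, not the paper's exact confidence-interval computation, and the test-set size n = 40 below is an arbitrary illustration rather than a value from the studies.

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lowest_significant_accuracy(n, alpha=0.05, p=0.5):
    """Smallest accuracy k/n whose one-sided binomial p-value is at most alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, p) <= alpha:
            return k / n
    return 1.0

# With 40 two-class test examples, accuracies at or above this threshold
# are unlikely (p <= 0.05) to arise by guessing at 50%.
thresh = lowest_significant_accuracy(40)
```

Larger test sets lower the threshold toward 0.5, which is why significance thresholds depend on the number of held-out examples.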
Table 2 shows the classifier accuracies for the Sentence versus Picture study, when training across subjects and testing on the subject withheld from the training set. For comparison, it also shows (in parentheses) the average accuracy achieved by classifiers trained and tested on single subjects. All results are highly significant compared to the 50% accuracy expected by chance, demonstrating convincingly the feasibility of training classifiers to distinguish cognitive states in subjects beyond the training set. In fact, the accuracy achieved on the left-out subject by the multiple-subject classifiers is often very close to the average accuracy of the single-subject classifiers, and in several cases it is significantly better. This surprisingly positive result indicates that the accuracy of the multiple-subject classifier, when tested on new subjects outside the training set, is comparable to the average accuracy achieved when training and testing using data from a single subject. Presumably this can be explained by the fact that it is trained using an order of magnitude more training examples, from twelve subjects rather than one. The increase in training set size apparently compensates for the variability among subjects. A second trend apparent in Table 2 is that the accuracies on the SP or PS data sets alone are better than the accuracies on their union (SP+PS).
4 They include the pars opercularis of the inferior frontal gyrus, the pars triangularis of the inferior frontal gyrus, Wernicke’s area, and the superior temporal gyrus.
5 Under cross validation we learn m classifiers, and the accuracy we report is the mean accuracy of these classifiers. The size of the confidence interval we compute is an upper bound on the size of each individual interval.

Table 2: Multiple-subject accuracies in the Sentence versus Picture study (ROI mapping). Numbers in parentheses are the corresponding mean accuracies of single-subject classifiers. [Table entries not preserved in this version.]

Table 3: Multiple-subject accuracies in the Syntactic Ambiguity study (ROI mapping). Numbers in parentheses are the corresponding mean accuracies of single-subject classifiers. To choose n in ActiveAvg(n), we explored all even numbers less than 50, reporting the best. [Table entries not preserved in this version.]

Classifier accuracies for the Syntactic Ambiguity study are shown in Table 3. The accuracies for both single-subject and multiple-subject classifiers are lower than in the first study, perhaps due in part to the smaller number of subjects and training examples. Although we cannot draw strong conclusions from the results of this study, it provides modest additional support for the feasibility of training multiple-subject classifiers using ROI mapping. Note that the accuracies of the multiple-subject classifiers are again comparable to those of single-subject classifiers.
5.2 Talairach Coordinates

Next we explore the Talairach coordinates method for merging data from multiple subjects. One difficulty in utilizing the Talairach transformation here is that slightly different regions of the brain were scanned for different subjects. Figure 1 shows the portions of the brain that were scanned for two of the subjects, along with the intersection of these regions across all five subjects.6

6 We experienced technical difficulties in applying the Talairach transformation software to the Sentence versus Picture study (see [3] for details).

Figure 1: The two leftmost panels show in color the scanned portion of the brain for two subjects (Syntactic Ambiguity study) in Talairach space, in sagittal view. The rightmost panel shows the intersection of these scanned bands across all five subjects.
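Restricting analysis to voxels scanned in every subject, as in the intersection of Figure 1, amounts to AND-ing per-subject boolean masks once all brains share the Talairach coordinate frame. A toy numpy sketch with made-up slab-shaped masks (the grid size and slab offsets are arbitrary):

```python
import numpy as np

def common_mask(masks):
    """Voxels scanned in every subject: logical AND of per-subject boolean masks."""
    out = masks[0].copy()
    for m in masks[1:]:
        out &= m
    return out

# Toy 3D masks for five subjects: each "scanned" a slightly shifted slab of slices,
# mimicking the partially overlapping scanned bands in Figure 1.
shape = (8, 8, 8)
masks = []
for s in range(5):
    m = np.zeros(shape, dtype=bool)
    m[:, :, s // 2 : 6 + s // 2] = True  # 6-slice slab, shifted per subject
    masks.append(m)
mask = common_mask(masks)
```

Only voxels inside the common mask are comparable across subjects, which is why the intersection region in Figure 1 is narrower than any single subject's scanned band.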
The results of training multiple-subject classifiers based on the Talairach coordinates method are shown in Table 4. When using the Talairach method, we found the most effective feature extraction approach was the Active(n) feature selection approach, which chooses the n most active voxels from across the brain. Note that it is not possible to use this feature selection approach with the ROI Mapping method, because the individual voxels from different brains can only be aligned after performing the Talairach transformation.

Table 4: Multiple-subject accuracies in the Syntactic Ambiguity study (Talairach coordinates). Numbers in parentheses are the mean accuracies of single-subject classifiers. For n in Active(n), we explored all even numbers less than 200, reporting the best. [Table entries not preserved in this version.]
Summary and Conclusions

The primary goal of this research was to determine whether it is feasible to use machine learning methods to decode mental states across multiple human subjects. Two methods were explored to train multiple-subject classifiers based on fMRI data. ROI Mapping abstracts fMRI data by using the mean fMRI activity in each of several anatomically defined ROIs, representing different brains in terms of a common set of ROIs. The transformation to Talairach coordinates morphs brains into a standard coordinate frame, retaining the approximate spatial resolution of the original data. Our experiments demonstrate that it is possible to train classifiers to distinguish cognitive states, e.g., whether the subject was viewing a picture or a sentence describing a picture, and to apply these successfully to subjects outside the training set. In many cases, the classification accuracy for subjects outside the training set equalled or exceeded the accuracy achieved by training on data from just the single subject. A second research direction is to develop learning methods that take advantage of data from multiple studies, in contrast to the single-study efforts described here.
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
2 0.17053135 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
3 0.11143167 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
4 0.095124222 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
Author: Saori C. Tanaka, Kenji Doya, Go Okada, Kazutaka Ueda, Yasumasa Okamoto, Shigeto Yamawaki
Abstract: To understand the brain mechanisms involved in reward prediction on different time scales, we developed a Markov decision task that requires prediction of both immediate and future rewards, and analyzed subjects’ brain activities using functional MRI. We estimated the time course of reward prediction and reward prediction error on different time scales from subjects' performance data, and used them as the explanatory variables for SPM analysis. We found topographic maps of different time scales in medial frontal cortex and striatum. The result suggests that different cortico-basal ganglia loops are specialized for reward prediction on different time scales. 1 Intro du ction In our daily life, we make decisions based on the prediction of rewards on different time scales; immediate and long-term effects of an action are often in conflict, and biased evaluation of immediate or future outcome can lead to pathetic behaviors. Lesions in the central serotonergic system result in impulsive behaviors in humans [1], and animals [2, 3], which can be attributed to deficits in reward prediction on a long time scale. Damages in the ventral part of medial frontal cortex (MFC) also cause deficits in decision-making that requires assessment of future outcomes [4-6]. A possible mechanism underlying these observations is that different brain areas are specialized for reward prediction on different time scales, and that the ascending serotonergic system activates those specialized for predictions in longer time scales [7]. The theoretical framework of temporal difference (TD) learning [8] successfully explains reward-predictive activities of the midbrain dopaminergic system as well as those of the cortex and the striatum [9-13]. In TD learning theory, the predicted amount of future reward starting from a state s(t) is formulated as the “value function” V(t) = E[r(t + 1) + γ r(t + 2) + γ 2r(t + 3) + …] (1) and learning is based on the TD error δ(t) = r(t) + γ V(t) – V(t - 1). 
(2) The ‘discount factor’ γ controls the time scale of prediction; while only the immediate reward r(t + 1) is considered with γ = 0, rewards in the longer future are taken into account with γ closer to 1. In order to test the above hypothesis [7], we developed a reinforcement learning task which requires a large value of discount factor for successful performance, and analyzed subjects’ brain activities using functional MRI. In addition to conventional block-design analysis, a novel model-based regression analysis revealed topographic representation of prediction time scale with in the cortico-basal ganglia loops. 2 2.1 Methods Markov Decision Task In the Markov decision task (Fig. 1), markers on the corners of a square present four states, and the subject selects one of two actions by pressing a button (a1 = left button, a2 = right button) (Fig. 1A). The action determines both the amount of reward and the movement of the marker (Fig. 1B). In the REGULAR condition, the next trial is started from the marker position at the end of the previous trial. Therefore, in order to maximize the reward acquired in a long run, the subject has to select an action by taking into account both the immediate reward and the future reward expected from the subsequent state. The optimal behavior is to receive small negative rewards at states s 2, s3, and s4 to obtain a large positive reward at state s1 (Fig. 1C). In the RANDOM condition, next trial is started from a random marker position so that the subject has to consider only immediate reward. Thus, the optimal behavior is to collect a larger reward at each state (Fig. 1D). In the baseline condition (NO condition), the reward is always zero. In order to learn the optimal behaviors, the discount factor γ has to be larger than 0.3425 in REGULAR condition, while it can be arbitrarily small in RANDOM condition. 
2.2 fMRI imaging Eighteen healthy, right-handed volunteers (13 males and 5 females), gave informed consent to take part in the study, with the approval of the ethics and safety committees of ATR and Hiroshima University. A 0 Time 1.0 2.0 2.5 3.0 100 C B +r 2 s2 s1 REGULAR condition s2 -r 1 -r 2 +r 1 s1 100 D RANDOM condition +r 2 s2 s1 -r 1 +r 1 -r 2 -r 1 s4 +r 2 4.0 (s) -r 1 s3 a1 a2 r1 = 20 10 yen r2 = 100 10 yen +r 1 -r 1 s4 -r 1 -r 1 s3 s4 -r 1 s3 Fig. 1. (A) Sequence of stimulus and response events in the Markov decision task. First, one of four squares representing present state turns green (0s). As the fixation point turns green (1s), the subject presses either the right or left button within 1 second. After 1s delay, the green square changes its position (2s), and then a reward for the current action is presented by a number (2.5s) and a bar graph showing cumulative reward during the block is updated (3.0s). One trial takes four seconds. Subjects performed five trials in the NO condition, 32 trials in the RANDOM condition, five trials in the NO condition, and 32 trials in the REGULAR condition in one block. They repeated four blocks; thus, the entire experiment consisted of 312 trials, taking about 20 minutes. (B) The rule of the reward and marker movement. (C) In the REGULAR condition, the optimal behavior is to receive small negative rewards –r 1 (-10, -20, or -30 yen) at states s2, s3, and s4 to obtain a large positive reward +r2 (90, 100, or 110 yen) at state s1. (D) In the RANDOM condition, the next trial is started from random state. Thus, the optimal behavior is to select a larger reward at each state. 
A 1.5-Tesla scanner (Marconi, MAGNEX ECLIPSE, Japan) was used to acquire both structural T1-weighted images (TR = 12 s, TE = 450 ms, flip angle = 20 deg, matrix = 256 × 256, FoV = 256 mm, thickness = 1 mm, slice gap = 0 mm ) and T2*-weighted echo planar images (TR = 4 s, TE = 55 msec, flip angle = 90 deg, 38 transverse slices, matrix = 64 × 64, FoV = 192 mm, thickness = 4 mm, slice gap = 0 mm, slice gap = 0 mm) with blood oxygen level-dependent (BOLD) contrast. 2.3 Data analysis The data were preprocessed and analyzed with SPM99 (Friston et al., 1995; Wellcome Department of Cognitive Neurology, London, UK). The first three volumes of images were discarded to avoid T1 equilibrium effects. The images were realigned to the first image as a reference, spatially normalized with respect to the Montreal Neurological Institute EPI template, and spatially smoothed with a Gaussian kernel (8 mm, full-width at half-maximum). A RANDOM condition action larger reward Fig. 2. The selected action of a representative single subject (solid line) and the group average ratio of selecting optimal action (dashed line) in (A) RANDOM and (B) REGULAR conditions. smaller reward 1 32 64 96 128 96 128 trial REGULAR condition B action optimal nonoptimal 1 32 64 trial Images of parameter estimates for the contrast of interest were created for each subject. These were then used for a second-level group analysis using a one-sample t-test across the subjects (random effects analysis). We conducted two types of analysis. One was block design analysis using three boxcar regressors convolved with a hemodynamic response function as the reference waveform for each condition (RANDOM, REGULAR, and NO). The other was multivariate regression analysis using explanatory variables, representing the time course of the reward prediction V(t) and reward prediction error δ(t) estimated from subjects’ performance data (described below), in addition to three regressors representing the condition of the block. 
2.4 Estimation of predicted reward V(t) and prediction error δ(t)

The time courses of reward prediction V(t) and reward prediction error δ(t) were estimated from each subject's performance data, i.e. state s(t), action a(t), and reward r(t), as follows. If the subject starts from a state s(t) and comes back to the same state after k steps, the expected cumulative reward V(t) should satisfy the consistency condition

V(t) = r(t+1) + γ r(t+2) + … + γ^(k−1) r(t+k) + γ^k V(t).   (3)

Thus, for each time t of the data file, we calculated the weighted sum of the rewards acquired until the subject returned to the same state and estimated the value function for that episode as

V̂(t) = [r(t+1) + γ r(t+2) + … + γ^(k−1) r(t+k)] / (1 − γ^k).   (4)

The estimate of the value function V(t) at time t was given by the average over all previous episodes starting from the same state as at time t:

V(t) = (1/L) Σ_{l=1}^{L} V̂(t_l),   (5)

where {t1, …, tL} are the time indices of visits to the same state as s(t), i.e. s(t1) = … = s(tL) = s(t). The TD error was given by the difference between the actual reward r(t) and the temporal difference of the value function V(t) according to equation (2). Assuming that different brain areas are involved in reward prediction on different time scales, we varied the discount factor γ over 0, 0.3, 0.6, 0.8, 0.9, and 0.99.

Fig. 3. (A) In the REGULAR vs. RANDOM comparison, significant activation was observed in the DLPFC ((x, y, z) = (46, 45, 9), peak t = 4.06; p < 0.001, uncorrected). (B) In the RANDOM vs. REGULAR comparison, significant activation was observed in the lateral OFC ((x, y, z) = (−32, 9, −21), peak t = 4.90; p < 0.001, uncorrected).

3 Results

3.1 Behavioral results

Figure 2 summarizes the learning performance of a representative single subject (solid line) and the group average (dashed line) during fMRI measurement. Fourteen subjects successfully learned to take larger immediate rewards in the RANDOM condition (Fig.
2A) and a large positive reward at s1 after small negative rewards at s2, s3, and s4 in the REGULAR condition (Fig. 2B).

3.2 Block-design analysis

In the REGULAR vs. RANDOM contrast, we observed significant activation in the dorsolateral prefrontal cortex (DLPFC) (Fig. 3A; p < 0.001, uncorrected). In the RANDOM vs. REGULAR contrast, we observed significant activation in the lateral orbitofrontal cortex (lOFC) (Fig. 3B; p < 0.001, uncorrected). The block-design analysis thus suggests differential involvement of neural pathways in reward prediction on long and short time scales. The RANDOM vs. REGULAR result is consistent with previous studies showing that the OFC is involved in reward prediction within a short delay and in reward outcome [14-20].

3.3 Regression analysis

We observed significant correlations with reward prediction V(t) in the MFC and DLPFC (all γ), the ventromedial insula (small γ), and the dorsal striatum, amygdala, hippocampus, and parahippocampal gyrus (large γ) (p < 0.001, uncorrected) (Fig. 4A). We also found significant correlations with reward prediction error δ(t) in the IPC, PMd, and cerebellum (all γ), the ventral striatum (small γ), and the lateral OFC (large γ) (p < 0.001, uncorrected) (Fig. 4B). As we changed the time-scale parameter γ of reward prediction, we found rostro-caudal maps of correlation with V(t) in the MFC with increasing γ.

Fig. 4. Voxels with a significant correlation (p < 0.001, uncorrected) with reward prediction V(t) and prediction error δ(t) are shown in different colors for different settings of the time-scale parameter (γ = 0 in red, γ = 0.3 in orange, γ = 0.6 in yellow, γ = 0.8 in green, γ = 0.9 in cyan, and γ = 0.99 in blue). Voxels correlated with two or more regressors are shown by a mosaic of colors. (A) Significant correlation with reward prediction V(t) was observed in the MFC, DLPFC, dorsal striatum, insula, and hippocampus. Note the anterior-ventral to posterior-dorsal gradient with increasing γ in the MFC.
(B) Significant correlation with reward prediction error δ(t) at γ = 0 was observed in the ventral striatum.

4 Discussion

In the MFC, the anterior and ventral parts were involved in reward prediction V(t) on shorter time scales (0 ≤ γ ≤ 0.6), whereas the posterior and dorsal parts were involved in reward prediction V(t) on longer time scales (0.6 ≤ γ ≤ 0.99). The ventral striatum was involved in reward prediction error δ(t) on the shortest time scale (γ = 0), while the dorsolateral striatum correlated with reward prediction V(t) on longer time scales (0.9 ≤ γ ≤ 0.99). These results are consistent with the topographic organization of fronto-striatal connections: the rostral part of the MFC projects to the ventral striatum, whereas the dorsal and posterior parts of the cingulate cortex project to the dorsolateral striatum [21]. In the MFC and the striatum, no significant difference in activity was observed in the block-design analysis, yet we did find graded maps of activity for different values of γ. A possible reason is that different parts of the MFC and the striatum are concurrently involved in reward prediction on different time scales, regardless of the task context. Activity in the DLPFC and lOFC, which showed significant differences in the block-design analysis (Fig. 3), may instead be regulated according to the demands of the task.

From these results, we propose the following mechanism of reward prediction on different time scales. Parallel cortico-basal ganglia loops are responsible for reward prediction on different time scales. The 'limbic loop' via the ventral striatum specializes in immediate reward prediction, whereas the 'cognitive and motor loop' via the dorsal striatum specializes in future reward prediction. Each loop learns to predict rewards on its specific time scale. To perform an optimal action on a given time scale, the output of the loop with the appropriate time scale is used for actual action selection.
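The value-estimation procedure of Section 2.4 (Eqs. 3–5), which produced the V(t) and δ(t) regressors analyzed above, can be sketched in a few lines. This is a hedged reconstruction from the text: the paper's exact equation (2) for the TD error is not reproduced here, so the standard TD(0) form δ(t) = r(t) + γV(t) − V(t−1) is assumed.

```python
import numpy as np

def estimate_values(states, rewards, gamma):
    """V-hat(t) per Eq. (4): discounted return until the first revisit of
    s(t), divided by (1 - gamma^k); V(t) per Eq. (5): the average of V-hat
    over all visits (up to t) to the same state."""
    T = len(states)
    v_hat = np.full(T, np.nan)
    for t in range(T):
        for k in range(1, T - t):
            if states[t + k] == states[t]:   # first return after k steps
                discounts = gamma ** np.arange(k)
                ret = float(np.dot(discounts, rewards[t + 1:t + k + 1]))
                v_hat[t] = ret / (1.0 - gamma ** k)
                break
    v = np.full(T, np.nan)
    for t in range(T):
        prev = [v_hat[u] for u in range(t + 1)
                if states[u] == states[t] and not np.isnan(v_hat[u])]
        if prev:
            v[t] = float(np.mean(prev))
    return v

def td_error(v, rewards, gamma):
    """delta(t) = r(t) + gamma*V(t) - V(t-1) (assumed TD(0) form)."""
    delta = np.full(len(v), np.nan)
    for t in range(1, len(v)):
        if not (np.isnan(v[t]) or np.isnan(v[t - 1])):
            delta[t] = rewards[t] + gamma * v[t] - v[t - 1]
    return delta
```

Running this once per γ in {0, 0.3, 0.6, 0.8, 0.9, 0.99} yields one V(t) and one δ(t) time course per time scale, which are then entered as regressors in the multivariate analysis.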
Previous studies of brain damage and serotonergic function suggest that the MFC and the dorsal raphe, which are reciprocally connected [22, 23], play an important role in future reward prediction. The cortico-cortical projections from the MFC, or the serotonergic projections from the dorsal raphe to the cortex and the striatum, may be involved in the modulation of these parallel loops.

In the present study, using a novel regression analysis based on subjects' performance data and a reinforcement learning model, we revealed maps of the time scales of reward prediction, which could not be found by conventional block-design analysis. Future studies using this method under pharmacological manipulation of the serotonergic system would clarify the role of serotonin in regulating the time scale of reward prediction.

Acknowledgments

We thank Nicolas Schweighofer, Kazuyuki Samejima, Masahiko Haruno, Hiroshi Imamizu, Satomi Higuchi, Toshinori Yoshioka, and Mitsuo Kawato for helpful discussions and technical advice.

References

[1] Rogers, R.D., et al. (1999) Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: evidence for monoaminergic mechanisms. Neuropsychopharmacology 20(4):322-339.

[2] Evenden, J.L. & Ryan, C.N. (1996) The pharmacology of impulsive behaviour in rats: the effects of drugs on response choice with varying delays of reinforcement. Psychopharmacology (Berl) 128(2):161-170.

[3] Mobini, S., et al. (2000) Effects of central 5-hydroxytryptamine depletion on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 152(4):390-397.

[4] Bechara, A., et al. (1994) Insensitivity to future consequences following damage to human prefrontal cortex. Cognition 50(1-3):7-15.

[5] Bechara, A., Tranel, D. & Damasio, H.
(2000) Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions. Brain 123:2189-2202.

[6] Mobini, S., et al. (2002) Effects of lesions of the orbitofrontal cortex on sensitivity to delayed and probabilistic reinforcement. Psychopharmacology (Berl) 160(3):290-298.

[7] Doya, K. (2002) Metalearning and neuromodulation. Neural Netw 15(4-6):495-506.

[8] Sutton, R.S. & Barto, A.G. (1998) Reinforcement Learning. Cambridge, MA: MIT Press.

[9] Houk, J.C., Adams, J.L. & Barto, A.G. (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J.C. Houk, J.L. Davis & D.G. Beiser (eds.), Models of Information Processing in the Basal Ganglia, pp. 249-270. Cambridge, MA: MIT Press.

[10] Schultz, W., Dayan, P. & Montague, P.R. (1997) A neural substrate of prediction and reward. Science 275(5306):1593-1599.

[11] Doya, K. (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Curr Opin Neurobiol 10(6):732-739.

[12] Berns, G.S., et al. (2001) Predictability modulates human brain response to reward. J Neurosci 21(8):2793-2798.

[13] O'Doherty, J.P., et al. (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329-337.

[14] Koepp, M.J., et al. (1998) Evidence for striatal dopamine release during a video game. Nature 393(6682):266-268.

[15] Rogers, R.D., et al. (1999) Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci 19(20):9029-9038.

[16] Elliott, R., Friston, K.J. & Dolan, R.J. (2000) Dissociable neural responses in human reward systems. J Neurosci 20(16):6159-6165.

[17] Breiter, H.C., et al. (2001) Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron 30(2):619-639.

[18] Knutson, B., et al. (2001) Anticipation of increasing monetary reward selectively recruits nucleus accumbens.
J Neurosci 21(16):RC159.

[19] O'Doherty, J.P., et al. (2002) Neural responses during anticipation of a primary taste reward. Neuron 33(5):815-826.

[20] Pagnoni, G., et al. (2002) Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci 5(2):97-98.

[21] Haber, S.N., et al. (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci 15(7 Pt 1):4851-4867.

[22] Celada, P., et al. (2001) Control of dorsal raphe serotonergic neurons by the medial prefrontal cortex: Involvement of serotonin-1A, GABA(A), and glutamate receptors. J Neurosci 21(24):9917-9929.

[23] Martin-Ruiz, R., et al. (2001) Control of serotonergic function in medial prefrontal cortex by serotonin-2A receptors through a glutamate-dependent mechanism. J Neurosci 21(24):9856-9866.
Author: G.C. Littlewort, M.S. Bartlett, I.R. Fasel, J. Chenu, T. Kanda, H. Ishiguro, J.R. Movellan
Abstract: Computer animated agents and robots bring a social dimension to human computer interaction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale of less than a second. In this paper we present progress on a perceptual primitive to automatically detect frontal faces in the video stream and code them with respect to 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, surprise. The face finder employs a cascade of feature detectors trained with boosting techniques [13, 2]. The expression recognizer employs a novel combination of Adaboost and SVM’s. The generalization performance to new subjects for a 7-way forced choice was 93.3% and 97% correct on two publicly available datasets. The outputs of the classifier change smoothly as a function of time, providing a potentially valuable representation to code facial expression dynamics in a fully automatic and unobtrusive manner. The system was deployed and evaluated for measuring spontaneous facial expressions in the field in an application for automatic assessment of human-robot interaction.
6 0.089530557 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
7 0.078866512 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
8 0.077579632 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
9 0.076919131 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
10 0.07593894 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
11 0.0737582 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting
12 0.067967415 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface
13 0.064014047 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
14 0.061416779 53 nips-2003-Discriminating Deformable Shape Classes
15 0.060273942 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
16 0.056314182 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
17 0.047855139 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
18 0.047747936 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation
19 0.04726623 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
20 0.046374641 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
topicId topicWeight
[(0, -0.16), (1, -0.011), (2, 0.063), (3, -0.142), (4, -0.093), (5, -0.057), (6, 0.02), (7, -0.048), (8, -0.034), (9, 0.092), (10, 0.103), (11, -0.026), (12, 0.069), (13, -0.051), (14, 0.076), (15, 0.162), (16, -0.099), (17, 0.088), (18, 0.088), (19, -0.118), (20, 0.241), (21, 0.099), (22, -0.025), (23, -0.156), (24, -0.051), (25, 0.1), (26, -0.017), (27, -0.047), (28, 0.023), (29, -0.027), (30, 0.043), (31, -0.009), (32, -0.02), (33, -0.002), (34, 0.039), (35, -0.063), (36, 0.016), (37, -0.04), (38, 0.099), (39, -0.032), (40, 0.095), (41, -0.042), (42, -0.03), (43, -0.046), (44, -0.056), (45, -0.059), (46, 0.016), (47, -0.008), (48, 0.034), (49, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.94745636 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
2 0.74065906 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
Author: Guido Dornhege, Benjamin Blankertz, Gabriel Curio, Klaus-Robert Müller
Abstract: Brain-Computer Interfaces (BCI) are an interesting emerging technology that is driven by the motivation to develop an effective communication interface translating human intentions into a control signal for devices like computers or neuroprostheses. If this can be done bypassing the usual human output pathways like peripheral nerves and muscles it can ultimately become a valuable tool for paralyzed patients. Most activity in BCI research is devoted to finding suitable features and algorithms to increase information transfer rates (ITRs). The present paper studies the implications of using more classes, e.g., left vs. right hand vs. foot, for operating a BCI. We contribute by (1) a theoretical study showing under some mild assumptions that it is practically not useful to employ more than three or four classes, (2) two extensions of the common spatial pattern (CSP) algorithm, one interestingly based on simultaneous diagonalization, and (3) controlled EEG experiments that underline our theoretical findings and show excellent improved ITRs. 1
3 0.73610216 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
4 0.62693775 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
5 0.56688511 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
Author: Xavier Carreras, Lluís Màrquez
Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently. 1
7 0.52558464 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface
8 0.44428673 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
9 0.43371406 181 nips-2003-Statistical Debugging of Sampled Programs
10 0.42834979 53 nips-2003-Discriminating Deformable Shape Classes
11 0.42445338 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
12 0.42333904 3 nips-2003-AUC Optimization vs. Error Rate Minimization
13 0.40931112 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
14 0.39988032 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
15 0.36689851 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
16 0.36293527 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
17 0.35637385 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
18 0.34461111 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
19 0.34360662 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation
20 0.32600492 23 nips-2003-An Infinity-sample Theory for Multi-category Large Margin Classification
topicId topicWeight
[(0, 0.029), (11, 0.021), (29, 0.016), (35, 0.043), (53, 0.118), (64, 0.011), (66, 0.355), (69, 0.011), (71, 0.052), (76, 0.035), (82, 0.011), (85, 0.122), (91, 0.08), (99, 0.014)]
simIndex simValue paperId paperTitle
1 0.9231506 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?
Author: David Donoho, Victoria Stodden
Abstract: We interpret non-negative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of ‘parts’ and NMF correctly identifies the ‘parts’. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases. 1
same-paper 2 0.80590647 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
3 0.76767117 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
Author: Leonid Sigal, Michael Isard, Benjamin H. Sigelman, Michael J. Black
Abstract: The detection and pose estimation of people in images and video is made challenging by the variability of human appearance, the complexity of natural scenes, and the high dimensionality of articulated body models. To cope with these problems we represent the 3D human body as a graphical model in which the relationships between the body parts are represented by conditional probability distributions. We formulate the pose estimation problem as one of probabilistic inference over a graphical model where the random variables correspond to the individual limb parameters (position and orientation). Because the limbs are described by 6-dimensional vectors encoding pose in 3-space, discretization is impractical and the random variables in our model must be continuousvalued. To approximate belief propagation in such a graph we exploit a recently introduced generalization of the particle filter. This framework facilitates the automatic initialization of the body-model from low level cues and is robust to occlusion of body parts and scene clutter. 1
4 0.74365395 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning
Author: Kenji Fukumizu, Francis R. Bach, Michael I. Jordan
Abstract: We propose a novel method of dimensionality reduction for supervised learning. Given a regression or classification problem in which we wish to predict a variable Y from an explanatory vector X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” of X which retains the statistical relationship between X and Y . We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem, we characterize the notion of conditional independence using covariance operators on reproducing kernel Hilbert spaces; this allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y . 1
5 0.56623584 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
6 0.51999533 115 nips-2003-Linear Dependent Dimensionality Reduction
7 0.50027591 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
8 0.49648699 107 nips-2003-Learning Spectral Clustering
9 0.49357852 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
10 0.49304038 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
11 0.48747 3 nips-2003-AUC Optimization vs. Error Rate Minimization
12 0.48426509 124 nips-2003-Max-Margin Markov Networks
13 0.4824211 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
14 0.47956517 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
15 0.47859645 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
16 0.47638264 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
17 0.47555768 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
18 0.47348323 126 nips-2003-Measure Based Regularization
19 0.47217447 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
20 0.47165909 78 nips-2003-Gaussian Processes in Reinforcement Learning