nips nips2003 nips2003-161 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
Reference: text
sentIndex sentText sentNum sentScore
1 The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. [sent-8, score-0.266]
2 Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. [sent-9, score-0.185]
3 We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. [sent-10, score-0.507]
4 We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. [sent-12, score-0.455]
5 The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal. [sent-13, score-0.115]
6 Due to this sensory uncertainty we can only generate an estimate of the ball’s velocity. [sent-16, score-0.209]
7 This uncertainty can be reduced by taking into account information that is available on a longer time scale: not all velocities are a priori equally probable. [sent-17, score-0.164]
8 Bayesian theory [1-2] tells us that to make an optimal estimate of the velocity of a given ball, this a priori information about the distribution of velocities should be combined with the evidence provided by sensory feedback. [sent-20, score-0.26]
9 This combination process requires prior knowledge of how probable each possible velocity is, and knowledge of the uncertainty inherent in the sensory estimate of velocity. [sent-21, score-0.3]
10 As the degree of uncertainty in the feedback increases, for example when playing in fog or at dusk, an optimal system should increasingly depend on prior knowledge. [sent-22, score-0.444]
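To make this concrete, here is a minimal numerical sketch (not from the paper; all numbers are illustrative) of how an optimal estimator weights a Gaussian prior against noisy sensory evidence: the weight on the observation shrinks, and reliance on the prior grows, as the sensory noise grows.

```python
# Illustrative values only (not from the paper): a Gaussian prior over ball
# velocity combined with one noisy sensory observation of that velocity.
prior_mean, prior_sd = 10.0, 2.0       # m/s, a priori distribution of velocities
observed = 14.0                        # sensed velocity on this trial

for sensory_sd in (0.5, 2.0, 8.0):     # clear view ... fog or dusk
    # Posterior mean of Gaussian prior x Gaussian likelihood:
    # a precision-weighted average of prior mean and observation.
    w_obs = prior_sd**2 / (prior_sd**2 + sensory_sd**2)
    estimate = w_obs * observed + (1.0 - w_obs) * prior_mean
    print(f"sensory sd {sensory_sd:4.1f} -> weight on observation {w_obs:.2f}, "
          f"estimate {estimate:.2f} m/s")
```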
11 Here we examine whether subjects represent the probability distribution of a task and if this can be appropriately combined with an estimate of sensory uncertainty. [sent-23, score-0.569]
12 Moreover, we examine whether subjects can represent priors that have multimodal distributions. [sent-28, score-0.522]
13 2 Experiment 1: Gaussian Prior To examine whether subjects can represent a prior distribution of a task and integrate it with a measure of their sensory uncertainty, we examined performance on a reaching task. [sent-29, score-0.754]
14 This displacement or shift is drawn randomly from an underlying probability distribution and subjects have to estimate this shift to perform well on the task. [sent-31, score-1.016]
15 By examining where subjects reached while manipulating the reliability of their visual feedback, we distinguished between several models of sensorimotor learning. [sent-32, score-0.707]
16 1 Methods Ten subjects made reaching movements on a table to a visual target with their right index finger in a virtual reality setup (for details of the set-up see [6]). [sent-34, score-0.561]
17 An Optotrak 3020 measured the position of their finger and a projection/mirror system prevented direct view of their arm and allowed us to generate a cursor representing their finger position which was displayed in the plane of the movement (Figure 1A). [sent-35, score-0.222]
18 As the finger moved from the starting circle, the cursor was extinguished and shifted laterally from the true finger location by an amount x_true, which was drawn each trial from a Gaussian distribution: p(x_true) = 1/(√(2π) σ_prior) exp(-(x_true - x_shift)² / (2 σ_prior²))  (1), where x_shift = 1 cm and σ_prior = 0.5 cm (Figure 1B). [sent-36, score-0.232]
19 Halfway to the target (10 cm), visual feedback was briefly provided for 100 ms either clearly (σ_0), with different degrees of blur (σ_M and σ_L), or withheld (σ_∞). [sent-37, score-0.359]
20 On each trial one of the 4 types of feedback (σ_0, σ_M, σ_L, σ_∞) was selected randomly, with the relative frequencies of (3, 1, 1, 1) respectively. [sent-38, score-0.299]
21 The (σ_M) feedback was 25 small translucent spheres, distributed as a 2 dimensional Gaussian with a standard deviation of 1 cm, giving a cloud type impression. [sent-40, score-0.306]
22 The (σ_L) feedback was analogous but with a standard deviation of 2 cm. [sent-41, score-0.306]
23 After another 10 cm of movement the trial finished and feedback of the final cursor location was only provided in the (σ_0) condition. [sent-43, score-0.945]
24 The experiment consisted of 2000 trials for each subject. [sent-44, score-0.113]
25 Subjects were instructed to take into account what they saw at the midpoint and to get as close to the target as possible, and were told that the cursor was always there even if it was not displayed. [sent-45, score-0.153]
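A sketch of the trial structure described above. The prior mean of 1 cm and the blur widths of 1 cm and 2 cm are stated in the text; the prior width of 0.5 cm and the representation of the midpoint feedback as a single noisy sample (rather than a 25-sphere cloud) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

PRIOR_MEAN = 1.0   # cm, stated in the text
PRIOR_SD = 0.5     # cm, assumed; the value is garbled in the extracted text
# Feedback conditions and the blur (cloud SD, in cm) of the midpoint feedback.
CONDITIONS = ["sigma_0", "sigma_M", "sigma_L", "sigma_inf"]
CLOUD_SD = {"sigma_0": 0.0, "sigma_M": 1.0, "sigma_L": 2.0, "sigma_inf": None}
FREQ = np.array([3, 1, 1, 1], dtype=float)   # relative frequencies 3:1:1:1

def make_trial():
    """Draw one trial: a lateral cursor shift and a feedback condition."""
    x_true = rng.normal(PRIOR_MEAN, PRIOR_SD)            # imposed cursor shift
    cond = rng.choice(CONDITIONS, p=FREQ / FREQ.sum())
    if cond == "sigma_inf":
        feedback = None                                   # feedback withheld
    else:
        # A single noisy sample stands in for the 25-sphere feedback cloud.
        feedback = rng.normal(x_true, CLOUD_SD[cond]) if CLOUD_SD[cond] else x_true
    return x_true, cond, feedback

for trial in (make_trial() for _ in range(5)):
    print(trial)
```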
26 2 Results: Trajectories in the Presence of Uncertainty Subjects were trained for 1000 trials on the task to ensure that they experienced many samples x_true drawn from the underlying distribution p(x_true). [sent-47, score-0.174]
27 After this period, when feedback was withheld (σ_∞), subjects pointed 0. [sent-48, score-0.636]
28 06 cm (mean ± se across subjects) to the left of the target, showing that they had learned the average shift of 1 cm experienced over the trials. [sent-50, score-1.299]
29 Subsequently, we examined the relationship between visual feedback and the location x_estimate subjects pointed to. [sent-51, score-0.701]
30 On trials in which feedback was provided, there was compensation during the second half of the movement. [sent-52, score-0.359]
31 Figure 1A shows typical finger and cursor paths for two trials, σ_∞ and σ_0, in which x_true = 2 cm. [sent-53, score-0.114]
32 The visual feedback midway through the movement provides information about the lateral shift on the current trial and allows for a correction of that shift. [sent-54, score-1.539]
33 However, the visual system is not perfect and we expect some uncertainty in the sensed lateral shift x_sensed. [sent-55, score-0.974]
34 [Figure 1 graphic: panels show the prior, the distribution p(x_true|x_sensed), and the average lateral deviation x_true - x_estimate [cm] as a function of the lateral shift x_true [cm] for the Compensation Model, the Probabilistic Model, and the Mapping Model.] [sent-57, score-3.501]
35 Figure 1: The experiment and models. [sent-59, score-3.187]
36 A) Subjects are required to place the cursor on the target, thereby compensating for the lateral displacement. [sent-60, score-0.52]
37 The finger paths illustrate typical trajectories at the end of the experiment when the lateral shift was 2 cm (the colors correspond to two of the feedback conditions). [sent-61, score-1.4]
38 B) The experimentally imposed prior distribution of lateral shifts is Gaussian with a mean of 1 cm. [sent-62, score-0.576]
39 C) A schematic of the probability distribution of visually sensed shifts under clear and the two blurred feedback conditions (colors as in panel A) for a trial in which the true lateral shift is 2 cm. [sent-63, score-1.252]
40 D) The estimate of the lateral shift for an optimal observer that combines the prior with the evidence. [sent-64, score-0.8]
41 E) The average lateral deviation from the target as a function of the true lateral shift for the models. [sent-65, score-1.251]
42 distribution centered on x_true with a standard deviation σ_sensed characteristic of the system. [sent-69, score-0.139]
43 3 Computational Models and Predictions There are several computational models which subjects could use to determine the compensation needed to reach the target based on the sensed location of the finger midway through the movement. [sent-72, score-0.721]
44 To analyze the subjects' performance we plot the average lateral deviation x_true - x_estimate, in a set of bins, as a function of the true shift x_true. [sent-73, score-1.215]
45 Because feedback is not biased this term approximates x_sensed - x_estimate. [sent-74, score-0.199]
46 Subjects could compensate for the sensed lateral shift x_sensed and thus use x_estimate = x_sensed. [sent-77, score-0.819]
47 The average lateral deviation should thus be x_true - x_estimate = 0 (Figure 1E, left panel). [sent-78, score-0.539]
48 In this model, increasing the uncertainty of the feedback σ_sensed (by increasing the blur) affects the variability of the pointing but not the average location. [sent-79, score-0.417]
49 Errors arise from variability in the visual feedback and the mean squared error (MSE) for this strategy (ignoring motor variability) is σ_sensed². [sent-80, score-0.341]
50 Crucially, this model does not require subjects to estimate either their visual uncertainty or the distribution of shifts. [sent-81, score-0.63]
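Before turning to the alternatives, a small simulation (with assumed prior parameters of mean 1 cm and SD 0.5 cm) confirming that the MSE of this full-compensation strategy is close to σ_sensed², and previewing that the Bayesian combination described next achieves a lower MSE:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_prior, sigma_prior = 1.0, 0.5            # assumed prior over shifts (cm)

for sigma_sensed in (0.5, 1.0, 2.0):
    x_true = rng.normal(mu_prior, sigma_prior, 100_000)
    x_sensed = x_true + rng.normal(0.0, sigma_sensed, x_true.size)
    # Strategy 1: fully compensate for the sensed shift.
    mse_comp = np.mean((x_true - x_sensed) ** 2)
    # Strategy 2: Bayesian combination of prior and evidence.
    w = sigma_prior**2 / (sigma_prior**2 + sigma_sensed**2)
    x_bayes = w * x_sensed + (1 - w) * mu_prior
    mse_bayes = np.mean((x_true - x_bayes) ** 2)
    print(f"sigma_sensed={sigma_sensed}: MSE compensation {mse_comp:.3f} "
          f"(close to sigma_sensed^2), Bayesian {mse_bayes:.3f}")
```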
51 Subjects could optimally use prior information about the distribution and the uncertainty of the visual feedback to estimate the lateral shift. [sent-83, score-0.934]
52 Using Bayes' rule we can obtain the posterior distribution, that is, the probability of a shift x_true given the evidence x_sensed: p(x_true | x_sensed) = p(x_true) p(x_sensed | x_true) / p(x_sensed)  (3). If subjects choose the most likely shift they also minimize their mean squared error (MSE). [sent-85, score-0.964]
53 The MSE depends on two factors, the width of the prior σ_prior and the uncertainty in the visual feedback σ_sensed. [sent-88, score-0.467]
54 As we increase the blur, and thus the degree of uncertainty, the estimate of the shift moves away from the visually sensed displacement x_sensed towards the mean of the prior distribution x_shift (Figure 1D). [sent-90, score-0.6]
55 Such a computational strategy thus allows subjects to minimize the MSE at the target. [sent-91, score-0.391]
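A numerical sketch of this Bayesian strategy, evaluating equation (3) on a grid so that the same code also handles the non-Gaussian priors used later in Experiment 2; the estimate is taken as the posterior mean, which minimizes the MSE. The prior width of 0.5 cm and the grid limits are assumptions made for illustration.

```python
import numpy as np

def posterior_estimate(x_sensed, sigma_sensed, prior_pdf, grid):
    """Combine a prior over the lateral shift with noisy midpoint feedback
    via Bayes' rule (equation 3) and return the posterior-mean estimate."""
    likelihood = np.exp(-0.5 * ((x_sensed - grid) / sigma_sensed) ** 2)
    post = prior_pdf(grid) * likelihood
    post /= np.trapz(post, grid)           # normalize p(x_true | x_sensed)
    return np.trapz(grid * post, grid)     # posterior mean minimizes the MSE

grid = np.linspace(-6.0, 8.0, 2801)
# Unnormalized Gaussian prior: mean 1 cm (stated), SD 0.5 cm (assumed).
gauss_prior = lambda x: np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)

for sigma_sensed in (0.1, 1.0, 2.0):       # increasing blur
    est = posterior_estimate(2.0, sigma_sensed, gauss_prior, grid)
    print(f"sigma_sensed={sigma_sensed}: estimate of a 2 cm sensed shift -> {est:.2f} cm")
```

As the blur grows, the printed estimate moves from the sensed 2 cm toward the 1 cm prior mean, which is the behaviour described for Figure 1D.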
56 A third computational strategy is to learn a mapping from the sensed shift x_sensed to the optimal lateral shift x_estimate. [sent-93, score-1.146]
57 By minimizing the average error over many trials the subjects could achieve a combination similar to model 2 but without any representation of the prior distribution or the visual uncertainty. [sent-94, score-0.698]
58 However, to learn such a mapping requires visual feedback and knowledge of the error at the end of the movement. [sent-95, score-0.352]
59 In our experiment we only revealed the shifted position of the finger at the end of the movement on the clear feedback trials (σ_0). [sent-96, score-0.41]
60 Therefore, if subjects learn a mapping, they can only do so for these trials and apply the same mapping to the blurred conditions (σ_M, σ_L). [sent-97, score-0.57]
61 Therefore, this model predicts that the average lateral shift x_true - x_estimate should be independent of the degree of blur (Figure 1E, right panel). [sent-98, score-0.793]
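The qualitative predictions of the three models (Figure 1E) can be summarized in closed form for the Gaussian case. The sketch below uses assumed values for the prior width and for the visual noise on the clear-feedback trials; it is not the authors' analysis code.

```python
import numpy as np

MU_PRIOR, SD_PRIOR = 1.0, 0.5      # prior over shifts; the SD is an assumed value
SD_CLEAR = 0.2                     # assumed visual noise in the clear condition

def deviation_compensation(x_true, sigma_sensed):
    # Model 1: x_estimate = x_sensed, so on average x_true - x_estimate = 0.
    return np.zeros_like(x_true)

def deviation_bayes(x_true, sigma_sensed):
    # Model 2: x_estimate = w*x_sensed + (1-w)*mu_prior with
    # w = sd_prior^2 / (sd_prior^2 + sigma_sensed^2); averaging over unbiased
    # feedback gives <x_true - x_estimate> = (1 - w) * (x_true - mu_prior).
    w = SD_PRIOR**2 / (SD_PRIOR**2 + sigma_sensed**2)
    return (1.0 - w) * (x_true - MU_PRIOR)

def deviation_mapping(x_true, sigma_sensed):
    # Model 3: the mapping is learned on clear-feedback trials only and then
    # applied unchanged to the blurred conditions, so the slope is fixed.
    return deviation_bayes(x_true, SD_CLEAR)

x = np.linspace(0.0, 2.0, 5)
for sigma in (0.2, 1.0, 2.0):
    print(f"sigma_sensed={sigma}:",
          "compensation", np.round(deviation_compensation(x, sigma), 2),
          "| bayes", np.round(deviation_bayes(x, sigma), 2),
          "| mapping", np.round(deviation_mapping(x, sigma), 2))
```

Only the Bayesian model predicts a slope that grows with the blur, which is the signature reported in the results.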
62 The slope increases with increasing uncertainty and is, therefore, incompatible with models 1 and 3 but is predicted by model 2. [sent-101, score-0.213]
63 Moreover, this transition from using feedback to using prior information occurs gradually with increasing uncertainty as also predicted by this Bayesian model. [sent-102, score-0.427]
64 These effects are consistent over all the subjects tested. [sent-103, score-0.391]
65 The slope increases with increasing uncertainty in the visual feedback (Figure 2B). [sent-104, score-0.452]
66 Depending on the uncertainty of the feedback, subjects thus combine prior knowledge of the distribution of shifts with new evidence to generate the optimal compensatory movement. [sent-105, score-0.683]
67 Using Bayesian theory we can furthermore infer the degree of uncertainty from the errors the subjects made. [sent-106, score-0.525]
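Under the Gaussian version of the Bayesian model, the slope of the deviation-versus-shift line equals σ_sensed² / (σ_sensed² + σ_prior²), so a fitted slope can be inverted to recover σ_sensed. The sketch below assumes σ_prior = 0.5 cm and hypothetical slopes; it illustrates the inversion only, not the authors' exact fitting procedure.

```python
import numpy as np

def sigma_sensed_from_slope(slope, sigma_prior):
    """Invert slope = sigma_sensed^2 / (sigma_sensed^2 + sigma_prior^2)."""
    if not 0.0 <= slope < 1.0:
        raise ValueError("slope must lie in [0, 1) under the Gaussian model")
    return sigma_prior * np.sqrt(slope / (1.0 - slope))

sigma_prior = 0.5                       # cm, assumed width of the imposed prior
for slope in (0.1, 0.4, 0.7):           # hypothetical fitted slopes per blur level
    print(f"slope {slope:.1f} -> sigma_sensed "
          f"{sigma_sensed_from_slope(slope, sigma_prior):.2f} cm")
```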
68 [Figure 2 graphic: x-axis is the lateral shift x_true [cm].] Figure 2: Results with color codes as in Figure 1. [sent-109, score-1.052]
69 A) The average lateral deviation of the cursor at the end of the trial as a function of the imposed lateral shift for a typical subject. [sent-110, score-1.475]
70 The horizontal dotted lines indicate the prediction from the full compensation model, and the sloped line that of a model that ignores sensory feedback on the current trial and corrects only for the mean shift over all trials. [sent-114, score-0.46]
71 C) The inferred priors and the real prior (red) for each subject and condition. [sent-117, score-0.545]
72 3, σ_0, σ_M and σ_L, we find that the subjects' uncertainty σ_sensed is 0. [sent-123, score-0.503]
73 The average of x_sensed across many trials is the imposed shift x_true. [sent-132, score-0.406]
74 Since p must approach zero for both very small and very large x_true, we subtract the mean of the right-hand side before integrating numerically to obtain an estimate of the prior p(x_true). [sent-134, score-0.124]
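A rough sketch of such a reconstruction. It uses the standard MAP relation d/dx log p(x) ≈ -(average deviation x_true - x_estimate) / σ_sensed², subtracts the mean of the right-hand side as described above, integrates numerically and exponentiates. The binned data, the σ_sensed value and any further correction terms used in the paper are not recoverable from the extracted text; everything below is illustrative.

```python
import numpy as np

def reconstruct_prior(x_bins, mean_deviation, sigma_sensed):
    """Estimate an unnormalized prior from the average lateral deviation
    <x_true - x_estimate> measured in bins of the true shift.

    Uses d/dx log p(x) ~ -deviation(x) / sigma_sensed**2, subtracts the mean,
    integrates numerically (trapezoid rule) and exponentiates."""
    dlogp = -np.asarray(mean_deviation) / sigma_sensed**2
    dlogp = dlogp - dlogp.mean()                 # subtract the mean of the RHS
    logp = np.concatenate(([0.0],
                           np.cumsum(0.5 * (dlogp[1:] + dlogp[:-1])
                                     * np.diff(x_bins))))
    p = np.exp(logp - logp.max())
    return p / np.trapz(p, x_bins)               # normalize to a density

# Hypothetical binned data consistent with a Gaussian prior centred on 1 cm.
x_bins = np.linspace(-0.5, 2.5, 31)
mean_dev = 0.5 * (x_bins - 1.0)                  # illustrative slope for one blur level
print(np.round(reconstruct_prior(x_bins, mean_dev, sigma_sensed=0.5)[:5], 4))
```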
75 3 Experiment 2: Mixture of Gaussians Priors The second experiment was designed to examine whether subjects are able to represent more complicated priors such as mixtures of Gaussians and if they can utilize such prior knowledge. [sent-137, score-0.641]
76 1 Methods 12 additional subjects participated in an experiment similar to Experiment 1 with the following changes. [sent-139, score-0.422]
77 The experiment lasted for twice as many trials, run on two consecutive days with 2000 trials performed on each day. [sent-140, score-0.164]
78 Feedback midway through the movement was always blurred (spheres distributed as a two dimensional Gaussian with a standard deviation given in cm) and feedback at the end of the movement was provided on every trial. [sent-141, score-0.429]
79 The prior distribution was a mixture of Gaussians ( Figure 3A,D). [sent-142, score-0.155]
80 One group of 6 subjects was exposed to: p(x_true) = 1/(2√(2π) σ_prior) [exp(-(x_true - x_shift)² / (2 σ_prior²)) + exp(-(x_true + x_shift)² / (2 σ_prior²))]  (7), where x_shift = 2 cm is half the distance between the two peaks of the Gaussians. [sent-143, score-0.391]
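The bimodal prior of equation (7), as reconstructed above, written out as a small helper; x_shift = 2 cm is stated in the text, while σ_prior = 0.5 cm is an assumed value.

```python
import numpy as np

def bimodal_prior(x, x_shift=2.0, sigma_prior=0.5):
    """Equal mixture of two Gaussians centred at +/- x_shift (equation 7)."""
    norm = 1.0 / (2.0 * np.sqrt(2.0 * np.pi) * sigma_prior)
    return norm * (np.exp(-0.5 * ((x - x_shift) / sigma_prior) ** 2)
                   + np.exp(-0.5 * ((x + x_shift) / sigma_prior) ** 2))

x = np.linspace(-4.0, 4.0, 9)
print(np.round(bimodal_prior(x), 3))   # two peaks, 4 cm apart
```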
81 Another group of 6 subjects experienced a prior p(x_true) given by a mixture of three Gaussians. In this case we set x_shift analogously and σ_prior is still 0. [sent-146, score-0.451]
82 To estimate the priors learned by the subjects we fitted and compared two models. [sent-149, score-0.507]
83 The first assumed that subjects learned a single Gaussian distribution and the second assumed that subjects learned a mixture of Gaussians, where we tuned the positions of the Gaussians to minimize the MSE between predicted and actual data. [sent-150, score-0.958]
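A sketch of this kind of model comparison: a candidate prior (single Gaussian or symmetric two-component mixture) is pushed through a grid-based Bayesian observer and its parameters are tuned to minimize the squared error between predicted and observed deviations. The optimizer, the parameterization and the binned data are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

GRID = np.linspace(-6.0, 6.0, 1201)

def predicted_deviation(x_true, sigma_sensed, log_prior):
    """Average deviation x_true - x_estimate for a posterior-mean observer
    with the given prior, assuming unbiased feedback (x_sensed ~ x_true)."""
    like = np.exp(-0.5 * ((x_true - GRID) / sigma_sensed) ** 2)
    post = np.exp(log_prior(GRID)) * like
    post /= np.trapz(post, GRID)
    return x_true - np.trapz(GRID * post, GRID)

def mse(params, data, model):
    if model == "single":
        mu, sd, sig_sens = params
        log_prior = lambda x: -0.5 * ((x - mu) / sd) ** 2
    else:  # two-component mixture, symmetric about zero
        shift, sd, sig_sens = params
        log_prior = lambda x: np.log(np.exp(-0.5 * ((x - shift) / sd) ** 2)
                                     + np.exp(-0.5 * ((x + shift) / sd) ** 2))
    pred = [predicted_deviation(x, sig_sens, log_prior) for x, _ in data]
    return np.mean([(p - d) ** 2 for p, (_, d) in zip(pred, data)])

# Hypothetical binned data: (true shift, observed mean deviation) pairs.
data = [(-2.5, -0.3), (-2.0, 0.0), (-1.5, 0.3), (1.5, -0.3), (2.0, 0.0), (2.5, 0.3)]
for model, x0 in [("single", [0.0, 2.0, 1.0]), ("mixture", [2.0, 0.5, 1.0])]:
    fit = minimize(mse, x0, args=(data, model), method="Nelder-Mead")
    print(model, "fit error:", round(fit.fun, 4), "params:", np.round(fit.x, 2))
```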
84 Fitting the σ_sensed and x_shift to a two component Mixture of Gaussians model led to an average error over all 6 subjects of 0. [sent-154, score-0.453]
85 01 cm compared to an average error obtained for a single Gaussian of 0. [sent-156, score-0.485]
86 The mixture model of the prior is thus better able to explain the data than the model that assumes that people can just represent one Gaussian. [sent-160, score-0.165]
87 One of the subjects compensated least for the feedback and his data was well fit by a single Gaussian. [sent-161, score-0.59]
88 Fitting the σ_sensed and x_shift of the three Gaussians model (Figure 3E) led to an average error over all subjects of 0. [sent-167, score-0.453]
89 02 cm instead of an error from a single Gaussian of 0. [sent-169, score-0.459]
90 This result shows that subjects cannot fully learn this more complicated distribution but rather just learn some of its properties. [sent-175, score-0.493]
91 Second, it could be that subjects use a simpler model such as a generalized Gaussian (the family of distributions to which the Laplacian also belongs) or that they use a mixture model with only a few Gaussians. [sent-178, score-0.522]
92 Third, subjects could have a prior over priors that makes a three-Gaussian mixture model very unlikely. [sent-179, score-0.589]
93 During the process of training the average error over batches of 500 subsequent trials decreased from 1. [sent-183, score-0.125]
94 To address this we plot the evolution of the lateral deviation graph, as a function of the trial number (Figure 4B). [sent-187, score-0.613]
95 In other words they rely on the prior belief that their hand will not be displaced and ignore the feedback. [sent-190, score-0.135]
96 It seems that the explanatory power of the full model improves in particular after trial 2000, the trial after which people enjoy a night's rest. [sent-193, score-0.241]
97 It could be that subjects need a consolidation period to adequately learn the distribution. [sent-194, score-0.433]
98 4 Conclusion We have shown that a prior is used by humans to determine appropriate motor commands and that it is combined with an estimate of sensory uncertainty. [sent-196, score-0.201]
99 Such a Bayesian view of sensorimotor learning is consistent with neurophysiological studies that show that the brain represents the degree of uncertainty when estimating rewards [8-10] and with psychophysical studies addressing the timing of movements [11]. [sent-197, score-0.191]
100 Not only do people represent the uncertainty and combine this with prior information, they are also able to represent and utilize complicated non-Gaussian priors. [sent-198, score-0.288]
wordName wordTfidf (topN-words)
[('cm', 0.442), ('lateral', 0.406), ('subjects', 0.391), ('xtrue', 0.379), ('shift', 0.267), ('feedback', 0.199), ('nger', 0.129), ('sensed', 0.129), ('gaussians', 0.127), ('cursor', 0.114), ('uncertainty', 0.112), ('deviation', 0.107), ('trial', 0.1), ('trials', 0.082), ('compensation', 0.078), ('mse', 0.076), ('prior', 0.073), ('blur', 0.072), ('sensory', 0.062), ('experienced', 0.06), ('visual', 0.06), ('priors', 0.058), ('sensorimotor', 0.057), ('movement', 0.052), ('mixture', 0.05), ('midway', 0.049), ('tennis', 0.049), ('xsensed', 0.049), ('wolpert', 0.043), ('slope', 0.04), ('blurred', 0.039), ('target', 0.039), ('estimate', 0.035), ('bayesian', 0.034), ('variability', 0.034), ('shifts', 0.034), ('mapping', 0.033), ('ball', 0.033), ('konrad', 0.033), ('distribution', 0.032), ('imposed', 0.031), ('experiment', 0.031), ('motor', 0.031), ('priori', 0.029), ('displaced', 0.029), ('ucl', 0.029), ('withheld', 0.029), ('panel', 0.028), ('position', 0.028), ('examine', 0.027), ('london', 0.026), ('adelson', 0.026), ('spheres', 0.026), ('average', 0.026), ('subject', 0.026), ('learn', 0.025), ('gaussian', 0.024), ('displacement', 0.024), ('neurology', 0.024), ('multimodal', 0.024), ('inferred', 0.023), ('width', 0.023), ('learned', 0.023), ('velocities', 0.023), ('switzerland', 0.023), ('increasing', 0.023), ('degree', 0.022), ('evidence', 0.022), ('represent', 0.022), ('colors', 0.021), ('simoncelli', 0.021), ('full', 0.021), ('predicted', 0.02), ('explained', 0.02), ('provided', 0.02), ('people', 0.02), ('complicated', 0.02), ('james', 0.02), ('led', 0.019), ('narrow', 0.019), ('inserting', 0.019), ('utilize', 0.019), ('fitting', 0.019), ('optimal', 0.019), ('reaching', 0.019), ('playing', 0.019), ('analyze', 0.018), ('end', 0.018), ('velocity', 0.018), ('visually', 0.018), ('increases', 0.018), ('location', 0.018), ('error', 0.017), ('could', 0.017), ('ignore', 0.017), ('tted', 0.017), ('pointed', 0.017), ('hand', 0.016), ('trajectories', 0.016), ('examined', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
2 0.12423442 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
3 0.11953263 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
Author: Marc Toussaint
Abstract: We present a connectionist architecture that can learn a model of the relations between perceptions and actions and use this model for behavior planning. State representations are learned with a growing selforganizing layer which is directly coupled to a perception and a motor layer. Knowledge about possible state transitions is encoded in the lateral connectivity. Motor signals modulate this lateral connectivity and a dynamic field on the layer organizes a planning process. All mechanisms are local and adaptation is based on Hebbian ideas. The model is continuous in the action, perception, and time domain.
4 0.11143167 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
5 0.10082991 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
Author: Sung C. Jun, Barak A. Pearlmutter
Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1
6 0.072011203 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
7 0.072005376 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
8 0.070017792 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
9 0.061312765 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
10 0.059733696 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors
11 0.058297489 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
12 0.057187892 130 nips-2003-Model Uncertainty in Classical Conditioning
13 0.055635169 119 nips-2003-Local Phase Coherence and the Perception of Blur
14 0.054420866 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface
15 0.051746015 140 nips-2003-Nonlinear Processing in LGN Neurons
16 0.046660651 133 nips-2003-Mutual Boosting for Contextual Inference
17 0.04645545 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
18 0.038691234 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
19 0.037279297 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector
20 0.03723694 51 nips-2003-Design of Experiments via Information Theory
topicId topicWeight
[(0, -0.124), (1, 0.037), (2, 0.078), (3, -0.026), (4, -0.072), (5, -0.001), (6, 0.096), (7, -0.021), (8, -0.071), (9, 0.044), (10, 0.069), (11, 0.038), (12, 0.11), (13, -0.034), (14, 0.058), (15, 0.167), (16, -0.153), (17, 0.094), (18, 0.029), (19, -0.027), (20, 0.158), (21, 0.065), (22, 0.023), (23, -0.059), (24, -0.032), (25, 0.093), (26, 0.041), (27, -0.123), (28, 0.103), (29, -0.054), (30, -0.03), (31, -0.019), (32, -0.096), (33, -0.114), (34, 0.12), (35, 0.033), (36, -0.135), (37, -0.046), (38, -0.03), (39, 0.042), (40, -0.008), (41, -0.13), (42, -0.206), (43, -0.103), (44, -0.022), (45, -0.189), (46, 0.202), (47, -0.03), (48, 0.004), (49, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.98069012 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
2 0.52487051 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
3 0.50093925 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
Author: Marc Toussaint
Abstract: We present a connectionist architecture that can learn a model of the relations between perceptions and actions and use this model for behavior planning. State representations are learned with a growing selforganizing layer which is directly coupled to a perception and a motor layer. Knowledge about possible state transitions is encoded in the lateral connectivity. Motor signals modulate this lateral connectivity and a dynamic field on the layer organizes a planning process. All mechanisms are local and adaptation is based on Hebbian ideas. The model is continuous in the action, perception, and time domain.
4 0.43400532 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
Author: Sung C. Jun, Barak A. Pearlmutter
Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1
5 0.42365718 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
6 0.42170489 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
7 0.40862098 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
8 0.39275327 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface
9 0.38389781 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors
10 0.34687677 130 nips-2003-Model Uncertainty in Classical Conditioning
11 0.27260664 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
12 0.24154156 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
14 0.2355473 68 nips-2003-Eye Movements for Reward Maximization
15 0.23019473 13 nips-2003-A Neuromorphic Multi-chip Model of a Disparity Selective Complex Cell
16 0.22832045 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning
17 0.22536559 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
18 0.21588945 196 nips-2003-Wormholes Improve Contrastive Divergence
19 0.2062695 140 nips-2003-Nonlinear Processing in LGN Neurons
20 0.20332113 67 nips-2003-Eye Micro-movements Improve Stimulus Detection Beyond the Nyquist Limit in the Peripheral Retina
topicId topicWeight
[(0, 0.027), (11, 0.013), (29, 0.035), (30, 0.02), (35, 0.032), (47, 0.255), (49, 0.013), (53, 0.118), (69, 0.014), (71, 0.061), (76, 0.044), (85, 0.098), (91, 0.132), (93, 0.019), (99, 0.017)]
simIndex simValue paperId paperTitle
1 0.88598812 5 nips-2003-A Classification-based Cocktail-party Processor
Author: Nicoleta Roman, Deliang Wang, Guy J. Brown
Abstract: At a cocktail party, a listener can selectively attend to a single voice and filter out other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, we employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency unit. Within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, we perform pattern classification in order to estimate ideal binary masks. A systematic evaluation in terms of signal-to-noise ratio as well as automatic speech recognition performance shows that the resulting system produces masks very close to ideal binary ones. A quantitative comparison shows that our model yields significant improvement in performance over an existing approach. Furthermore, under certain conditions the model produces large speech intelligibility improvements with normal listeners.
same-paper 2 0.8108567 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
3 0.67211866 117 nips-2003-Linear Response for Approximate Inference
Author: Max Welling, Yee W. Teh
Abstract: Belief propagation on cyclic graphs is an efficient algorithm for computing approximate marginal probability distributions over single nodes and neighboring nodes in the graph. In this paper we propose two new algorithms for approximating joint probabilities of arbitrary pairs of nodes and prove a number of desirable properties that these estimates fulfill. The first algorithm is a propagation algorithm which is shown to converge if belief propagation converges to a stable fixed point. The second algorithm is based on matrix inversion. Experiments compare a number of competing methods.
4 0.64140922 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
Author: Maneesh Sahani
Abstract: Significant plasticity in sensory cortical representations can be driven in mature animals either by behavioural tasks that pair sensory stimuli with reinforcement, or by electrophysiological experiments that pair sensory input with direct stimulation of neuromodulatory nuclei, but usually not by sensory stimuli presented alone. Biologically motivated theories of representational learning, however, have tended to focus on unsupervised mechanisms, which may play a significant role on evolutionary or developmental timescales, but which neglect this essential role of reinforcement in adult plasticity. By contrast, theoretical reinforcement learning has generally dealt with the acquisition of optimal policies for action in an uncertain world, rather than with the concurrent shaping of sensory representations. This paper develops a framework for representational learning which builds on the relative success of unsupervised generativemodelling accounts of cortical encodings to incorporate the effects of reinforcement in a biologically plausible way. 1
5 0.64064294 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
Author: Yu-han Chang, Tracey Ho, Leslie P. Kaelbling
Abstract: In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy. 1
6 0.63117611 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
7 0.6278553 107 nips-2003-Learning Spectral Clustering
8 0.62773156 78 nips-2003-Gaussian Processes in Reinforcement Learning
9 0.62505889 30 nips-2003-Approximability of Probability Distributions
10 0.62408918 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
11 0.62287009 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
12 0.622361 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
13 0.62139684 126 nips-2003-Measure Based Regularization
14 0.62098724 68 nips-2003-Eye Movements for Reward Maximization
15 0.6205523 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
16 0.62007648 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
17 0.61991251 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
18 0.61973834 143 nips-2003-On the Dynamics of Boosting
19 0.61911643 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
20 0.61797982 3 nips-2003-AUC Optimization vs. Error Rate Minimization