nips nips2003 nips2003-182 knowledge-graph by maker-knowledge-mining

182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron


Source: pdf

Author: Sung C. Jun, Barak A. Pearlmutter

Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. [sent-7, score-0.362]

2 At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. [sent-8, score-0.864]

3 Including head position overcomes the previous need to retrain the MLP for each subject and session. [sent-9, score-0.385]

4 The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. [sent-10, score-0.439]

5 We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. [sent-16, score-0.662]

6 1 Introduction The goal of MEG/EEG localization is to identify and measure the signals emitted by electrically active brain regions. [sent-17, score-0.359]

7 A number of methods are in widespread use, most assuming dipolar sources (Hämäläinen et al. [sent-18, score-0.235]

8 , 1986) have become popular for building fast dipole localizers (Abeyratne et al. [sent-21, score-0.479]

9 Since it is easy to use a forward model to create synthetic data consisting of dipole locations and corresponding sensor signals, one can train a MLP on the inverse problem. [sent-24, score-0.523]

10 (2000) took EEG measurements for both spherical and realistic head models and trained MLPs on randomly generated noise-free datasets. [sent-26, score-0.431]

11 Integrated approaches to the EEG/MEG dipole source localization, in which the trained MLPs are used as initializers for iterative methods, have also been studied (Jun et al. [sent-27, score-0.516]

12 Interestingly, all work to date trained with a fixed head model. [sent-30, score-0.312]

13 However, for MEG, head movement relative to the fixed sensor array is very difficult to avoid, and even with heroic measures (bite bars) the position of the head relative to the sensor array varies from subject to subject and session to session. [sent-31, score-0.898]

14 This either results in significant localization error (Kwon et al. [sent-32, score-0.331]

15 We propose an augmented system which takes head position into account, yet remains able to localize a single dipole to reasonable accuracy within a fraction of a millisecond on a standard PC, even when the signals are contaminated by considerable noise. [sent-34, score-0.781]

16 The system uses a MLP trained on random dipoles and random head positions, which takes as inputs both the coordinates of the center of a sphere fitted to the head and the sensor measurements, uses two hidden layers, and generates the source location (in Cartesian coordinates) as its output. [sent-35, score-0.996]

17 Adding head position as an extra input overcomes the primary practical limitation of previous MLP-based MEG localization systems: the need to retrain the network for each new head position. [sent-36, score-0.883]

18 To improve the localization accuracy we use a hybrid MLP-start-LM method, in which the MLP’s output provides the starting point for a Levenberg-Marquardt (LM) optimization (Press et al. [sent-38, score-0.457]
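
A minimal sketch of how such a hybrid refinement step could look, assuming SciPy's Levenberg-Marquardt solver; the `lead_field` placeholder, the sensor geometry, and all numbers below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import least_squares

def lead_field(pos, sensors):
    """Placeholder lead field: row i maps a dipole moment at `pos` to sensor i's
    reading. A real system would use the spherical-head forward model (see the
    Sarvas sketch further below)."""
    d = sensors - pos                                          # (n_sensors, 3)
    r3 = np.linalg.norm(d, axis=1) ** 3 + 1e-12
    n = sensors / np.linalg.norm(sensors, axis=1, keepdims=True)
    return np.stack([np.sum(np.cross(e, d) * n, axis=1) / r3 for e in np.eye(3)], axis=1)

def residuals(pos, b, sensors):
    """Field mismatch at candidate location `pos`; the minimum-error moment is
    found analytically by linear least squares (cf. footnote 1 below)."""
    L = lead_field(pos, sensors)
    m, *_ = np.linalg.lstsq(L, b, rcond=None)
    return L @ m - b

def mlp_start_lm(b, sensors, mlp_guess, max_nfev=20):
    """A few LM iterations started from the MLP's location estimate."""
    fit = least_squares(residuals, mlp_guess, args=(b, sensors),
                        method="lm", max_nfev=max_nfev)
    return fit.x

# Toy usage: 122 sensors on a 12 cm sphere, one synthetic dipole, and a
# perturbed "MLP guess" refined back toward the true location.
rng = np.random.default_rng(0)
sensors = rng.standard_normal((122, 3))
sensors = 12.0 * sensors / np.linalg.norm(sensors, axis=1, keepdims=True)
true_pos, true_m = np.array([2.0, 1.0, 5.0]), np.array([1.0, -1.0, 0.5])
b = lead_field(true_pos, sensors) @ true_m
print(mlp_start_lm(b, sensors, mlp_guess=true_pos + 0.8))
```

Because the moment is eliminated analytically at every step, the search runs over only the three location coordinates, which is what keeps the refinement down to a few milliseconds.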

19 We use the MLP and MLP-start-LM methods to localize single-dipole sources from actual MEG signal components isolated by a blind source separation (BSS) algorithm (Vigário et al. [sent-40, score-0.39]

20 , 2002) and compare the results with the output of standard interactive commercial localization software. [sent-42, score-0.321]

21 [Figure 1 residue: sagittal view, coronal view, head model panels A and B; dimensions 10.5 cm, 3 cm, 4 cm] [sent-48, score-0.161]

22 Figure 1: Sensor surface and training region; training region and various head models. [sent-51, score-0.566]

23 The center of the spherical head model was varied within the given region. [sent-52, score-0.408]

24 Section 3 presents the localization performance of both the MLP and MLP-start-LM, and compares them with various conventional LM methods. [sent-55, score-0.269]

25 2, comparative localization results for our proposed methods and standard Neuromag commercial software on actual BSS-separated MEG signals are presented. [sent-57, score-0.411]

26 Each exemplar thus consisted of the (x, y, z) coordinates of the center of a sphere fitted to the head, sensor activations generated by a forward model, and the target dipole location. [sent-60, score-0.645]

27 Centers of spherical head [sentence continues in sentence 29]. Footnote 1: Given the sensor activations and a dipole location, the minimum error dipole moment can be calculated analytically (Hämäläinen et al. [sent-62, score-1.324]

28 Therefore, although the dipoles used in generating the dataset had both location and moment, the moments were not included in the datasets used for training or testing. [sent-64, score-0.211]

29 models in the training set were drawn from a ball of radius 3 cm centered 4 cm above the bottom of the training region (footnote 2), as shown in Figure 1. [sent-65, score-0.401]

30 The dipoles in the training set were drawn uniformly from a spherical region centered at the corresponding center, with a radius of 7. [sent-66, score-0.28]

31 The corresponding sensor activations were calculated by adding the results of a forward model and a noise model. [sent-69, score-0.248]

32 We used the sensor geometry of a 4D Neuroimaging Neuromag-122 whole-head gradiometer (Ahonen et al. [sent-71, score-0.193]

33 , 1993) and a standard analytic model of quasistatic electromagnetic propagation in a spherical head (Jun et al. [sent-72, score-0.439]
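
To make the analytic forward model concrete, here is a small numpy sketch of the standard Sarvas (1987) formula for the field of a current dipole in a homogeneous spherical conductor. This is the textbook formula rather than the authors' code; the Neuromag-122 planar gradiometers would additionally require combining fields over each sensor's coil geometry, which is omitted here.

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # T*m/A

def sarvas_field(r, r0, q):
    """B(r) outside a spherical conductor centred at the origin, produced by a
    current dipole with moment q (A*m) at r0; positions in metres, field in tesla."""
    a_vec = r - r0
    a, rn = np.linalg.norm(a_vec), np.linalg.norm(r)
    F = a * (rn * a + rn ** 2 - r0 @ r)
    grad_F = ((a ** 2) / rn + (a_vec @ r) / a + 2.0 * a + 2.0 * rn) * r \
             - (a + 2.0 * rn + (a_vec @ r) / a) * r0
    q_x_r0 = np.cross(q, r0)
    return MU0_OVER_4PI / F ** 2 * (F * q_x_r0 - (q_x_r0 @ r) * grad_F)

# Two properties the paper relies on: the external field depends only on the
# sphere's centre, not its radius, and a purely radial dipole is externally silent.
r = np.array([0.0, 0.0, 0.12])            # sensor 12 cm above the sphere centre
r0 = np.array([0.02, 0.01, 0.05])         # dipole inside the head
print(sarvas_field(r, r0, q=np.array([1e-8, 0.0, 0.0])))        # tangential dipole
print(sarvas_field(r, r0, q=1e-8 * r0 / np.linalg.norm(r0)))    # ~0 for radial dipole
```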

34 This work could be easily extended to a more realistic head model. [sent-74, score-0.258]

35 The human skull phantom study in Leahy et al. [sent-77, score-0.165]

36 (1998) shows that the fitted spherical head model for MEG localization is slightly inferior in accuracy to the realistic head model numerically calculated by BEM. [sent-78, score-0.939]

37 In forward calculation, a spherical head model has some advantages: it is more easily implemented and is much faster. [sent-79, score-0.414]

38 Despite its inferiority in terms of localization accuracy, we use a spherical head model in this work. [sent-80, score-0.646]

39 To this end, we measured real brain noise and used it to additively contaminate synthetic sensor readings (Jun et al. [sent-82, score-0.29]

40 This noise was taken, unaveraged, from MEG recordings during periods in which the brain region of interest in the experiment was quiescent, and therefore included all sources of noise present in actual data: brain noise, external noise, sensor noise, etc. [sent-84, score-0.418]

41 The datasets used for training and testing were made by adding the noise to synthetic sensor activations generated by the forward model, and exemplars whose resulting SNR was below −4 dB were rejected. [sent-86, score-0.275]
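
A hedged sketch of how such noise-contaminated training exemplars might be assembled. The sampling regions and the −4 dB rejection threshold follow the text above; the SNR definition, the dipole-sphere radius (truncated to "7." in the extraction, approximated as 7 cm here), and the stand-in forward model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def snr_db(signal, noise):
    """Power SNR in dB (this particular definition is an assumption)."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

def sample_head_centre():
    """Head-model centre from a 3 cm ball, 4 cm above the training-region bottom (cm)."""
    while True:
        c = rng.uniform(-3.0, 3.0, size=3)
        if np.linalg.norm(c) <= 3.0:
            return c + np.array([0.0, 0.0, 4.0])

def sample_dipole(centre, radius=7.0):
    """Dipole location uniform in a sphere about the head centre (radius assumed)."""
    while True:
        p = rng.uniform(-radius, radius, size=3)
        if np.linalg.norm(p) <= radius:
            return centre + p

def make_exemplar(forward, noise_bank, min_snr_db=-4.0):
    """One pattern: [head centre | noisy 122-channel activations] with the dipole
    location as target (the random moment used by the forward model is not a target)."""
    while True:
        centre = sample_head_centre()
        dipole = sample_dipole(centre)
        clean = forward(dipole, centre)                     # synthetic sensor readings
        noise = noise_bank[rng.integers(len(noise_bank))]   # a real MEG noise segment
        if snr_db(clean, noise) >= min_snr_db:
            return np.concatenate([centre, clean + noise]), dipole

# Toy usage with a stand-in forward model and synthetic "noise" segments.
W = rng.standard_normal((3, 122))
toy_forward = lambda dipole, centre: np.tanh((dipole - centre) @ W)
noise_bank = 0.2 * rng.standard_normal((1000, 122))
x, y = make_exemplar(toy_forward, noise_bank)
print(x.shape, y)   # (125,), target dipole location
```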

42 The MLP charged with approximating the inverse mapping had an input layer of 125 units consisting of the three Cartesian coordinates of the center of the sphere fitted to the head, and the 122 sensor activations. [sent-87, score-0.255]
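
For concreteness, a sketch of such a network in PyTorch. The extracted text fixes only the 125 inputs (3 head-centre coordinates plus 122 sensor activations), two hidden layers, and 3 Cartesian outputs; the hidden-layer sizes and tanh nonlinearity below are assumptions.

```python
import torch
import torch.nn as nn

class DipoleMLP(nn.Module):
    """Maps [head-centre xyz | 122 sensor activations] to a dipole location xyz."""
    def __init__(self, n_hidden1=320, n_hidden2=64):   # hidden sizes assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 122, n_hidden1), nn.Tanh(),
            nn.Linear(n_hidden1, n_hidden2), nn.Tanh(),
            nn.Linear(n_hidden2, 3),
        )

    def forward(self, x):
        return self.net(x)

# One supervised step on a random batch; real inputs and targets would come
# from the noise-contaminated exemplar generator sketched above.
model = DipoleMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(128, 125), torch.randn(128, 3)
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
print(float(loss))
```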

43 Footnote 2: Fitted spheres from twelve subjects performing various tasks on a 4D Neuroimaging Neuromag-122 MEG system were collected, and this distribution of head positions was chosen to include all twelve cases. [sent-95, score-0.3]

44 Just as the position of the center of the head varies from session to session and subject to subject, so does head orientation and radius. [sent-96, score-0.67]

45 Because a sphere is rotationally symmetric, our forward model is insensitive to orientation, and similarly the external magnetic field caused by a dipole in a homogeneous sphere is invariant to the sphere’s radius. [sent-97, score-0.508]

46 Figure 2: Mean localization errors of the trained MLP as a function of correct dipole location, binned into regions. [sent-125, score-0.65]

47 Results and discussion. Training and localization results: Datasets of 100,000 (training) and 25,000 (testing) patterns, all contaminated by real brain noise, were constructed. [sent-130, score-0.358]

48 We investigated localization error distributions over various regions of interest. [sent-137, score-0.269]

49 considered two cross sections (coronal and sagittal views) with width of 2 cm, and each of these was divided into 19 regions. [interleaved caption residue: MLP, MLP-start-LM, and optimal-start-LM were tested on signals from 25,000 random dipoles, contaminated by real brain noise.] [sent-140, score-0.182]

50 We extracted the noisy signals and the corresponding dipoles from testing datasets. [sent-142, score-0.147]

51 A dipole localization was performed using the trained MLP, and the average localization error for each region was calculated. [sent-144, score-0.919]

52 Figure 2 shows the localization error distribution over two cross sections. [sent-145, score-0.269]
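
A small sketch of that per-region evaluation, with the 19-region partition of each 2 cm cross-section left abstract (any mapping from a true dipole location to a region id would do; the depth-slab binning in the usage line is purely illustrative):

```python
import numpy as np

def mean_error_by_region(true_pos, est_pos, region_of):
    """Average Euclidean localization error per region, as plotted in Figure 2."""
    err = np.linalg.norm(np.asarray(est_pos) - np.asarray(true_pos), axis=1)
    regions = np.array([region_of(p) for p in np.asarray(true_pos)])
    return {int(r): float(err[regions == r].mean()) for r in np.unique(regions)}

# Illustrative usage: bin by depth (z) in 2 cm slabs.
rng = np.random.default_rng(0)
true_pos = rng.uniform(-7.0, 7.0, size=(1000, 3))
est_pos = true_pos + 0.5 * rng.standard_normal((1000, 3))
print(mean_error_by_region(true_pos, est_pos, region_of=lambda p: int(p[2] // 2)))
```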

53 In general, dipoles closer to the sensor surface were better localized. [sent-146, score-0.296]

54 We compared various automatic localization methods, most of which consist of LM used in different ways: • MLP-start-LM LM was started with the trained MLP’s output. [sent-147, score-0.323]

55 • fixed-4-start-LM LM was tuned for good performance using restarts at the four fixed initial points (0, 0, 6), (−5, 2, −1), (5, 2, −1), and (0, −5, −1), in units of cm relative to the center of the spherical head model. [sent-148, score-0.652]

56 Table 1: Comparison of performance on real brain noise test set of Levenberg-Marquardt source localizers with three LM restart strategies, the trained MLP, and a hybrid system. [sent-150, score-0.379]

57 53 • random-n-start-LM LM was restarted with n random (uniformly distributed) points within the spherical head model. [sent-158, score-0.377]

58 • optimal-start-LM LM was started with the known exact dipole source location. [sent-161, score-0.4]
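
The initialization strategies compared here could be driven by a loop like the following, reusing the `residuals` helper from the earlier MLP-start-LM sketch; the four fixed starting points are the ones listed in the text, while the head-model radius used for random restarts is an assumption.

```python
import numpy as np
from scipy.optimize import least_squares

# Fixed starting points of fixed-4-start-LM (cm, relative to the centre of the
# spherical head model), as listed in the text.
FIXED_STARTS = np.array([[0., 0., 6.], [-5., 2., -1.], [5., 2., -1.], [0., -5., -1.]])

def random_starts(n, radius=7.0, seed=0):
    """n points uniformly distributed inside the spherical head model (radius assumed)."""
    rng = np.random.default_rng(seed)
    pts = []
    while len(pts) < n:
        p = rng.uniform(-radius, radius, size=3)
        if np.linalg.norm(p) <= radius:
            pts.append(p)
    return np.array(pts)

def best_of_starts(residuals, starts, b, sensors):
    """Run LM from every start and keep the fit with the smallest residual cost."""
    fits = [least_squares(residuals, s, args=(b, sensors), method="lm") for s in starts]
    return min(fits, key=lambda f: f.cost).x

# fixed-4-start-LM:    best_of_starts(residuals, FIXED_STARTS, b, sensors)
# random-20-start-LM:  best_of_starts(residuals, random_starts(20), b, sensors)
# MLP-start-LM:        one run started from the trained MLP's output
# optimal-start-LM:    one run started from the true dipole location
```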

59 Figure 3 shows the localization performance as a function of SNR for fixed-4-start-LM, optimal-start-LM, the trained MLP, and MLP-start-LM. [sent-162, score-0.323]

60 Optimal-start-LM shows the best localization performance across the whole range of SNRs, but the hybrid system shows almost the same performance as optimal-start-LM except at very high SNRs, while the trained MLP is more robust to noise than fixed-4-start-LM. [sent-163, score-0.459]

61 These sorts of sources are often very hard to localize well, as it is easy to become trapped in a local minimum (Jun et al. [sent-165, score-0.238]

62 7 cm on average from the exact source) would be required to obtain near-optimal performance from LM. [sent-168, score-0.161]

63 The trained MLP is fastest, and its hybrid system is about 40× faster than random-20-start-LM, while the hybrid system is about 9× faster than, yet more accurate than, fixed-4-start-LM. [sent-170, score-0.236]

64 Localization on real MEG signals and comparison with commercial software: The sensors in MEG systems have poor signal-to-noise ratios (SNRs) for single-trial data, since MEG data is strongly contaminated by various noise sources. [sent-173, score-0.156]

65 Blind source separation of MEG data segregates noise from signal (Vigário et al. [sent-174, score-0.23]

66 Even though the sensor attenuation vectors of the BSS-separated components can be well localized to equivalent current dipoles (Vigário et al. [sent-179, score-0.413]

67 We applied the MLP and MLP-start-LM to localize single dipolar sources from various actual BSS-separated MEG signals. [sent-182, score-0.187]

68 The following four visual reaction time tasks were performed by each subject: stimulus pre-exposure task, trump card task, elemental discrimination task, and transverse patterning task. [sent-186, score-0.149]

69 PV and SV denote primary visual source and secondary visual source, respectively. [sent-190, score-0.183]

70 The outer surface denotes the sensor surface, and diamonds on this surface denote sensors. [sent-194, score-0.281]

71 The inner surface denotes a spherical head model fit to the subject. [sent-195, score-0.433]

72 Their MLP’s outputs were scaled back to their dipole location vectors and were used for initializing LM. [sent-199, score-0.35]

73 Figure 4 shows the dipole locations estimated by the MLP, MLP-startLM, and Neuromag’s xfit software, for two sorts of sensory sources: primary visual sources and secondary visual sources, respectively, over four tasks in subject S01. [sent-200, score-0.637]

74 In Figure 5, the estimated dipole locations are shown for somatosensory sources over three different subjects. [sent-201, score-0.458]

75 Each figure consists of three viewpoints: axial (x-y plane), coronal (x-z plane), and sagittal (y-z plane). [sent-202, score-0.163]

76 The center of a fitted spherical head model (S01: trump card task) is (0. [sent-203, score-0.433]

77 All dipole locations estimated by the MLP and MLP-start-LM are clustered within about 3 cm, and about 0. [sent-208, score-0.355]

78 We see that the primary visual sources are more consistently localized, across all four tasks, than the secondary visual sources. [sent-210, score-0.203]

79 It is noticeable that somatosensory sources on the right hemisphere are localized poorly by the MLP, but well localized by the hybrid method. [sent-212, score-0.268]

80 The trained MLP and the hybrid method are applicable to actual MEG signals, and seem to offer comparable and perhaps superior localization relative to xfit, with clear advantages in both speed and the lack of required human interaction or subjective human input. [sent-226, score-0.489]

81 A dipole fitting method was applied to the identified neural components. [sent-230, score-0.327]

82 The input to the dipole fitting algorithm of xfit was the field map and the output was the location of ECDs. [sent-231, score-0.35]

83 From all separated components for four subjects and four sorts of tasks taken as in Tang et al. [sent-232, score-0.224]

84 Even though the center of the fitted spherical head model varied over the three subjects, only the fitted sphere of subject S01 (transverse patterning task), centered at (0. [sent-238, score-0.562]

85 The outer surface denotes the sensor surface, and diamonds on this surface denote sensors. [sent-245, score-0.281]

86 The inner surface denotes a spherical head model fit to the subject. [sent-246, score-0.433]

87 4 Conclusion We propose the inclusion of a head position input for MLP-based MEG dipole localizers. [sent-247, score-0.615]

88 This overcomes the limitation of previous MLP-based MEG localization systems, namely the need to retrain the network for each session or subject. [sent-248, score-0.369]

89 This motivated us to construct a hybrid system, MLP-start-LM, which improves the localization accuracy while reducing the computational burden to less than one ninth that of fixed-4-start-LM. [sent-250, score-0.395]

90 We applied the MLP and MLP-start-LM to localize single dipolar sources from actual BSSseparated MEG signals, and compared these with the results of the commercial Neuromag program xfit. [sent-253, score-0.239]

91 The MLP yielded dipole locations close to those of xfit, and MLP-start-LM gave locations that were even closer to those of xfit. [sent-254, score-0.383]

92 In conclusion, our MLP can itself serve as a reasonably accurate real-time MEG dipole localizer, even when the head position changes regularly. [sent-255, score-0.615]

93 This MLP also constitutes an excellent dipole guesser for LM. [sent-256, score-0.327]

94 Because this MLP receives a head position input, the need to retrain for various subjects or sessions has been eliminated without sacrificing the many advantages of the universal approximator direct inverse approach to localization. [sent-257, score-0.37]

95 Artificial neural networks for source localization in the human brain. [sent-269, score-0.368]

96 Fast accurate MEG source localization using a multilayer perceptron trained with real brain noise. [sent-326, score-0.474]

97 MEG source localization using a MLP with a distributed output representation. [sent-334, score-0.342]

98 Dipole source localization of MEG by BP neural networks. [sent-343, score-0.342]

99 Localization accuracy of single current dipoles from tangential components of auditory evoked fields. [sent-354, score-0.168]

100 A study of dipole localization accuracy for MEG and EEG using a human skull phantom. [sent-370, score-0.682]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mlp', 0.556), ('meg', 0.36), ('dipole', 0.327), ('localization', 0.269), ('head', 0.258), ('lm', 0.169), ('cm', 0.161), ('sensor', 0.131), ('spherical', 0.119), ('dipoles', 0.109), ('tang', 0.105), ('jun', 0.099), ('hybrid', 0.091), ('coronal', 0.075), ('xfit', 0.075), ('source', 0.073), ('pearlmutter', 0.07), ('sources', 0.07), ('inen', 0.065), ('neuromag', 0.063), ('et', 0.062), ('snr', 0.061), ('sphere', 0.059), ('tted', 0.058), ('surface', 0.056), ('localize', 0.056), ('snrs', 0.055), ('sagittal', 0.055), ('trained', 0.054), ('commercial', 0.052), ('aa', 0.052), ('brain', 0.052), ('rio', 0.05), ('vig', 0.05), ('weisend', 0.05), ('sorts', 0.05), ('noise', 0.045), ('neuroimaging', 0.044), ('subjects', 0.042), ('retrain', 0.04), ('secondary', 0.04), ('signals', 0.038), ('diamonds', 0.038), ('dipolar', 0.038), ('kinouchi', 0.038), ('localizers', 0.038), ('sobi', 0.038), ('contaminated', 0.037), ('forward', 0.037), ('localized', 0.037), ('accuracy', 0.035), ('visual', 0.035), ('pv', 0.035), ('activations', 0.035), ('eeg', 0.035), ('units', 0.034), ('transverse', 0.033), ('nolte', 0.033), ('somatosensory', 0.033), ('mlps', 0.033), ('axial', 0.033), ('patterning', 0.033), ('magnetoencephalography', 0.033), ('blind', 0.032), ('session', 0.032), ('center', 0.031), ('position', 0.03), ('cartesian', 0.03), ('subject', 0.029), ('software', 0.029), ('locations', 0.028), ('overcomes', 0.028), ('training', 0.027), ('rms', 0.027), ('human', 0.026), ('restarts', 0.026), ('multilayer', 0.026), ('magnetic', 0.026), ('radius', 0.025), ('abeyratne', 0.025), ('ahonen', 0.025), ('exemplar', 0.025), ('kwon', 0.025), ('leahy', 0.025), ('lounasmaa', 0.025), ('matsumoto', 0.025), ('maynooth', 0.025), ('sander', 0.025), ('shichijo', 0.025), ('skull', 0.025), ('squid', 0.025), ('topography', 0.025), ('trump', 0.025), ('biomedical', 0.025), ('components', 0.024), ('sv', 0.024), ('start', 0.024), ('location', 0.023), ('four', 0.023), ('actual', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron

Author: Sung C. Jun, Barak A. Pearlmutter

Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1

2 0.26190323 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations

Author: Maneesh Sahani, Srikantan S. Nagarajan

Abstract: Existing source location and recovery algorithms used in magnetoencephalographic imaging generally assume that the source activity at different brain locations is independent or that the correlation structure is known. However, electrophysiological recordings of local field potentials show strong correlations in aggregate activity over significant distances. Indeed, it seems very likely that stimulus-evoked activity would follow strongly correlated time-courses in different brain areas. Here, we present, and validate through simulations, a new approach to source reconstruction in which the correlation between sources is modelled and estimated explicitly by variational Bayesian methods, facilitating accurate recovery of source locations and the time-courses of their activation. 1

3 0.10082991 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing

Author: Konrad P. Körding, Daniel M. Wolpert

Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.

4 0.091125593 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

Author: Eiji Mizutani, James Demmel

Abstract: The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration to solve the associated “overdetermined” nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter’s implicit sparse Hessian matrix-vector multiply algorithm to construct the Krylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs. 1

5 0.083724774 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks

Author: Anton Schwaighofer, Marian Grigoras, Volker Tresp, Clemens Hoffmann

Abstract: In this article, we present a novel approach to solving the localization problem in cellular networks. The goal is to estimate a mobile user’s position, based on measurements of the signal strengths received from network base stations. Our solution works by building Gaussian process models for the distribution of signal strengths, as obtained in a series of calibration measurements. In the localization stage, the user’s position can be estimated by maximizing the likelihood of received signal strengths with respect to the position. We investigate the accuracy of the proposed approach on data obtained within a large indoor cellular network. 1

6 0.082661271 15 nips-2003-A Probabilistic Model of Auditory Space Representation in the Barn Owl

7 0.082419164 5 nips-2003-A Classification-based Cocktail-party Processor

8 0.073811263 55 nips-2003-Distributed Optimization in Adaptive Networks

9 0.069339424 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation

10 0.057297509 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors

11 0.04726623 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects

12 0.046530001 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring

13 0.03995087 43 nips-2003-Bounded Invariance and the Formation of Place Fields

14 0.038218722 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

15 0.037672751 53 nips-2003-Discriminating Deformable Shape Classes

16 0.037666075 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

17 0.032657158 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification

18 0.032125935 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface

19 0.030064374 119 nips-2003-Local Phase Coherence and the Perception of Blur

20 0.028873267 187 nips-2003-Training a Quantum Neural Network


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.109), (1, 0.023), (2, 0.087), (3, -0.016), (4, -0.086), (5, 0.027), (6, 0.159), (7, -0.051), (8, -0.087), (9, 0.116), (10, 0.089), (11, -0.007), (12, 0.169), (13, 0.045), (14, -0.161), (15, 0.049), (16, -0.206), (17, 0.029), (18, 0.005), (19, 0.022), (20, 0.233), (21, 0.062), (22, -0.061), (23, 0.113), (24, 0.031), (25, -0.159), (26, -0.147), (27, 0.109), (28, 0.006), (29, -0.069), (30, -0.076), (31, -0.094), (32, 0.028), (33, 0.086), (34, 0.033), (35, 0.133), (36, -0.059), (37, -0.009), (38, 0.168), (39, -0.041), (40, -0.041), (41, -0.07), (42, -0.05), (43, -0.014), (44, 0.066), (45, -0.053), (46, 0.132), (47, -0.018), (48, -0.031), (49, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97398776 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron

Author: Sung C. Jun, Barak A. Pearlmutter

Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1

2 0.80303395 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations

Author: Maneesh Sahani, Srikantan S. Nagarajan

Abstract: Existing source location and recovery algorithms used in magnetoencephalographic imaging generally assume that the source activity at different brain locations is independent or that the correlation structure is known. However, electrophysiological recordings of local field potentials show strong correlations in aggregate activity over significant distances. Indeed, it seems very likely that stimulus-evoked activity would follow strongly correlated time-courses in different brain areas. Here, we present, and validate through simulations, a new approach to source reconstruction in which the correlation between sources is modelled and estimated explicitly by variational Bayesian methods, facilitating accurate recovery of source locations and the time-courses of their activation. 1

3 0.50227672 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation

Author: Yuanqing Li, Shun-ichi Amari, Sergei Shishkin, Jianting Cao, Fanji Gu, Andrzej S. Cichocki

Abstract: In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated by using the K−means method. We have proved that for the estimated overcomplete basis matrix, the sparse solution (coefficient matrix) with minimum l1 −norm is unique with probability of one, which can be obtained using a linear programming algorithm. The comparisons of the l1 −norm solution and the l0 −norm solution are also presented, which can be used in recoverability analysis of blind source separation (BSS). Next, we apply the sparse matrix factorization approach to BSS in the overcomplete case. Generally, if the sources are not sufficiently sparse, we perform blind separation in the time-frequency domain after preprocessing the observed data using the wavelet packets transformation. Third, an EEG experimental data analysis example is presented to illustrate the usefulness of the proposed approach and demonstrate its performance. Two almost independent components obtained by the sparse representation method are selected for phase synchronization analysis, and their periods of significant phase synchronization are found which are related to tasks. Finally, concluding remarks review the approach and state areas that require further study. 1

4 0.49836925 15 nips-2003-A Probabilistic Model of Auditory Space Representation in the Barn Owl

Author: Brian J. Fischer, Charles H. Anderson

Abstract: The barn owl is a nocturnal hunter, capable of capturing prey using auditory information alone [1]. The neural basis for this localization behavior is the existence of auditory neurons with spatial receptive fields [2]. We provide a mathematical description of the operations performed on auditory input signals by the barn owl that facilitate the creation of a representation of auditory space. To develop our model, we first formulate the sound localization problem solved by the barn owl as a statistical estimation problem. The implementation of the solution is constrained by the known neurobiology.

5 0.43321019 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring

Author: Cynthia Archer, Todd K. Leen, António M. Baptista

Abstract: As part of an environmental observation and forecasting system, sensors deployed in the Columbia RIver Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to biofouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. 1

6 0.41476652 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

7 0.40334725 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing

8 0.35270464 55 nips-2003-Distributed Optimization in Adaptive Networks

9 0.34197724 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks

10 0.30003014 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects

11 0.25804129 5 nips-2003-A Classification-based Cocktail-party Processor

12 0.25151145 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors

13 0.24413535 187 nips-2003-Training a Quantum Neural Network

14 0.23424135 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

15 0.2066592 43 nips-2003-Bounded Invariance and the Formation of Place Fields

16 0.20386688 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

17 0.19082561 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

18 0.18840241 196 nips-2003-Wormholes Improve Contrastive Divergence

19 0.18467602 102 nips-2003-Large Scale Online Learning

20 0.18452471 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.04), (11, 0.013), (18, 0.012), (29, 0.012), (30, 0.018), (35, 0.031), (47, 0.02), (49, 0.012), (53, 0.144), (55, 0.011), (66, 0.011), (69, 0.018), (71, 0.026), (76, 0.021), (83, 0.367), (85, 0.053), (91, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84136504 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron

Author: Sung C. Jun, Barak A. Pearlmutter

Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1

2 0.43643188 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation

Author: Yuanqing Li, Shun-ichi Amari, Sergei Shishkin, Jianting Cao, Fanji Gu, Andrzej S. Cichocki

Abstract: In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated by using the K−means method. We have proved that for the estimated overcomplete basis matrix, the sparse solution (coefficient matrix) with minimum l1 −norm is unique with probability of one, which can be obtained using a linear programming algorithm. The comparisons of the l1 −norm solution and the l0 −norm solution are also presented, which can be used in recoverability analysis of blind source separation (BSS). Next, we apply the sparse matrix factorization approach to BSS in the overcomplete case. Generally, if the sources are not sufficiently sparse, we perform blind separation in the time-frequency domain after preprocessing the observed data using the wavelet packets transformation. Third, an EEG experimental data analysis example is presented to illustrate the usefulness of the proposed approach and demonstrate its performance. Two almost independent components obtained by the sparse representation method are selected for phase synchronization analysis, and their periods of significant phase synchronization are found which are related to tasks. Finally, concluding remarks review the approach and state areas that require further study. 1

3 0.43531179 149 nips-2003-Optimal Manifold Representation of Data: An Information Theoretic Approach

Author: Denis V. Chigirev, William Bialek

Abstract: We introduce an information theoretic method for nonparametric, nonlinear dimensionality reduction, based on the infinite cluster limit of rate distortion theory. By constraining the information available to manifold coordinates, a natural probabilistic map emerges that assigns original data to corresponding points on a lower dimensional manifold. With only the information-distortion trade off as a parameter, our method determines the shape of the manifold, its dimensionality, the probabilistic map and the prior that provide optimal description of the data. 1

4 0.43130699 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method

Author: Susanne Still, William Bialek, Léon Bottou

Abstract: We argue that K–means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of “smooth” initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms. 1

5 0.43032685 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing

Author: Konrad P. Körding, Daniel M. Wolpert

Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.

6 0.42886639 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

7 0.42784262 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

8 0.4262898 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons

9 0.42408574 66 nips-2003-Extreme Components Analysis

10 0.42303032 107 nips-2003-Learning Spectral Clustering

11 0.41915938 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

12 0.41842762 113 nips-2003-Learning with Local and Global Consistency

13 0.41780606 86 nips-2003-ICA-based Clustering of Genes from Microarray Expression Data

14 0.41715783 126 nips-2003-Measure Based Regularization

15 0.41689739 92 nips-2003-Information Bottleneck for Gaussian Variables

16 0.4166221 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion

17 0.41606158 73 nips-2003-Feature Selection in Clustering Problems

18 0.41595083 49 nips-2003-Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels

19 0.41484517 81 nips-2003-Geometric Analysis of Constrained Curves

20 0.41458082 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning