nips nips2012 nips2012-341 knowledge-graph by maker-knowledge-mining

341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex


Source: pdf

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The topographic unsupervised learning of natural sounds in the auditory cortex Hiroki Terashima The University of Tokyo / JSPS Tokyo, Japan teratti@teratti.jp [sent-1, score-0.991]

2 Abstract The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. [sent-5, score-0.621]

3 Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. [sent-6, score-0.761]

4 This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. [sent-7, score-0.535]

5 To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. [sent-8, score-0.256]

6 In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. [sent-9, score-0.833]

7 The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. [sent-10, score-1.21]

8 1 Introduction Despite the anatomical and functional similarities between the primary auditory cortex (A1) and the primary visual cortex (V1), the computational modelling of A1 has proven to be less fruitful than that of V1, primarily because the responses of A1 cells are more disorganized. [sent-12, score-0.943]

9 For instance, the receptive fields of V1 cells are localized within a small portion of the field of view [1], whereas certain A1 cells have receptive fields that are not localized, as these A1 cells demonstrate significant responses to multiple distant frequencies [2, 3]. [sent-13, score-1.123]

10 The retinotopy of V1 and the tonotopy of A1 have long been considered to be quite similar, but studies on a microscopic scale have demonstrated that in mice, the tonotopy of A1 is much more disordered [4, 5] than the retinotopy of V1 [6, 7]. [sent-16, score-1.342]

11 A number of computational modelling studies have emphasized the close associations between V1 cells and natural image statistics, which suggests that V1 adopts an unsupervised, efficient coding strategy [10]. [sent-19, score-0.375]

12 For instance, the receptive fields of V1 simple cells were reproduced by either sparse coding [11] or the independent component analysis [12] of natural images. [sent-20, score-0.434]

13 Similar efforts to address A1 have been attempted by only a few studies, which demonstrated that the efficient coding of natural, harmonic sounds, such as human voices or piano recordings, can explain the basic receptive fields of A1 cells [16, 17] and their harmony-related responses [18, 19]. [sent-22, score-0.728]

14 In an integrated and computational manner, the present paper attempts to explain why the tonotopy of A1 is more disordered than the retinotopy of V1. [sent-24, score-0.703]

15 We hypothesized that V1 and A1 still share an efficient coding strategy, and we therefore proposed that the distant correlations in natural sounds would be responsible for the relative disorder in A1. [sent-25, score-0.94]

16 To test this hypothesis, we first demonstrated the significant differences between natural images and natural sounds. [sent-26, score-0.303]

17 Natural images and natural sounds were then each used as inputs for topographic independent component analysis, a model that had previously been proposed for the smooth topography of V1, and maps were generated for these images and sounds. [sent-27, score-0.973]

18 Due to the distant correlations of natural sounds, greater disorder was observed in the learned map that had been adapted to natural sounds than in the analogous map that had been adapted to images. [sent-28, score-1.002]

19 For natural sounds, this model not only predicted harmonic relationships between neighbouring cells but also demonstrated nonlinear responses that appeared similar to the responses of the pitch-selective cells that were recently found in A1. [sent-29, score-1.167]

20 These results suggest that the apparently dissimilar topographies of V1 and A1 may reflect statistical differences between natural images and natural sounds; however, these two regions may employ a common adaptive strategy. [sent-30, score-0.289]

21 1 Topographic independent component analysis Herein, we discuss an unsupervised learning model termed topographic independent component analysis (TICA), which was originally proposed for the study of V1 topography [13, 14]. [sent-32, score-0.338]

22 This model comprises two layers: the first layer of N units models the linear responses of V1 simple cells, whereas the second layer of N units models the nonlinear responses of V1 complex cells, and the connections between the layers define a topography. [sent-33, score-0.718]

23 Given a whitened input vector I(x) ∈ Rd (here, d = N), the input is reconstructed by a linear superposition of basis vectors ai ∈ Rd, each of which corresponds to a first-layer unit: I = ∑i si ai (1), where si ∈ R are the activity levels of the units, or model neurons. [sent-34, score-0.617]

24 Using the activities of the first layer, the activities of the second-layer units ci ∈ R can be defined as follows: ci = ∑j h(i, j) sj² (2), where h(i, j) is the neighbourhood function that takes the value of 1 if i and j are neighbours and is 0 otherwise. [sent-36, score-0.391]
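A minimal numerical sketch of this two-layer computation, equations (1) and (2), assuming a one-dimensional torus topography with a boxcar neighbourhood (the configuration used later in the paper); the basis and input here are random stand-ins, not learned values:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16          # number of units (d = N)
window = 5      # neighbourhood window size

# First layer (eq. 1): a whitened input I is reconstructed as I = sum_i s_i a_i.
A = rng.standard_normal((N, N))   # columns are basis vectors a_i (random stand-ins)
I = rng.standard_normal(N)        # whitened input vector
s = np.linalg.solve(A, I)         # first-layer activities s_i

# Second layer (eq. 2): c_i = sum_j h(i, j) * s_j^2, pooling squared
# activities over topographic neighbours on a 1-D torus.
def h(i, j, n=N, w=window):
    d = min(abs(i - j), n - abs(i - j))   # circular distance on the torus
    return 1.0 if d <= w // 2 else 0.0

c = np.array([sum(h(i, j) * s[j] ** 2 for j in range(N)) for i in range(N)])
```

The pooled energies c are nonnegative by construction; in TICA, learning shapes the basis so that these locally pooled energies, rather than the raw si, follow a sparse distribution.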

25 (B) The correlation matrix of the human voice spectra (right) demonstrated not only local correlation but also off-diagonal distant correlations produced by harmonics. [sent-48, score-0.504]
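This harmonic structure is easy to reproduce on toy data. The sketch below (illustrative synthetic spectra, not the paper's voice data) builds random harmonic tones on a linear frequency-bin axis and shows that the bin-by-bin correlation matrix acquires off-diagonal entries at harmonic positions; on the paper's logarithmic axis the second-harmonic band sits one octave from the main diagonal:

```python
import numpy as np

# Toy demonstration: spectra of harmonic tones develop off-diagonal
# correlations between a fundamental bin and its harmonic bins.
rng = np.random.default_rng(2)
n_bins, n_frames = 64, 2000
spectra = 0.05 * rng.random((n_bins, n_frames))   # background noise floor
for t in range(n_frames):
    f0 = rng.integers(4, 16)                      # fundamental bin of this frame
    for k in (1, 2, 3):                           # first three harmonics
        spectra[k * f0, t] += 1.0 / k

C = np.corrcoef(spectra)   # correlation between frequency bins across time
# Bin 10 correlates with its second harmonic (bin 20) far more strongly
# than with a harmonically unrelated bin such as 17.
```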

26 2 The discontinuity index for topographic representation To compare the degrees of disorder in topographies of different modalities, we defined a discontinuity index (DI) for each point i of the maps. [sent-55, score-0.51]
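The exact DI formula is not reproduced in this summary, so the sketch below uses a plausible stand-in: for each map point, the mean absolute log2 difference between its CF and those of its four torus neighbours. The function name and definition are assumptions, not the paper's:

```python
import numpy as np

def discontinuity_index(cf_map):
    """Hypothetical DI: mean absolute log2-CF difference between each map
    point and its 4 torus neighbours (a stand-in for the paper's definition)."""
    log_cf = np.log2(cf_map)
    di = np.zeros_like(log_cf)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        di += np.abs(log_cf - np.roll(log_cf, shift, axis=axis))
    return di / 4.0

# A smooth tonotopic gradient scores lower than a shuffled version of itself.
smooth = np.tile(2.0 ** np.linspace(6, 13, 16), (16, 1))   # 64 Hz .. 8 kHz gradient
rng = np.random.default_rng(1)
shuffled = rng.permuted(smooth, axis=1)                    # same CFs, scrambled layout
```

Any measure with this flavour supports the comparison made in the paper: the smooth map yields low DI values, the scrambled one high values.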

27 1 Results Correlations of natural images and natural sounds Given that V1 is supposed to adapt to natural images and that A1 is supposed to adapt to natural sounds, the first analysis in this study simply compared statistics for natural images and natural sounds. [sent-63, score-1.041]

28 Figure 2: The ordered retinotopy and disordered tonotopy (panel axes include CF [Hz], number of units, and time [ms]). [sent-71, score-0.581]

29 (A) The topography of units adapted to natural images. [sent-72, score-0.459]

30 (C) The topography of spectro-temporal units that have been adapted to natural sounds. [sent-75, score-0.459]

31 (D-F) The retinotopy of the visual map (D) is smooth, whereas the tonotopy of the auditory map (F) is more disordered, although global tonotopy still exists (E). [sent-76, score-1.259]

32 For natural sounds, we used human narratives from the Handbook of the International Phonetic Association [21], as efficient representations of human voices have been successful in facilitating studies of various components of the auditory system [22, 23], including A1 [16, 17]. [sent-80, score-0.531]

33 After these sounds were downsampled to 4 kHz, their spectrograms were generated using the NSL toolbox [24] to approximate peripheral auditory processing. [sent-81, score-0.76]
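The NSL toolbox is a MATLAB model of the auditory periphery; as a much cruder stand-in for this preprocessing step, one can downsample and compute a short-time magnitude spectrogram. The sketch below uses naive decimation without an anti-aliasing filter and a noise placeholder signal, purely for illustration:

```python
import numpy as np

# Rough stand-in for the paper's preprocessing (the NSL toolbox models the
# auditory periphery far more faithfully): downsample to 4 kHz, then compute
# a short-time magnitude spectrogram with a Hann window.
sr_in, sr_out = 16000, 4000
rng = np.random.default_rng(4)
sound = rng.standard_normal(sr_in)            # 1 s placeholder signal
down = sound[:: sr_in // sr_out]              # naive decimation to 4 kHz (4000 samples)

win, hop = 256, 128
frames = [down[i:i + win] * np.hanning(win)
          for i in range(0, len(down) - win + 1, hop)]
spec = np.abs(np.fft.rfft(np.array(frames), axis=1)).T   # (freq bins, time frames)
```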

34 The most prominent off-diagonal correlation, which was just 1 octave away from the main diagonal, corresponded to the second harmonic of a sound. [sent-85, score-0.27]

35 These distant correlations represent relatively typical results for natural sounds and differ greatly from the strictly local correlations observed for natural images. [sent-91, score-0.871]

36 2 Greater disorder for the tonotopy than the retinotopy To test the hypothesis that V1 and A1 share a learning strategy, the TICA model was applied to natural images and natural sounds, which exhibit different statistical profiles, as discussed above. [sent-93, score-0.94]

37 Figure 3: The correlation between discontinuity and input "auditoriness" (x-axis: strength of distant correlation, from vision-like to audition-like). [sent-98, score-0.343]

38 Figure 2A illustrates the visual topographic map obtained from this analysis, a small square of which constitutes a basis vector ai . [sent-105, score-0.309]

39 As previously observed in the original TICA study [13, 14], each unit was localized, oriented, and bandpassed; thus, these units appeared to be organized similarly to the receptive fields of V1 simple cells. [sent-106, score-0.338]

40 Figure 2B graphically indicates that the obtained DI values were quite low, which is consistent with the smooth retinotopy illustrated in Figure 2D. [sent-109, score-0.276]

41 Next, another TICA model was applied to natural sounds to create an auditory topographic map that could be compared to the visual topography. [sent-110, score-0.998]

42 As detailed in the previous section, spectrograms of human voices (sampled at 8 kHz) were generated using the NSL toolbox to approximate peripheral auditory processing. [sent-111, score-0.498]

43 Figure 2C shows the resulting auditory topographic map, which is composed of spectro-temporal units of ai that are represented by small squares. [sent-114, score-0.76]

44 The units were localized temporally and spectrally, and some units demonstrated multiple, harmonic peaks; thus, these units appeared to reasonably represent the typical spectro-temporal receptive fields of A1 cells [16, 3]. [sent-115, score-1.108]

45 The frequency to which an auditory neuron responds most significantly is called its characteristic frequency (CF) [2]. [sent-116, score-0.518]

46 Within local regions, the tonotopy was not necessarily smooth. [sent-121, score-0.303]

47 However, at a global level, a smooth tonotopy was observed (Figure 2E). [sent-124, score-0.339]

48 The distribution of tonotopic DI values is shown in Figure 2B, which clearly demonstrates that the tonotopy was more disordered than the retinotopy (p < 0. [sent-126, score-0.767]

49 For this purpose, we generated artificial inputs (d = 16) with a parameter pa ∈ [0, 1] that regulates the degree of distant correlations. [sent-130, score-0.274]

50 Figure 4: The harmonic relationships between CFs of neighbouring units (x-axis: distance between two units). [sent-147, score-0.555]

51 There were three peaks that indicate harmonic relationships between neighbouring units. [sent-150, score-0.409]

52 The topography was also set as a one-dimensional torus of 16 units with a neighbourhood window size of 5. [sent-157, score-0.459]
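The generation procedure for these artificial inputs is not fully specified in this summary; the sketch below is one plausible construction in which pa sets how often a distant "harmonic-like" partner accompanies a local bump of activity on the 16-unit torus. The details (bump shape, partner position) are assumptions:

```python
import numpy as np

def artificial_input(pa, d=16, rng=None):
    """Hypothetical input generator: a local bump of activity on a d-unit
    torus, joined with probability pa by a distant partner bump (mimicking
    the distant correlations of harmonic sounds). Illustrative only."""
    rng = rng or np.random.default_rng()
    x = np.zeros(d)
    i = int(rng.integers(d))
    x[i] = 1.0
    x[(i - 1) % d] = x[(i + 1) % d] = 0.5   # local (vision-like) correlation
    if rng.random() < pa:                   # distant (audition-like) correlation
        x[(i + d // 2) % d] += 1.0
    return x

rng = np.random.default_rng(3)
vision_like = np.array([artificial_input(0.0, rng=rng) for _ in range(2000)])
audition_like = np.array([artificial_input(1.0, rng=rng) for _ in range(2000)])
C_vision = np.corrcoef(vision_like.T)
C_audition = np.corrcoef(audition_like.T)
```

Across samples, the correlation between units half a torus apart grows with pa while the local correlation structure is unchanged, which is the property the paper's parametric experiment exploits.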

53 If the input only demonstrated local correlations like visual stimuli (pa ∼ 0), then its learned topography was smooth. [sent-161, score-0.41]

54 The DI values generally increased as distant correlations appeared more frequently, i.e., with larger pa. [sent-164, score-0.314]

55 Thus, the topographic disorder of auditory maps results from distant correlations presented by natural auditory signals. [sent-167, score-1.419]

56 4 The harmonic relationship among neighbouring units Several experiments [4, 5] have reported that the CFs of neighbouring cells can differ by up to 4 octaves, although these studies have failed to provide additional detail regarding the local spatial patterns of the CF distributions. [sent-169, score-0.928]

57 However, if the auditory topography is representative of natural stimulus statistics, the topographic map is likely to possess certain additional spatial features that reflect the statistical characteristics of natural sounds. [sent-170, score-0.864]

58 To enable a detailed investigation of the CF distribution, we employed a model that had been adapted to finer frequency spectra of natural sounds, and this model was then used throughout the remainder of the study. [sent-171, score-0.265]

59 As the temporal structure of the auditory receptive fields was less dominant than their spectral structure (Figure 2C), we focused solely on the spectral domain and did not attempt to address temporal information. [sent-172, score-0.449]

60 Therefore, the inputs for the new model (n = 100,000) were short-time frequency spectra of 128 pixels each (24 pixels = 1 octave). [sent-173, score-0.285]

61 Even for neighbouring units, the CFs differed by up to ∼4 octaves, which is consistent with recent experimental findings [4, 5]. [sent-181, score-0.363]

62 Figure 5: Nonlinear responses similar to pitch selectivity ((A) the harmonic composition of MFs, with frequency components from f0 up to 12*f0; (B) the normalized activity of pitch-selective units against the lowest harmonic present). [sent-188, score-0.597]

63 (C) The distribution of pitch-selective units on the smoothed tonotopy in a single session. [sent-191, score-0.484]

64 These examples indicate that CFs of neighbouring units did not differ randomly, but tended to be harmonically related. [sent-197, score-0.395]

65 Thus, this prediction of a harmonic relationship in neighbouring CFs will need to be examined in more detailed investigations. [sent-200, score-0.34]

66 If a sound consists of harmonics of a common fundamental (f0, 2f0, 3f0, etc.), then its pitch is the frequency of the lowest harmonic, which is called the fundamental frequency f0. [sent-206, score-0.312]

67 The perception of pitch is known to remain constant even if the sound lacks power at lower harmonics; in fact, pitch at f0 can be perceived from a sound that lacks f0 , a phenomenon known as “missing fundamental” [26]. [sent-207, score-0.355]

68 For each unit, responses were calculated to complex tones termed missing fundamental complex tones (MFs) [27]. [sent-211, score-0.271]

69 The MFs were composed of three consecutive harmonics sharing a single f0 ; the lowest frequency for these consecutive harmonics varied from the fundamental frequency (f0 ) to the tenth harmonic (10f0 ), as shown in Figure 5A. [sent-212, score-0.53]
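The MF stimuli described here are straightforward to synthesize; the sketch below uses illustrative parameters (the f0, duration, and sample rate are assumptions, not the paper's values):

```python
import numpy as np

def missing_fundamental_tone(f0, lowest_harmonic, duration=0.5, sr=16000):
    """An MF-style complex tone: three consecutive harmonics of f0 starting
    at `lowest_harmonic`, with no power at f0 itself when lowest_harmonic > 1."""
    t = np.arange(int(duration * sr)) / sr
    return sum(np.sin(2 * np.pi * k * f0 * t)
               for k in range(lowest_harmonic, lowest_harmonic + 3))

# An MF with lowest harmonic 4 contains 4*f0, 5*f0, 6*f0 but lacks f0 itself:
tone = missing_fundamental_tone(f0=200.0, lowest_harmonic=4)
spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(tone.size, d=1 / 16000)
peak_800 = spectrum[np.argmin(np.abs(freqs - 800.0))]
leak_200 = spectrum[np.argmin(np.abs(freqs - 200.0))]
```

The spectrum has peaks at 800, 1000, and 1200 Hz and essentially no energy at 200 Hz, yet a pitch at f0 = 200 Hz is perceived, which is the "missing fundamental" phenomenon probed with these stimuli.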

70 Figure 5B illustrates the response profiles of the pitch-selective units, which demonstrated sustained activity for MFs with a lowest harmonic below the sixth harmonic (6f0 ), and this result is similar to previously published data [27]. [sent-219, score-0.499]

71 Additionally, these units were located in a low-frequency region of the global tonotopy, as shown in Figure 5C, and this feature of pitch-selective units is also consistent with previous findings [27]. [sent-220, score-0.362]

72 The second layer of the TICA model, which contained the pitch-selective units, was originally designed to represent the layer of V1 complex cells, which have nonlinear responses that can be modelled by a summation of “energies” of neighbouring simple cells [13, 14, 15]. [sent-221, score-0.626]

73 Figure 6: (A) Natural images correlate only locally over retinal distance (no correlation between different objects), producing the smooth V1 retinotopy; (B) natural sounds also correlate at harmonic frequency ratios (1:2, 1:3, 2:3), producing the disordered A1 tonotopy. [sent-223, score-1.036]

74 4 Discussion Using a single model, we have provided a computational account explaining why the tonotopy of A1 is more disordered than the retinotopy of V1. [sent-225, score-0.703]

75 First, we demonstrated that there are significant differences between natural images and natural sounds; in particular, the latter evince distant correlations, whereas the former do not. [sent-226, score-0.488]

76 The topographic independent component analysis therefore generated a disordered tonotopy for these sounds, whereas the retinotopy adapted to natural images was locally organized throughout. [sent-227, score-1.078]

77 Detailed analyses of the TICA model predicted harmonic relationships among neighbouring neurons; furthermore, these analyses successfully replicated pitch selectivity, a nonlinear response of actual cells, using a mechanism that was designed to model V1 complex cells. [sent-228, score-0.56]

78 The results suggest that A1 and V1 may share an adaptive strategy, and the dissimilar topographies of visual and auditory maps may therefore reflect significant differences in the natural stimuli. [sent-229, score-0.546]

79 Natural images correlate only locally, which produces a smooth retinotopy through an efficient coding strategy (Figure 6A). [sent-231, score-0.412]

80 By contrast, natural sounds exhibit additional distant correlations (primarily correlations among harmonics), which produce the topographic disorganization observed for A1 (Figure 6B). [sent-232, score-0.968]

81 Our final result suggested that a common mechanism may underlie the complex cells of V1 and the pitch-selective cells of A1. [sent-235, score-0.418]

82 Additional support for this notion was provided by recent evidence indicating that the pitch-selective cells are most commonly found in the supragranular layer [27], and V1 complex cells display a similar tendency. [sent-236, score-0.461]

83 To the best of our knowledge, no previous studies in the literature have attempted to use this analogy of V1 complex cells to explain A1 pitch-selective cells (however, other potential analogues have been mentioned [31, 32]). [sent-238, score-0.452]

84 Another issue that must be addressed is what functional roles the other units in the second layer play. [sent-240, score-0.252]

85 Modular organization of frequency integration in primary auditory cortex. [sent-257, score-0.526]

86 Dichotomy of functional organization in the mouse auditory cortex. [sent-271, score-0.458]

87 Functional organization and population dynamics in the mouse primary auditory cortex. [sent-277, score-0.484]

88 The spatial distribution of unit characteristic frequency in the primary auditory cortex of the cat. [sent-301, score-0.575]

89 Functional architecture in cat primary auditory cortex: tonotopic organization. [sent-310, score-0.474]

90 A two-layer sparse coding model learns simple and complex cell receptive a fields and topography from natural images. [sent-338, score-0.435]

91 A hierarchical generative model for overcomplete topographic representations in natural images. [sent-350, score-0.306]

92 Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. [sent-368, score-0.275]

93 Sparse codes of harmonic natural sounds and their modulatory interactions. [sent-373, score-0.561]

94 Sparse coding of harmonic vocalization in monkey auditory cortex. [sent-381, score-0.579]

95 Independent component filters of natural images compared with simple cells in primary visual cortex. [sent-392, score-0.458]

96 Measurement of absolute auditory thresholds in the common marmoset (callithrix jacchus). [sent-423, score-0.356]

97 The neuronal representation of pitch in primate auditory cortex. [sent-435, score-0.478]

98 Experimentally induced visual projections into auditory thalamus and cortex. [sent-443, score-0.413]

99 Brainstem inputs to the ferret medial geniculate nucleus and the a effect of early deafferentation on novel retinal projections to the auditory thalamus. [sent-449, score-0.437]

100 Nonlinearity of coding in primary auditory cortex of the awake ferret. [sent-461, score-0.525]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('auditory', 0.356), ('sounds', 0.318), ('tonotopy', 0.303), ('retinotopy', 0.24), ('cells', 0.191), ('distant', 0.185), ('neighbouring', 0.182), ('topographic', 0.182), ('units', 0.181), ('disordered', 0.16), ('harmonic', 0.158), ('topography', 0.156), ('disorder', 0.156), ('tica', 0.144), ('cf', 0.141), ('pitch', 0.122), ('octave', 0.112), ('responses', 0.103), ('correlations', 0.099), ('mfs', 0.096), ('receptive', 0.093), ('harmonics', 0.091), ('di', 0.088), ('natural', 0.085), ('cfs', 0.085), ('frequency', 0.081), ('auditoriness', 0.08), ('images', 0.071), ('octaves', 0.07), ('coding', 0.065), ('bandpassed', 0.064), ('tonotopic', 0.064), ('torus', 0.064), ('demonstrated', 0.062), ('discontinuity', 0.062), ('spectra', 0.062), ('neighbourhood', 0.058), ('visual', 0.057), ('voices', 0.056), ('primary', 0.054), ('inputs', 0.054), ('elds', 0.052), ('hz', 0.051), ('cortex', 0.05), ('si', 0.05), ('spectrograms', 0.049), ('phonetic', 0.049), ('correlation', 0.048), ('nsl', 0.048), ('terashima', 0.048), ('tones', 0.048), ('topographies', 0.048), ('ci', 0.045), ('frequencies', 0.045), ('pixels', 0.044), ('layer', 0.043), ('ai', 0.041), ('sound', 0.041), ('whitened', 0.04), ('mouse', 0.039), ('tokyo', 0.039), ('overcomplete', 0.039), ('peripheral', 0.037), ('khz', 0.037), ('hateren', 0.037), ('retinotopic', 0.037), ('neuroscience', 0.037), ('adapted', 0.037), ('smooth', 0.036), ('complex', 0.036), ('peaks', 0.035), ('organization', 0.035), ('cortical', 0.035), ('pa', 0.035), ('unit', 0.034), ('studies', 0.034), ('relationships', 0.034), ('activity', 0.033), ('hypothesized', 0.032), ('angelucci', 0.032), ('harmonically', 0.032), ('multipeaked', 0.032), ('nnb', 0.032), ('patches', 0.032), ('activities', 0.031), ('hyv', 0.031), ('rinen', 0.031), ('published', 0.031), ('localized', 0.031), ('appeared', 0.03), ('illustrates', 0.029), ('perceived', 0.029), ('lowest', 0.028), ('okada', 0.028), ('harmony', 0.028), ('nonlinear', 0.028), ('position', 0.028), ('functional', 0.028), 
('retinal', 0.027), ('gabor', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

2 0.19617154 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods. 1

3 0.16895367 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman


4 0.16895367 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

5 0.12200215 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

Author: Jaldert Rombouts, Pieter Roelfsema, Sander M. Bohte

Abstract: A key function of brains is undoubtedly the abstraction and maintenance of information from the environment for later use. Neurons in association cortex play an important role in this process: by learning these neurons become tuned to relevant features and represent the information that is required later as a persistent elevation of their activity [1]. It is however not well known how such neurons acquire these task-relevant working memories. Here we introduce a biologically plausible learning scheme grounded in Reinforcement Learning (RL) theory [2] that explains how neurons become selective for relevant information by trial and error learning. The model has memory units which learn useful internal state representations to solve working memory tasks by transforming partially observable Markov decision problems (POMDP) into MDPs. We propose that synaptic plasticity is guided by a combination of attentional feedback signals from the action selection stage to earlier processing levels and a globally released neuromodulatory signal. Feedback signals interact with feedforward signals to form synaptic tags at those connections that are responsible for the stimulus-response mapping. The neuromodulatory signal interacts with tagged synapses to determine the sign and strength of plasticity. The learning scheme is generic because it can train networks in different tasks, simply by varying inputs and rewards. It explains how neurons in association cortex learn to 1) temporarily store task-relevant information in non-linear stimulus-response mapping tasks [1, 3, 4] and 2) learn to optimally integrate probabilistic evidence for perceptual decision making [5, 6]. 1

6 0.12182651 23 nips-2012-A lattice filter model of the visual pathway

7 0.11514773 195 nips-2012-Learning visual motion in recurrent neural networks

8 0.098553315 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

9 0.097271457 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

10 0.076053567 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

11 0.070575885 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

12 0.068889737 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

13 0.066126332 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

14 0.06430839 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

15 0.062302701 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

16 0.062149651 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

17 0.059614867 65 nips-2012-Cardinality Restricted Boltzmann Machines

18 0.058659863 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

19 0.057755239 193 nips-2012-Learning to Align from Scratch

20 0.056198306 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.132), (1, 0.051), (2, -0.183), (3, 0.08), (4, 0.038), (5, 0.109), (6, -0.007), (7, -0.046), (8, 0.004), (9, -0.013), (10, -0.005), (11, -0.036), (12, -0.149), (13, 0.033), (14, 0.005), (15, -0.089), (16, 0.034), (17, 0.007), (18, 0.008), (19, 0.077), (20, 0.048), (21, 0.041), (22, -0.029), (23, -0.022), (24, 0.02), (25, 0.067), (26, 0.004), (27, -0.115), (28, 0.006), (29, 0.048), (30, 0.086), (31, -0.005), (32, -0.039), (33, -0.006), (34, -0.016), (35, -0.101), (36, 0.088), (37, -0.001), (38, 0.033), (39, 0.025), (40, 0.041), (41, 0.117), (42, -0.021), (43, -0.002), (44, -0.001), (45, 0.11), (46, -0.038), (47, 0.077), (48, -0.0), (49, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95004511 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

2 0.69866419 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew Zaharia, J. Movshon, Eero P. Simoncelli

Abstract: Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. These filters cannot be estimated using spike-triggered averaging (STA). Subspace methods such as spike-triggered covariance (STC) can recover multiple filters, but require substantial amounts of data, and recover an orthogonal basis for the subspace in which the filters reside rather than the filters themselves. Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the first linear stage is a set of shifted (‘convolutional’) copies of a common filter, and the first nonlinear stage consists of rectifying scalar nonlinearities that are identical for all filter outputs. We refer to these initial LN elements as the ‘subunits’ of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data, and apply it to both simulated and real neuronal data from primate V1. The subunit model significantly outperforms STA and STC in terms of cross-validated accuracy and efficiency.

3 0.67669582 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.

4 0.66387045 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman

5 0.66387045 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale.

6 0.58652461 23 nips-2012-A lattice filter model of the visual pathway

7 0.55619502 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

8 0.53271073 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

9 0.47283709 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

10 0.45302448 193 nips-2012-Learning to Align from Scratch

11 0.44822565 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

12 0.44310743 72 nips-2012-Cocktail Party Processing via Structured Prediction

13 0.43861899 195 nips-2012-Learning visual motion in recurrent neural networks

14 0.4178226 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

15 0.36732614 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

16 0.36079022 65 nips-2012-Cardinality Restricted Boltzmann Machines

17 0.35541505 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

18 0.35399652 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

19 0.35336584 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

20 0.34965321 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.042), (17, 0.042), (21, 0.069), (38, 0.067), (39, 0.016), (42, 0.031), (44, 0.029), (54, 0.02), (55, 0.031), (74, 0.039), (76, 0.096), (80, 0.051), (87, 0.33), (92, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75257689 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

2 0.68025887 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.

3 0.6038416 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

Author: Junyuan Xie, Linli Xu, Enhong Chen

Abstract: We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA). We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. Our method’s performance in the image denoising task is comparable to that of KSVD, which is a widely used sparse coding technique. More importantly, in the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. Moreover, the proposed method does not need the information regarding the region that requires inpainting to be given a priori. Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. We also show that our new training scheme for DA is more effective and can improve the performance of unsupervised feature learning.

4 0.47435579 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.

5 0.43587339 193 nips-2012-Learning to Align from Scratch

Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-miller

Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method.

6 0.42992204 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

7 0.42443648 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

8 0.4231475 62 nips-2012-Burn-in, bias, and the rationality of anchoring

9 0.4231475 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

10 0.42156026 195 nips-2012-Learning visual motion in recurrent neural networks

11 0.42098373 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

12 0.42012757 23 nips-2012-A lattice filter model of the visual pathway

13 0.41800949 60 nips-2012-Bayesian nonparametric models for ranked data

14 0.41685534 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

15 0.41623983 148 nips-2012-Hamming Distance Metric Learning

16 0.41528252 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

17 0.41523486 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

18 0.41387102 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

19 0.41292012 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

20 0.41276661 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model
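Unlike the dense lsi vector, the lda section stores each paper's topic mixture in sparse form, e.g. [(0, 0.042), (17, 0.042), ...]: only topics with non-negligible weight are listed. One way such sparse vectors could be compared is cosine similarity over the union of topic ids; this is a sketch under that assumption (the actual similarity measure used by the mining pipeline is not shown on this page, and the function name is hypothetical):

```python
import math


def sparse_cosine(p, q):
    """Cosine similarity between two sparse [(topicId, weight), ...] vectors,
    in the same format as the lda topic lists above."""
    dp = dict(p)
    dq = dict(q)
    # Only topic ids present in both vectors contribute to the dot product.
    dot = sum(w * dq.get(t, 0.0) for t, w in dp.items())
    norm_p = math.sqrt(sum(w * w for w in dp.values()))
    norm_q = math.sqrt(sum(w * w for w in dq.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Identical mixtures score 1.0 and mixtures over disjoint topic sets score 0.0, matching the intuition behind the simValue ordering: papers sharing heavily weighted topics (such as topic 87 above, weighted 0.33 for this paper) rank highest.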