nips nips2000 nips2000-131 knowledge-graph by maker-knowledge-mining

131 nips-2000-The Early Word Catches the Weights


Source: pdf

Author: Mark A. Smith, Garrison W. Cottrell, Karen L. Anderson

Abstract: The strong correlation between the frequency of words and their naming latency has been well documented. However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. Recently, there has been a resurgence of interest in AoA. While some studies have shown that frequency has no effect when AoA is controlled for, more recent studies have found independent contributions of frequency and AoA. Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. Indeed, several researchers have explicitly claimed that they cannot show AoA effects. In this work, we explore these claims using a simple feed-forward neural network. We find a significant contribution of AoA to naming latency, as well as conditions under which frequency provides an independent contribution.

1 Background

Naming latency is the time between the presentation of a picture or written word and the beginning of the correct utterance of that word. It is undisputed that there are significant differences in the naming latency of many words, even when controlling for word length, syllabic complexity, and other structural variants. The cause of differences in naming latency has been the subject of numerous studies. Earlier studies found that the frequency with which a word appears in spoken English is the best determinant of its naming latency (Oldfield & Wingfield, 1965). More recent psychological studies, however, show that the age at which a word is learned, or its Age of Acquisition (AoA), may be a better predictor of naming latency. Further, in many multiple regression analyses, frequency is not found to be significant when AoA is controlled for (Brown & Watson, 1987; Carroll & White, 1973; Morrison et al., 1992; Morrison & Ellis, 1995). These studies show that frequency and AoA are highly correlated (typically r = -.6), explaining the confound of older studies on frequency. However, still more recent studies question this finding and find that both AoA and frequency are significant and contribute independently to naming latency (Ellis & Morrison, 1998; Gerhand & Barry, 1998, 1999). Much like their psychological counterparts, connectionist networks also show very strong frequency effects. However, the ability of a connectionist network to show AoA effects has been doubted (Gerhand & Barry, 1998; Morrison & Ellis, 1995). Most of these claims are based on the well-known fact that connectionist networks exhibit "destructive interference," in which later presented stimuli, in order to be learned, force early learned inputs to become less well represented, effectively increasing their associated errors.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The strong correlation between the frequency of words and their naming latency has been well documented. [sent-6, score-0.952]

2 However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. [sent-7, score-0.202]

3 While some studies have shown that frequency has no effect when AoA is controlled for, more recent studies have found independent contributions of frequency and AoA. [sent-9, score-0.62]

4 Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. [sent-10, score-0.152]

5 In this work, we explore these claims using a simple feed forward neural network. [sent-12, score-0.048]

6 We find a significant contribution of AoA to naming latency, as well as conditions under which frequency provides an independent contribution. [sent-13, score-0.679]

7 1 Background Naming latency is the time between the presentation of a picture or written word and the beginning of the correct utterance of that word. [sent-14, score-0.361]

8 It is undisputed that there are significant differences in the naming latency of many words, even when controlling word length, syllabic complexity, and other structural variants. [sent-15, score-0.754]

9 The cause of differences in naming latency has been the subject of numerous studies. [sent-16, score-0.612]

10 Earlier studies found that the frequency with which a word appears in spoken English is the best determinant of its naming latency (Oldfield & Wingfield, 1965). [sent-17, score-1.011]

11 More recent psychological studies, however, show that the age at which a word is learned, or its Age of Acquisition (AoA), may be a better predictor of naming latency. [sent-18, score-0.586]

12 Further, in many multiple regression analyses, frequency is not found to be significant when AoA is controlled for (Brown & Watson, 1987; Carroll & White, 1973; Morrison et al., 1992; Morrison & Ellis, 1995). [sent-19, score-0.315]

13 These studies show that frequency and AoA are highly correlated (typically r = -.6), explaining the confound of older studies on frequency. [sent-21, score-0.284]

14 However, still more recent studies question this finding and find that both AoA and frequency are significant and contribute independently to naming latency (Ellis & Morrison, 1998; Gerhand & Barry, 1998,1999). [sent-23, score-0.953]

15 Much like their psychological counterparts, connectionist networks also show very strong frequency effects. [sent-24, score-0.336]

16 However, the ability of a connectionist network to show AoA effects has been doubted (Gerhand & Barry, 1998; Morrison & Ellis, 1995). [sent-25, score-0.198]

17 Most of these claims are based on the well known fact that connectionist networks exhibit "destructive interference" in which later presented stimuli, in order to be learned, force early learned inputs to become less well represented, effectively increasing their associated errors. [sent-26, score-0.237]

18 However, these effects only occur when training ceases on the early patterns. [sent-27, score-0.131]

19 Continued training on all the patterns mitigates the effects of interference from later patterns. [sent-28, score-0.212]

20 Recently, Ellis & Lambon-Ralph (in press) have shown that when pattern presentation is staged, with one set of patterns initially trained, and a second set added into the training set later, strong AoA effects are found. [sent-29, score-0.245]

21 They show that this result is due to a loss of plasticity in the network units, which tend to get out of the linear range with more training. [sent-30, score-0.03]

22 While this result is not surprising, it is a good model of the fact that some words may not come into existence until late in life, such as "email" for baby boomers. [sent-31, score-0.091]

23 However, they explicitly claim that it is important to stage the learning in this way, and they offer no explanation of what happens during early word acquisition, when the surrounding vocabulary is relatively constant, or of why and when frequency and AoA show independent effects. [sent-32, score-0.374]

24 In this paper, we present an abstract feed-forward computational model of word acquisition that does not stage inputs. [sent-33, score-0.219]

25 We use this model to examine the effects of frequency and AoA on sum squared error, the usual variable used to model reaction time. [sent-34, score-0.335]

26 We find a consistent contribution of AoA to naming latency, as well as the conditions under which there is an independent contribution from frequency in some tasks. [sent-35, score-0.688]

27 Our first goal was to show that AoA effects could be observed in a connectionist network using the simplest possible model. [sent-37, score-0.216]

28 We did this in such a way that staging the inputs was not necessary: we defined a threshold for the error, after which we would say a pattern has been "acquired." [sent-39, score-0.049]

29 The AoA is defined to be the epoch during which this threshold is crossed. [sent-40, score-0.091]

30 Since error for a particular pattern may occasionally go up again during online learning, we also measured the last epoch at which the pattern went below the threshold for the final time. [sent-41, score-0.208]

31 We analyzed our networks using both definitions of acquisition (which we call first acquisition and final acquisition), and have found that the results vary little between these definitions. [sent-42, score-0.3]

32 In what follows, we use first acquisition for simplicity. [sent-43, score-0.109]
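To make the two definitions concrete, here is a minimal sketch (ours, in Python/NumPy; the paper gives no code) of recovering first and final acquisition from a per-pattern error history. The array layout, names, and the default threshold are assumptions for illustration.

```python
import numpy as np

def acquisition_epochs(error_history, threshold=2.0):
    """error_history[epoch, pattern] holds each pattern's sum squared
    error after each epoch.  Returns (first, final) acquisition epochs
    per pattern; -1 marks a pattern that never (finally) crosses below
    the threshold."""
    below = error_history < threshold
    n_epochs = below.shape[0]

    # First acquisition: first epoch below threshold (argmax finds the
    # first True along the epoch axis).
    first = np.argmax(below, axis=0)
    first[~below.any(axis=0)] = -1

    # Final acquisition: the epoch after the *last* time the error was
    # above threshold (errors can bounce back up during online learning).
    above = ~below
    last_above = n_epochs - 1 - np.argmax(above[::-1], axis=0)
    final = np.where(above.any(axis=0), last_above + 1, 0)
    final[final >= n_epochs] = -1   # still above threshold at the end
    return first, final
```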

33 1 The Model The simplest possible model is an autoencoder network. [sent-45, score-0.018]

34 Using a network architecture of 20-15-20, we trained the network to autoencode 200 patterns of random bits (each bit had a 50% probability of being on or off). [sent-46, score-0.237]

35 For this experiment, we chose the AoA threshold to be 2, indicating an average squared error of .1 per input bit, yielding outputs much closer to the correct output than any other. [sent-52, score-0.112] [sent-53, score-0.023]

37 We calculated Euclidean distances between all outputs and patterns to verify that the input was mapped most closely to the correct output. [sent-54, score-0.088]

38 Training on the entire corpus continued until 98% of all patterns fell below this threshold. [sent-55, score-0.118]
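A minimal sketch of the training setup as described: a 20-15-20 sigmoid autoencoder trained with online updates on 200 random 20-bit patterns, stopping once 98% of patterns have sum squared error below the AoA threshold of 2. The learning rate, weight initialization, and epoch cap are our assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
patterns = (rng.random((200, 20)) < 0.5).astype(float)  # each bit on with p = .5

# 20-15-20 autoencoder with sigmoid units.
W1 = rng.normal(0, 0.3, (20, 15)); b1 = np.zeros(15)
W2 = rng.normal(0, 0.3, (15, 20)); b2 = np.zeros(20)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

lr, threshold = 0.1, 2.0   # assumed learning rate; threshold of 2 is from the text
first_acq = {}             # pattern index -> epoch of first acquisition

for epoch in range(5000):  # epoch cap is an assumption
    for i in rng.permutation(len(patterns)):  # online (per-pattern) updates
        x = patterns[i]
        h = sigmoid(x @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Gradient descent on sum squared error with sigmoid outputs.
        dy = (x - y) * y * (1 - y)
        dh = (dy @ W2.T) * h * (1 - h)
        W2 += lr * np.outer(h, dy); b2 += lr * dy
        W1 += lr * np.outer(x, dh); b1 += lr * dh

    out = sigmoid(sigmoid(patterns @ W1 + b1) @ W2 + b2)
    sse = ((patterns - out) ** 2).sum(axis=1)
    for i in np.flatnonzero(sse < threshold):
        first_acq.setdefault(i, epoch)        # record only the first crossing
    if (sse < threshold).mean() >= 0.98:      # stop once 98% are acquired
        break
```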

39 2 Results After the network had learned the input corpus, we investigated the relationship between the epoch at which the input vector had been learned and the final sum squared error (equivalent, for us, to "adult" naming latency) for that input vector. [sent-57, score-0.783]

40 The relationship between the age of acquisition of the input vector and its final sum squared error is clear: the earlier an input is learned, the lower its final error will be. [sent-59, score-0.235] [sent-93, score-0.239]

[Figure: final SSE plotted against epoch of acquisition, with points and regression lines for both first acquisition and final acquisition.]

43 A more formal analysis of this relationship yields a significant (p « . [sent-94, score-0.075]

44 In order to understand this relationship better, we divided the learned words into five percentile groups depending upon AoA. [sent-97, score-0.147]

45 Figure 2 shows the average SSE for each group plotted over epoch number. [sent-98, score-0.082]

46 The line with the least average SSE corresponds to the earliest acquired quintile while the line with the highest average SSE corresponds to the last acquired quintile. [sent-99, score-0.098]

47 From this graph we can see that the average SSE for earlier learned patterns stays below errors for late learned patterns. [sent-100, score-0.268]

48 This is true from the outset of learning as well as when the error starts to decrease less rapidly as it asymptotically approaches some lowest error limit. [sent-101, score-0.052]
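To make the quintile analysis concrete, a small sketch (ours, not the paper's code) that splits patterns into five AoA groups and tracks each group's mean SSE over training; `error_history` follows the earlier sketch and `aoa` is the vector of first-acquisition epochs.

```python
import numpy as np

def quintile_curves(error_history, aoa):
    """Return a (5, n_epochs) array of mean SSE per AoA quintile,
    earliest-acquired quintile first."""
    edges = np.percentile(aoa, [20, 40, 60, 80])
    group = np.searchsorted(edges, aoa)   # 0 = earliest-acquired quintile
    return np.stack([error_history[:, group == g].mean(axis=1)
                     for g in range(5)])
```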

49 We sloganize this result as "the patterns that get to the weights first, win." [sent-102, score-0.065]

50 3 Experiment 2: Do AoA effects survive a frequency manipulation? [sent-103, score-0.304]

51 Having displayed that AoA effects are present in connectionist networks, we wanted to investigate the interaction with frequency. [sent-104, score-0.187]

52 We model the frequency distribution of inputs after the known English spoken word frequency in which very few words appear very often while a very large portion of words appear very seldom (Zipf's law). [sent-105, score-0.621]

53 … is presented in Figure 3 (a true version of Zipf's law still shows the result). [sent-110, score-0.025]
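The paper's exact approximation is garbled in this extraction, so here is one simple way (an assumption, not the paper's formula) to realize a Zipf-like presentation schedule: probability proportional to 1/rank, sampled per training step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patterns = 200

ranks = np.arange(1, n_patterns + 1)
p = (1.0 / ranks) / (1.0 / ranks).sum()   # Zipf: frequency of rank r ~ 1/r

# One epoch of presentations: a few patterns appear very often,
# most appear very seldom.
presentations = rng.choice(n_patterns, size=1000, p=p)
counts = np.bincount(presentations, minlength=n_patterns)
```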

54 Here we find again a very strong and significant (p « 0.005) correlation between the age at which an input is learned and its naming latency. [sent-115, score-0.111] [sent-116, score-0.626]

56 The correlation coefficient averaged over 10 runs is 0. [sent-117, score-0.212]

57 Figure 5 shows how the frequency of presentation of a given stimulus correlates with naming latency. [sent-120, score-0.315]

58 We find that the best fitting correlation is an exponential one in which naming latency correlates most strongly with the log of the frequency. [sent-149, score-0.796]

59 The correlation coefficient averaged over 10 runs is significant (p « 0. [sent-150, score-0.262]

60 This is a slightly stronger correlation than is found in the literature. [sent-153, score-0.114]

61 Finally, figure 6 shows how frequency and AoA are related. [sent-154, score-0.193]

62 However, this is a much weaker correlation than is found in the literature. [sent-158, score-0.114]

63 Performing a multiple regression with SSE as the dependent variable and AoA and log frequency as the two explanatory variables, we find that both AoA and log frequency contribute significantly (p « 0. [sent-159, score-0.327]

64 730, the multiple correlation coefficient averaged over 10 runs is 0. [sent-163, score-0.212]

65 AoA and log frequency each make independent contributions to naming latency. [sent-165, score-0.628]
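The regression itself is standard ordinary least squares; a sketch in NumPy, with `sse`, `aoa`, and `counts` as in the earlier sketches (the variable names are ours):

```python
import numpy as np

def regress_sse(sse, aoa, counts):
    """OLS of final SSE on AoA and log frequency.  Returns the
    coefficients (intercept, beta_aoa, beta_logfreq) and R^2."""
    logf = np.log(counts + 1)   # +1 guards against zero presentation counts
    X = np.column_stack([np.ones_like(sse), aoa, logf])
    beta, *_ = np.linalg.lstsq(X, sse, rcond=None)
    resid = sse - X @ beta
    return beta, 1 - resid.var() / sse.var()
```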

66 We were encouraged that we found both effects of frequency and AoA on SSE in our model, but were surprised by the small size of the correlation between the two. [sent-166, score-0.416]

67 The naming literature shows a strong correlation between AoA and frequency. [sent-167, score-0.516]

68 However, pilot work with a smaller network showed no frequency effect, which was due to the autoencoding task in a network where the patterns filled 20% of the input space (200 random patterns in a 10-8-10 network, with 1024 patterns possible). [sent-168, score-0.572]

69 This suggests that autoencoding is not an appropriate task to model naming, and would give rise to the low correlation between AoA and frequency. [sent-169, score-0.167]

70 Spelling-sound consistency has been shown to have a significant effect on naming latency (Jared, McRae, & Seidenberg, 1990). [sent-171, score-0.76]

71 Object naming, another task in which AoA effects are found, is a completely arbitrary mapping. [sent-172, score-0.109]

72 Our third experiment looks at the effect that the consistency of our mapping task has on AoA and frequency effects. [sent-173, score-0.377]

73 4 Experiment 3: Consistency effects Our model in this experiment is identical to the previous model except for two changes. [sent-174, score-0.127]

74 Second, we found that some patterns would end up with one bit off, leading to a bimodal distribution of SSE's. [sent-176, score-0.159]

75 We thus used cross-entropy error to ensure that all bits would be learned. [sent-177, score-0.073]

76 Eleven levels of consistency were defined, from 100% consistent (autoencoding) to 0% consistent (a mapping from one random 20-bit vector to another random 20-bit vector). [sent-178, score-0.241]

77 Note that in a 0% consistent mapping, since each bit has a 50% chance of being on, about 50% of the bits will be the same by chance. [sent-179, score-0.145]

78 Thus an intermediate level of 50% consistency will have on average 75% of the corresponding bits equal. [sent-180, score-0.164]
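A sketch of how the consistency manipulation can be realized (our reading of the text): at consistency level c, each target bit copies the corresponding input bit with probability c and is otherwise random, so the expected fraction of matching bits is c + (1 - c)/2, i.e., 75% at c = 0.5 as noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_targets(patterns, consistency):
    """Targets agree with the input on a `consistency` fraction of bits;
    the remaining bits are random and so match by chance half the time."""
    copy_mask = rng.random(patterns.shape) < consistency
    random_bits = (rng.random(patterns.shape) < 0.5).astype(float)
    return np.where(copy_mask, patterns, random_bits)

patterns = (rng.random((200, 20)) < 0.5).astype(float)
targets = make_targets(patterns, 0.5)
print((patterns == targets).mean())   # ~0.75 at 50% consistency
```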


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('aoa', 0.715), ('naming', 0.39), ('sse', 0.26), ('latency', 0.205), ('frequency', 0.193), ('acquisition', 0.109), ('consistency', 0.094), ('word', 0.092), ('effects', 0.092), ('ellis', 0.087), ('morrison', 0.087), ('correlation', 0.085), ('age', 0.078), ('connectionist', 0.076), ('correlates', 0.075), ('studies', 0.071), ('bit', 0.065), ('autoencoding', 0.065), ('patterns', 0.065), ('epoch', 0.059), ('oo', 0.056), ('final', 0.053), ('coefficient', 0.051), ('significant', 0.05), ('learned', 0.05), ('presentation', 0.047), ('bits', 0.047), ('barry', 0.043), ('gerhand', 0.043), ('iac', 0.043), ('zipf', 0.043), ('runs', 0.041), ('strong', 0.041), ('early', 0.039), ('words', 0.038), ('lon', 0.037), ('english', 0.036), ('experiment', 0.035), ('averaged', 0.035), ('percentile', 0.034), ('late', 0.034), ('consistent', 0.033), ('threshold', 0.032), ('interference', 0.031), ('spoken', 0.031), ('claims', 0.031), ('squared', 0.031), ('network', 0.03), ('ff', 0.029), ('found', 0.029), ('corpus', 0.028), ('earlier', 0.027), ('psychological', 0.026), ('acquired', 0.026), ('error', 0.026), ('contribution', 0.026), ('regression', 0.025), ('continued', 0.025), ('law', 0.025), ('relationship', 0.025), ('contributions', 0.024), ('contribute', 0.024), ('later', 0.024), ('average', 0.023), ('explaining', 0.023), ('input', 0.023), ('log', 0.021), ('effect', 0.021), ('correlated', 0.02), ('find', 0.02), ('cottrell', 0.019), ('baby', 0.019), ('went', 0.019), ('qui', 0.019), ('older', 0.019), ('garrison', 0.019), ('occasionally', 0.019), ('paid', 0.019), ('pilot', 0.019), ('destructive', 0.019), ('reaction', 0.019), ('stays', 0.019), ('wanted', 0.019), ('seldom', 0.019), ('survive', 0.019), ('simplest', 0.018), ('stage', 0.018), ('controlled', 0.018), ('differences', 0.017), ('adult', 0.017), ('jolla', 0.017), ('encouraged', 0.017), ('feed', 0.017), ('utterance', 0.017), ('mapping', 0.017), ('inputs', 0.017), ('task', 0.017), ('manipulation', 0.016), ('vocabulary', 0.016), ('surrounding', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 131 nips-2000-The Early Word Catches the Weights


2 0.084823802 6 nips-2000-A Neural Probabilistic Language Model

Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent

Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

3 0.060481768 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks

Author: Silvia Scarpetta, Zhaoping Li, John A. Hertz

Abstract: We apply to oscillatory networks a class of learning rules in which synaptic weights change proportional to pre- and post-synaptic activity, with a kernel A(r) measuring the effect for a postsynaptic spike a time r after the presynaptic one. The resulting synaptic matrices have an outer-product form in which the oscillating patterns are represented as complex vectors. In a simple model, the even part of A(r) enhances the resonant response to learned stimulus by reducing the effective damping, while the odd part determines the frequency of oscillation. We relate our model to the olfactory cortex and hippocampus and their presumed roles in forming associative memories and input representations. 1

4 0.047918823 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR), and which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi/band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

5 0.045219682 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

Author: Lawrence K. Saul, Jont B. Allen

Abstract: An eigenvalue method is developed for analyzing periodic structure in speech. Signals are analyzed by a matrix diagonalization reminiscent of methods for principal component analysis (PCA) and independent component analysis (ICA). Our method-called periodic component analysis (1l

6 0.043276858 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System

7 0.040735051 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

8 0.039355382 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

9 0.036127895 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

10 0.034443427 129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account

11 0.03321914 141 nips-2000-Universality and Individuality in a Neural Code

12 0.03292935 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization

13 0.030572843 34 nips-2000-Competition and Arbors in Ocular Dominance

14 0.029702799 130 nips-2000-Text Classification using String Kernels

15 0.026741957 96 nips-2000-One Microphone Source Separation

16 0.026652928 146 nips-2000-What Can a Single Neuron Compute?

17 0.026267821 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code

18 0.026208047 66 nips-2000-Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex

19 0.026145931 42 nips-2000-Divisive and Subtractive Mask Effects: Linking Psychophysics and Biophysics

20 0.026019912 102 nips-2000-Position Variance, Recurrence and Perceptual Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.093), (1, -0.054), (2, -0.022), (3, 0.017), (4, -0.012), (5, -0.026), (6, -0.05), (7, -0.015), (8, 0.027), (9, 0.069), (10, 0.055), (11, 0.018), (12, 0.061), (13, 0.033), (14, 0.089), (15, 0.069), (16, 0.055), (17, 0.001), (18, 0.1), (19, 0.007), (20, 0.107), (21, -0.049), (22, -0.03), (23, -0.088), (24, 0.057), (25, -0.071), (26, 0.087), (27, 0.038), (28, 0.054), (29, 0.025), (30, 0.051), (31, -0.147), (32, -0.004), (33, -0.099), (34, -0.032), (35, 0.128), (36, -0.266), (37, 0.043), (38, 0.067), (39, 0.058), (40, -0.08), (41, 0.249), (42, 0.131), (43, 0.152), (44, 0.082), (45, -0.08), (46, 0.093), (47, -0.205), (48, 0.004), (49, -0.105)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9623729 131 nips-2000-The Early Word Catches the Weights


2 0.51835138 6 nips-2000-A Neural Probabilistic Language Model

Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent

Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

3 0.40690234 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks

Author: Silvia Scarpetta, Zhaoping Li, John A. Hertz

Abstract: We apply to oscillatory networks a class of learning rules in which synaptic weights change proportional to pre- and post-synaptic activity, with a kernel A(r) measuring the effect for a postsynaptic spike a time r after the presynaptic one. The resulting synaptic matrices have an outer-product form in which the oscillating patterns are represented as complex vectors. In a simple model, the even part of A(r) enhances the resonant response to learned stimulus by reducing the effective damping, while the odd part determines the frequency of oscillation. We relate our model to the olfactory cortex and hippocampus and their presumed roles in forming associative memories and input representations. 1

4 0.36265224 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We t ested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width , give comparable results. However , in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity. 1

5 0.35612148 34 nips-2000-Competition and Arbors in Ocular Dominance

Author: Peter Dayan

Abstract: Hebbian and competitive Hebbian algorithms are almost ubiquitous in modeling pattern formation in cortical development. We analyse in theoretical detail a particular model (adapted from Piepenbrock & Obermayer, 1999) for the development of Id stripe-like patterns, which places competitive and interactive cortical influences, and free and restricted initial arborisation onto a common footing.

6 0.35157442 66 nips-2000-Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex

7 0.33136746 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

8 0.26774907 42 nips-2000-Divisive and Subtractive Mask Effects: Linking Psychophysics and Biophysics

9 0.25603697 132 nips-2000-The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving

10 0.22412901 25 nips-2000-Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators

11 0.21956599 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

12 0.20561863 40 nips-2000-Dendritic Compartmentalization Could Underlie Competition and Attentional Biasing of Simultaneous Visual Stimuli

13 0.19223224 141 nips-2000-Universality and Individuality in a Neural Code

14 0.18675944 127 nips-2000-Structure Learning in Human Causal Induction

15 0.18617505 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization

16 0.18328951 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

17 0.18153132 57 nips-2000-Four-legged Walking Gait Control Using a Neuromorphic Chip Interfaced to a Support Vector Learning Algorithm

18 0.17870219 32 nips-2000-Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes

19 0.17850339 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System

20 0.17829636 61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.023), (12, 0.365), (17, 0.077), (33, 0.069), (42, 0.028), (55, 0.044), (62, 0.039), (65, 0.016), (67, 0.038), (76, 0.032), (81, 0.087), (90, 0.014), (91, 0.011), (99, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77632648 131 nips-2000-The Early Word Catches the Weights


2 0.60990655 51 nips-2000-Factored Semi-Tied Covariance Matrices

Author: Mark J. F. Gales

Abstract: A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. This is an extension to an efficient form of covariance modelling used in speech recognition, semi-tied covariance matrices. In the standard form of semi-tied covariance matrices the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. The use of a factored decorrelating transform is presented in this paper. This factoring effectively increases the number of possible transforms without increasing the number of free parameters. Maximum likelihood estimation schemes for all the model parameters are presented including the component/transform assignment, transform and component parameters. This new model form is evaluated on a large vocabulary speech recognition task. It is shown that using this factored form of covariance modelling reduces the word error rate.

3 0.35985297 103 nips-2000-Probabilistic Semantic Video Indexing

Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang

Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance. 1

4 0.34941065 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Author: Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony M. Zador

Abstract: Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their

5 0.33972237 137 nips-2000-The Unscented Particle Filter

Author: Rudolph van der Merwe, Arnaud Doucet, Nando de Freitas, Eric A. Wan

Abstract: In this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented filters to obtain the importance proposal distribution. This proposal has two very

6 0.33955625 141 nips-2000-Universality and Individuality in a Neural Code

7 0.33689186 146 nips-2000-What Can a Single Neuron Compute?

8 0.33031672 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code

9 0.32769322 55 nips-2000-Finding the Key to a Synapse

10 0.32191268 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

11 0.32084045 66 nips-2000-Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex

12 0.31438959 43 nips-2000-Dopamine Bonuses

13 0.31237191 49 nips-2000-Explaining Away in Weight Space

14 0.31060323 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

15 0.31043768 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System

16 0.3091673 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

17 0.30812266 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

18 0.30645207 125 nips-2000-Stability and Noise in Biochemical Switches

19 0.30617723 80 nips-2000-Learning Switching Linear Models of Human Motion

20 0.30389968 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure