nips nips2001 nips2001-52 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for real-valued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Recurrent neural networks of analog units are computers for real-valued functions. [sent-3, score-0.642]
2 We study the time complexity of real computation in general recurrent neural networks. [sent-4, score-0.796]
3 These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. [sent-5, score-0.637]
4 For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. [sent-6, score-0.466]
5 Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to. [sent-7, score-0.804]
6 1 Introduction Analog recurrent neural networks are known to have computational capabilities that exceed those of classical Turing machines (see, e. [sent-8, score-0.635]
7 Among the rare results in this direction, for instance, is the one of Sima and Orponen (2001) showing that continuous-time Hopfield networks may require exponential time before converging to a stable state. [sent-12, score-0.245]
8 This bound, however, is expressed in terms of the size of the network and, hence, does not apply to fixed-size networks with a given number of nodes. [sent-13, score-0.365]
9 Other bounds on the computational power of analog recurrent networks have been established by Maass and Orponen (1998) and Maass and Sontag (1999). [sent-14, score-0.899]
10 They show that discrete-time recurrent neural networks recognize only a subset of the regular languages in the presence of noise. [sent-15, score-0.7]
11 This model of computation in recurrent networks, however, receives its inputs as sequences. [sent-16, score-0.539]
12 Therefore, computing time is not an issue since the network halts when the input sequence terminates. [sent-17, score-0.402]
13 Analog recurrent neural networks, however, can also be run as "real" computers that get as input a vector of real numbers and, after computing for a while, yield a real output value. [sent-18, score-0.904]
14 No results are available thus far regarding the time complexity of analog recurrent neural networks with given size. [sent-19, score-0.908]
15 We investigate here the time complexity of discrete-time recurrent neural networks that compute functions over the reals. [sent-20, score-0.815]
16 As network nodes we allow sigmoidal units, linear units, and product units, that is, monomials whose exponents are adjustable weights (Durbin and Rumelhart, 1989). [sent-21, score-0.919]
17 We study the complexity of real computation in the sense of Blum et al. [sent-22, score-0.222]
18 With such a general type of network, the question arises which functions can be computed with a given number of nodes and a limited amount of time. [sent-29, score-0.342]
19 In the following, we exhibit a family of real-valued functions f_l, l ≥ 1, in one variable that are computed by some fixed-size network in time O(l). [sent-30, score-0.362]
20 Our main result then shows that every recurrent neural network computing the functions f_l requires at least time Ω(l^{1/4}). [sent-31, score-1.047]
21 Thus, we obtain almost tight time bounds for real computation in recurrent neural networks. [sent-32, score-0.881]
22 2 Analog Computation in Recurrent Neural Networks We study a very comprehensive type of discrete-time recurrent neural network that we call general recurrent neural network (see Figure 1). [sent-33, score-1.419]
23 For every k, n ∈ ℕ there is a recurrent neural architecture consisting of k computation nodes y_1, . . . , y_k and n input nodes x_1, . . . , x_n. [sent-34, score-0.968]
24 The size of a network is defined to be the number of its computation nodes. [sent-41, score-0.386]
25 The computation nodes form a fully connected recurrent network. [sent-42, score-0.807]
26 Every computation node also receives connections from every input node. [sent-43, score-0.401]
27 The input nodes play the role of the input variables of the system. [sent-44, score-0.406]
28 There are three types of computation nodes: product units, sigmoidal units, and linear units. [sent-46, score-0.586]
29 Assume that computation node i has connections from the computation nodes weighted by w_{i1}, . . . , w_{ik}. [sent-47, score-0.691]
30 Let y_1(t), . . . , y_k(t) and x_1(t), . . . , x_n(t) be the values of the computation nodes and input nodes at time t, respectively. [sent-60, score-0.782]
31 If node i is a product unit, it computes at time t + 1 the value given in Equation (1); that is, after weighting them exponentially, the incoming values are multiplied. [sent-61, score-0.335]
32 Sigmoidal and linear units have an additional parameter associated with them, the threshold or bias θ_i. [sent-62, score-0.244]
33 A sigmoidal unit computes the value given in Equation (2), where σ is the standard sigmoid σ(z) = 1/(1 + e^{-z}); if node i is a linear unit, it simply outputs the weighted sum given in Equation (3). [sent-63, score-0.493]
34 We allow the networks to be heterogeneous, that is, they may contain all three types of computation nodes simultaneously. [sent-64, score-0.688]
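The update rules (1)-(3) are only referenced, not reproduced, in this extract. The following sketch implements one synchronous update step as the text describes it: product units raise incoming values to the power of their weights and multiply, sigmoidal units apply the standard sigmoid to a weighted sum plus bias, and linear units output the weighted sum itself. All names (step, W, V, theta, kinds) are illustrative choices, not notation from the paper, and inputs are assumed to be such that the product units stay real-valued.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def step(y, x, W, V, theta, kinds):
        # One synchronous update of a general recurrent network.
        #   y     : current values of the k computation nodes
        #   x     : values of the n input nodes (held fixed during computation)
        #   W, V  : k x k and k x n weight matrices (recurrent and input weights)
        #   theta : biases, used by sigmoidal and linear units only
        #   kinds : one of 'product', 'sigmoidal', 'linear' per computation node
        y_new = np.zeros(len(y))
        for i, kind in enumerate(kinds):
            if kind == 'product':
                # incoming values are weighted exponentially and then multiplied
                y_new[i] = np.prod(y ** W[i]) * np.prod(x ** V[i])
            else:
                s = W[i] @ y + V[i] @ x + theta[i]
                y_new[i] = sigmoid(s) if kind == 'sigmoidal' else s
        return y_new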
35 For instance, architectures have been proposed that include a second layer of linear computation nodes which have no recurrent connections to computation nodes but serve as output nodes (see, e. [sent-66, score-1.683]
36 It is clear that in the definition given here, the linear units can function as these output nodes if the weights of the outgoing connections are set to 0. [sent-69, score-0.668]
37 Also very common is the use of higher-order sigmoidal units as computation nodes in recurrent networks (see, e. [sent-70, score-1.458]
38 Obviously, the model here includes these higher-order networks as a special case since the computation of a higher-order sigmoidal unit can be simulated by first computing the higher-order terms using product units and then passing their outputs to a sigmoidal unit. [sent-74, score-1.022]
39 (Figure 1 labels: computation nodes y_1, . . . , y_k built from sigmoidal, product, and linear units; input nodes x_1, . . . , x_n.) [sent-75, score-0.633]
40 Figure 1: A general recurrent neural network of size k. [sent-76, score-1.078]
41 Product units, however, are even more powerful than higher-order terms since they make it possible to perform division operations using negative weights. [sent-79, score-0.216]
42 Moreover, if a negative input value is weighted by a non-integer weight, the output of a product unit may be a complex number. [sent-80, score-0.338]
43 Since we are mainly interested in lower bounds, however, these bounds obviously remain valid if the computations of the networks are extended to the complex domain. [sent-82, score-0.354]
44 We now define what it means for a recurrent neural network N to compute a function f : ℝ^n → ℝ. Assume that N has n input nodes and let x ∈ ℝ^n. [sent-83, score-1.079]
45 Given t ∈ ℕ, we say that N computes f(x) in t steps if, after initializing at time 0 the input nodes with x and the computation nodes with some fixed values, and performing t computation steps as defined in Equations (1), (2), and (3), one of the computation nodes yields the value f(x). [sent-84, score-1.396]
46 We assume that the input nodes remain unchanged during the computation. [sent-85, score-0.366]
47 We further say that N computes f in time t if for every x ∈ ℝ^n, network N computes f(x) in at most t steps. [sent-86, score-0.427]
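As a small illustration of this definition, the sketch below simulates t update steps (reusing the hypothetical step function from the earlier sketch) and reports whether the target value appears at some computation node. The definition requires the value f(x) to appear exactly; the numerical tolerance here is only a concession to floating-point arithmetic and is not part of the model.

    def computes_in_t_steps(f, x, y0, W, V, theta, kinds, t, tol=1e-9):
        # Clamp the input nodes to x, start the computation nodes at y0,
        # and run t steps; the output may appear at any computation node.
        target = f(x)
        y = y0.copy()
        for _ in range(t):
            y = step(y, x, W, V, theta, kinds)
            if np.any(np.abs(y - target) < tol):
                return True
        return False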
48 We emphasize that this is a very general definition of analog computation in recurrent neural networks. [sent-88, score-0.838]
49 In particular, we do not specify any definite output node but allow the output to occur at any node. [sent-89, score-0.259]
50 Moreover, it is not even required that the network reaches a stable state, as with attractor or Hopfield networks. [sent-90, score-0.234]
51 It is sufficient that the output value appears at some point of the trajectory the network performs. [sent-91, score-0.266]
52 A similar view of computation in recurrent networks is captured in a model proposed by Maass et al. [sent-92, score-0.681]
53 Clearly, the lower bounds remain valid for more restrictive definitions of analog computation that require output nodes or stable states. [sent-94, score-0.896]
54 Moreover, they hold for architectures that have no input nodes but receive their inputs as initial values of the computation nodes. [sent-95, score-0.482]
55 Thus, the bounds serve as lower bounds also for the transition times between real-valued states of discrete-time dynamical systems comprising the networks considered here. [sent-96, score-0.454]
56 A class ℱ of functions mapping ℝ^n to {0, 1} is said to shatter a finite set S ⊆ ℝ^n if for every dichotomy (S_0, S_1) of S there is some f ∈ ℱ that satisfies f(S_0) ⊆ {0} and f(S_1) ⊆ {1}. [sent-99, score-0.248]
57 The Vapnik-Chervonenkis (VC) dimension of ℱ is defined as the cardinality of the largest set shattered by ℱ. Figure 2: A recurrent neural network computing the functions f_l in time 2l + 1. [sent-100, score-1.218]
58 A neural network given in terms of an architecture represents a class of functions obtained by assigning real numbers to all its adjustable parameters, that is, weights and thresholds or a subset thereof. [sent-102, score-0.498]
59 The output of the network is assumed to be thresholded at some fixed constant so that the output values are binary. [sent-103, score-0.345]
60 The VC dimension of a neural network is then defined as the VC dimension of the class of functions computed by this network. [sent-104, score-0.501]
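For concreteness, here is a brute-force check of these two definitions. It is purely illustrative: net_output, S, params_grid, and candidate_sets are assumed placeholders (a parametrized architecture, candidate point sets, and a finite grid of parameter assignments), the threshold plays the role of the fixed constant at which the real output is binarized, and exhaustive search like this is of course only feasible for tiny instances.

    from itertools import product as all_labelings

    def shatters(net_output, S, params_grid, threshold=0.0):
        # F shatters S if every dichotomy of S is realized by some
        # parameter assignment p, after thresholding the real output.
        for labels in all_labelings([0, 1], repeat=len(S)):
            realized = any(
                all((net_output(p, s) >= threshold) == bool(b)
                    for s, b in zip(S, labels))
                for p in params_grid
            )
            if not realized:
                return False
        return True

    def vc_dim_at_least(d, candidate_sets, net_output, params_grid):
        # The VC dimension is at least d if some set of size d is shattered.
        return any(len(S) == d and shatters(net_output, S, params_grid)
                   for S in candidate_sets)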
61 In deriving lower bounds in the next section, we make use of the following result on networks with product and sigmoidal units that has been previously established (Schmitt, 2002). [sent-105, score-0.973]
62 We emphasize that the only constraint on the parameters of the product units is that they yield real-valued, that is, not complex-valued, functions. [sent-106, score-0.362]
63 This means further that the statement holds for networks of arbitrary order, that is, it does not impose any restrictions on the magnitude of the weights of the product units. [sent-107, score-0.322]
64 (Schmitt, 2002, Theorem 2) Suppose N is a feedforward neural network consisting of sigmoidal, product, and linear units. [sent-109, score-0.366]
65 3 Bounds on Computing Time We establish bounds on the time required by recurrent neural networks for computing a family of functions f_l : ℝ → ℝ, l ≥ 1, where l can be considered a measure of the complexity of f_l. [sent-112, score-1.179]
66 We observe that there is a single recurrent network capable of computing every f_l in time O(l). [sent-114, score-0.959]
67 There is a general recurrent neural network that computes f_l in time 2l + 1 for every l. [sent-116, score-1.031]
68 All computation nodes are initialized with 0, except y_1, which starts with 1 and outputs 0 during all following steps. [sent-120, score-0.422]
69 Clearly, the value f_l(x) results at node y_5 after 2l + 1 steps. [sent-123, score-0.249]
70 □ The network used for computing f_l requires only linear and second-order units. [sent-124, score-0.453]
71 Moreover, the lower bound holds for networks of unrestricted order and with sigmoidal units. [sent-126, score-0.54]
72 Every general recurrent neural network of size k requires at least time c · l^{1/4} / k to compute the function f_l, where c > 0 is some constant. [sent-128, score-0.867]
73 Such a network will consist of linear and product units and hypothetical units that compute functions f_j for certain values of j. [sent-131, score-0.874]
74 Assuming that the hypothetical units can be replaced by time-bounded general recurrent networks, we determine an upper bound on the VC dimension of the resulting networks in terms of size and computing time using an idea from Koiran and Sontag (1998) and Proposition 1. [sent-133, score-1.197]
75 The comparison of the lower and upper VC dimension bounds will give an estimate of the time required for computing f_l. Network N_l, shown in Figure 3, is a feedforward network composed of three networks N_l^(1), N_l^(2), and N_l^(3). [sent-134, score-0.79]
76 The computation nodes are defined as follows (omitting the time parameter t), for μ = 1, 2, 3. [sent-145, score-0.487]
77 The nodes y_b^(μ) can be considered as additional input nodes for N_l^(μ), where N_l^(3) gets this input from w, and N_l^(μ) from N_l^(μ+1) for μ = 1, 2. [sent-160, score-0.674]
78 Each sub-network N_l^(μ) has a designated output node, and the output node of N_l^(1) is also the output node of N_l. [sent-162, score-0.461]
79 Thus, the entire network has 3l + 6 nodes that are linear or product units and 3l nodes that compute functions f_1, f_l, or f_{l^2}. [sent-163, score-1.159]
80 Every dichotomy of S can be programmed into the network parameter w using the following fact about the logistic function φ (see Koiran and Sontag, 1998, Lemma 2): for every binary vector b ∈ {0, 1}^m, b = b_1 . [sent-174, score-0.335]
81 . . . this is the value computed by N_l on input (e_{i1}, e_{i2}, e_{i3}), where e_{iμ} is the input given to network N_l^(μ). [sent-183, score-0.325]
82 Assume now that f_j can be computed by a general recurrent neural network of size at most k_j in time t_j. [sent-186, score-0.797]
83 Using an idea of Koiran and Sontag (1998), we unfold the network to obtain a feedforward network of size at most k_j t_j computing f_j. [sent-187, score-0.55]
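The unfolding argument replaces a recurrent network running for t_j steps by a feedforward network with one copy of its computation nodes per time step, hence at most k_j · t_j nodes. A minimal sketch of this construction, again reusing the hypothetical step function from the first sketch:

    def unfold(y0, x, W, V, theta, kinds, t):
        # Layer tau holds the node values at time tau; stacking the t layers
        # gives a feedforward network with at most k * t computation nodes.
        layers = [y0.copy()]
        for _ in range(t):
            layers.append(step(layers[-1], x, W, V, theta, kinds))
        return layers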
84 Thus we can replace the nodes computing f_1, f_l, f_{l^2} in N_l by networks of size k_1 t_1, k_l t_l, k_{l^2} t_{l^2}, respectively, such that we have a feedforward network consisting of sigmoidal, product, and linear units. [sent-188, score-0.827]
85 Since there are 3l units in N_l computing f_1, f_l, or f_{l^2} and at most 3l + 6 product and linear units, the size of N_l is at most c_1 l k_{l^2} t_{l^2} for some constant c_1 > 0. [sent-189, score-0.486]
86 Lemma 2 shows that a single recurrent network is capable of computing every function f_l in time O(l). [sent-194, score-0.959]
87 Every general recurrent neural network requires at least time Ω(l^{1/4}) to compute the functions f_l. [sent-197, score-0.837]
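The two statements can be summarized side by side; the names T_upper and T_lower are introduced here only for this summary and are not notation from the paper.

    \[
        T_{\mathrm{upper}}(f_l) \le 2l + 1,
        \qquad
        T_{\mathrm{lower}}(f_l) \ge c \,\frac{l^{1/4}}{k}
        \quad \text{for every general recurrent network of size } k,
    \]

so the gap between the upper and the lower bound is polynomial of order four in l.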
88 4 Conclusions and Perspectives We have established bounds on the computing time of analog recurrent neural networks. [sent-198, score-0.978]
89 The result shows that for every network of given size there are functions of arbitrarily high time complexity. [sent-199, score-0.388]
90 We have derived upper and lower bounds that are rather tight, with a polynomial gap of order four, and hold for the computation of a specific family of real-valued functions in one variable. [sent-201, score-0.424]
91 Interestingly, the upper bound is shown using second-order networks without sigmoidal units, whereas the lower bound is valid even for networks with sigmoidal units and arbitrary product units. [sent-202, score-1.44]
92 This indicates that adding these units might decrease the computing time only marginally. [sent-203, score-0.362]
93 The derivation made use of an upper bound on the VC dimension of higher-order sigmoidal networks. [sent-204, score-0.472]
94 We have focussed on product and sigmoidal units as nonlinear computing elements. [sent-207, score-0.715]
95 The questions of whether such results can be obtained for continuous-time networks and for networks operating in the domain of complex numbers are challenging. [sent-210, score-0.284]
96 A further assumption made here is that the networks compute the functions exactly. [sent-211, score-0.218]
97 By a more detailed analysis and using the fact that the shattering of sets requires the outputs only to lie below or above some threshold, similar results can be obtained for networks that approximate the functions more or less closely and for networks that are subject to noise. [sent-212, score-0.366]
98 Stable encoding of finite state machines in discrete-time recurrent neural nets with sigmoid units. [sent-245, score-0.554]
99 Real-time computing without stable states: A new framework for neural computation based on perturbations. [sent-278, score-0.333]
100 On the effect of analog noise in discrete-time analog computations. [sent-283, score-0.338]
wordName wordTfidf (topN-words)
[('recurrent', 0.418), ('sigmoidal', 0.293), ('nodes', 0.268), ('sontag', 0.229), ('siegelmann', 0.221), ('units', 0.216), ('network', 0.187), ('vc', 0.186), ('analog', 0.169), ('fl', 0.148), ('networks', 0.142), ('maass', 0.132), ('koiran', 0.127), ('computation', 0.121), ('bounds', 0.119), ('nt', 0.117), ('product', 0.116), ('ft', 0.112), ('gavalda', 0.102), ('jvi', 0.102), ('orponen', 0.102), ('schmitt', 0.102), ('node', 0.101), ('computing', 0.09), ('dichotomy', 0.088), ('adjustable', 0.08), ('output', 0.079), ('neural', 0.075), ('dimension', 0.074), ('input', 0.069), ('bound', 0.069), ('hopfield', 0.066), ('computes', 0.062), ('every', 0.06), ('time', 0.056), ('sd', 0.056), ('real', 0.053), ('established', 0.051), ('ni', 0.051), ('balcazar', 0.051), ('kilian', 0.051), ('mathematik', 0.051), ('shatter', 0.051), ('shattered', 0.051), ('sima', 0.051), ('connections', 0.05), ('feedforward', 0.05), ('functions', 0.049), ('ei', 0.048), ('complexity', 0.048), ('stable', 0.047), ('bochum', 0.044), ('carrasco', 0.044), ('nz', 0.044), ('unit', 0.044), ('cl', 0.043), ('defined', 0.042), ('yl', 0.042), ('proposition', 0.042), ('turing', 0.04), ('computers', 0.04), ('haykin', 0.04), ('tight', 0.039), ('serve', 0.038), ('recognize', 0.037), ('durbin', 0.037), ('restrictions', 0.037), ('rumelhart', 0.037), ('lower', 0.036), ('size', 0.036), ('upper', 0.036), ('xl', 0.036), ('lemma', 0.035), ('hypothetical', 0.035), ('family', 0.034), ('comprehensive', 0.034), ('anthony', 0.034), ('outputs', 0.033), ('giles', 0.032), ('moreover', 0.032), ('sigmoid', 0.031), ('weighted', 0.03), ('nets', 0.03), ('emphasize', 0.03), ('blum', 0.03), ('remain', 0.029), ('gap', 0.029), ('jr', 0.029), ('types', 0.028), ('linear', 0.028), ('regular', 0.028), ('yk', 0.028), ('valid', 0.028), ('numbers', 0.027), ('weights', 0.027), ('compute', 0.027), ('bi', 0.026), ('consisting', 0.026), ('general', 0.025), ('architectures', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for real-valued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to.
2 0.12291737 1 nips-2001-(Not) Bounding the True Error
Author: John Langford, Rich Caruana
Abstract: We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks with results of a order of magnitude improvement vs. the best deterministic neural net bounds. £ ¡ ¤¢
3 0.10818411 119 nips-2001-Means, Correlations and Bounds
Author: Martijn Leisink, Bert Kappen
Abstract: The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to non-trivial regions (i.e. a subset of [-1 , 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results. 1
4 0.10784999 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Author: Roman Genov, Gert Cauwenberghs
Abstract: A mixed-signal paradigm is presented for high-resolution parallel innerproduct computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 m CMOS. ¢
5 0.10490558 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
Author: Bram Bakker
Abstract: This paper presents reinforcement learning with a Long ShortTerm Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1
6 0.10032495 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
7 0.099650472 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
8 0.09265995 169 nips-2001-Small-World Phenomena and the Dynamics of Information
9 0.091614142 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
10 0.08464665 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
11 0.082024284 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
12 0.079546519 37 nips-2001-Associative memory in realistic neuronal networks
13 0.078774087 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
14 0.071158953 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections
15 0.07096561 8 nips-2001-A General Greedy Approximation Algorithm with Applications
16 0.07056208 118 nips-2001-Matching Free Trees with Replicator Equations
17 0.068709753 91 nips-2001-Improvisation and Learning
18 0.06657017 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
19 0.066514902 115 nips-2001-Linear-time inference in Hierarchical HMMs
20 0.063573405 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
topicId topicWeight
[(0, -0.185), (1, -0.067), (2, 0.002), (3, 0.015), (4, 0.015), (5, -0.096), (6, -0.004), (7, -0.007), (8, -0.126), (9, -0.041), (10, -0.08), (11, 0.097), (12, 0.099), (13, 0.07), (14, -0.009), (15, 0.264), (16, -0.006), (17, -0.089), (18, 0.146), (19, -0.103), (20, -0.079), (21, 0.062), (22, 0.077), (23, 0.062), (24, -0.166), (25, 0.157), (26, 0.059), (27, 0.014), (28, 0.088), (29, 0.079), (30, 0.053), (31, -0.092), (32, -0.115), (33, -0.131), (34, -0.025), (35, 0.049), (36, -0.06), (37, -0.141), (38, -0.131), (39, -0.155), (40, -0.027), (41, -0.076), (42, 0.089), (43, 0.02), (44, 0.066), (45, -0.04), (46, -0.028), (47, 0.015), (48, 0.008), (49, 0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.98477721 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for real-valued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to.
2 0.58258307 91 nips-2001-Improvisation and Learning
Author: Judy A. Franklin
Abstract: This article presents a 2-phase computational learning model and application. As a demonstration, a system has been built, called CHIME for Computer Human Interacting Musical Entity. In phase 1 of training, recurrent back-propagation trains the machine to reproduce 3 jazz melodies. The recurrent network is expanded and is further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation. After each phase CHIME can interactively improvise with a human in real time. 1 Foundations Jazz improvisation is the creation of a jazz melody in real time. Charlie Parker, Dizzy Gillespie, Miles Davis, John Coltrane, Charles Mingus, Thelonious Monk, and Sonny Rollins et al. were the founders of bebop and post bop jazz [9] where drummers, bassists, and pianists keep the beat and maintain harmonic structure. Other players improvise over this structure and even take turns improvising for 4 bars at a time. This is called trading fours. Meanwhile, artificial neural networks have been used in computer music [4, 12]. In particular, the work of (Todd [11]) is the basis for phase 1 of CHIME, a novice machine improvisor that learns to trade fours. Firstly, a recurrent network is trained with back-propagation to play three jazz melodies by Sonny Rollins [1], as described in Section 2. Phase 2 uses actor-critic reinforcement learning and is described in Section 3. This section is on jazz basics. 1.1 Basics: Chords, the ii-V-I Chord Progression and Scales The harmonic structure mentioned above is a series of chords that may be reprated and that are often grouped into standard subsequences. A chord is a group of notes played simultaneously. In the chromatic scale, C-Db-D-Eb-E-F-Gb-G-Ab-A-Bb-B-C, notes are separated by a half step. A flat (b) note is a half step below the original note; a sharp (#) is a half above. Two half steps are a whole step. Two whole steps are a major third. Three half steps are a minor third. A major triad (chord) is the first or tonic note, then the note a major third up, then the note a minor third up. When F is the tonic, F major triad is F-A-C. A minor triad (chord) is the tonic ¡ www.cs.smith.edu/˜jfrankli then a minor third, then a major third. F minor triad is F-Ab-C. The diminished triad is the tonic, then a minor third, then a minor third. F diminished triad is F-Ab-Cb. An augmented triad is the tonic, then a major third, then a major third. The F augmented triad is F-A-Db. A third added to the top of a triad forms a seventh chord. A major triad plus a major third is the major seventh chord. F-A-C-E is the F major seventh chord (Fmaj7). A minor triad plus a minor third is a minor seventh chord. For F it is F-Ab-C-Eb (Fm7). A major triad plus a minor third is a dominant seventh chord. For F it is F-A-C-Eb (F7). These three types of chords are used heavily in jazz harmony. Notice that each note in the chromatic scales can be the tonic note for any of these types of chords. A scale, a subset of the chromatic scale, is characterized by note intervals. Let W be a whole step and H be a half. The chromatic scale is HHHHHHHHHHHH. The major scale or ionian mode is WWHWWWH. F major scale is F-G-A-Bb-C-D-E-F. The notes in a scale are degrees; E is the seventh degree of F major. The first, third, fifth, and seventh notes of a major scale are the major seventh chord. The first, third, fifth, and seventh notes of other modes produce the minor seventh and dominant seventh chords. 
Roman numerals represent scale degrees and their seventh chords. Upper case implies major or dominant seventh and lower case implies minor seventh [9]. The major seventh chord starting at the scale tonic is the I (one) chord. G is the second degree of F major, and G-Bb-D-F is Gm7, the ii chord, with respect to F. The ii-V-I progression is prevalent in jazz [9], and for F it is Gm7-C7-Fmaj7. The minor ii-V-i progression is obtained using diminished and augmented triads, their seventh chords, and the aeolian mode. Seventh chords can be extended by adding major or minor thirds, e.g. Fmaj9, Fmaj11, Fmaj13, Gm9, Gm11, and Gm13. Any extension can be raised or lowered by 1 step [9] to obtain, e.g. Fmaj7#11, C7#9, C7b9, C7#11. Most jazz compositions are either the 12 bar blues or sectional forms (e.g. ABAB, ABAC, or AABA) [8]. The 3 Rollins songs are 12 bar blues. “Blue 7” has a simple blues form. In “Solid” and “Tenor Madness”, Rollins adds bebop variations to the blues form [1]. ii-V-I and VI-II-V-I progressions are added and G7+9 substitutes for the VI and F7+9 for the V (see section 1.2 below); the II-V in the last bar provides the turnaround to the I of the first bar to foster smooth repetition of the form. The result is at left and in Roman numeral notation Bb7 Bb7 Bb7 Bb7 I I I I Eb7 Eb7 Bb7 G7+9 IV IV I VI at right: Cm7 F7 Bb7 G7+9 C7 F7+9 ii V I VI II V 1.2 Scale Substitutions and Rules for Reinforcement Learning First note that the theory and rules derived in this subsection are used in Phase 2, to be described in Section 3. They are presented here since they derive from the jazz basics immediately preceding. One way a novice improvisor can play is to associate one scale with each chord and choose notes from that scale when the chord is presented in the musical score. Therefore, Rule 1 is that an improvisor may choose notes from a “standard” scale associated with a chord. Next, the 4th degree of the scale is often avoided on a major or dominant seventh chord (Rule 3), unless the player can resolve its dissonance. The major 7th is an avoid note on a dominant seventh chord (Rule 4) since a dominant seventh chord and its scale contain the flat 7th, not the major 7th. Rule 2 contains many notes that can be added. A brief rationale is given next. The C7 in Gm7-C7-Fmaj7 may be replaced by a C7#11, a C7+ chord, or a C7b9b5 or C7alt chord [9]. The scales for C7+ and C7#11 make available the raised fourth (flat 5), and flat 6 (flat 13) for improvising. The C7b9b5 and C7alt (C7+9) chords and their scales make available the flat9, raised 9, flat5 and raised 5 [1]. These substitutions provide the notes of Rule 2. These rules (used in phase 2) are stated below, using for reinforcement values very bad (-1.0), bad (-0.5), a little bad (-0.25), ok (0.25), good (0.5), and very good (1.0). The rules are discussed further in Section 4. The Rule Set: 1) Any note in the scale associated with the chord is ok (except as noted in rule 3). 2) On a dominant seventh, hip notes 9, flat9, #9, #11, 13 or flat13 are very good. One hip note 2 times in a row is a little bad. 2 hip notes more than 2 times in a row is a little bad. 3) If the chord is a dominant seventh chord, a natural 4th note is bad. 4) If the chord is a dominant seventh chord, a natural 7th is very bad. 5) A rest is good unless it is held for more than 2 16th notes and then it is very bad. 6) Any note played longer than 1 beat (4 16th notes) is very bad. 7) If two consecutive notes match the human’s, that is good. 
2 CHIME Phase 1 In Phase 1, supervised learning is used to train a recurrent network to reproduce the three Sonny Rollins melodies. 2.1 Network Details and Training The recurrent network’s output units are linear. The hidden units are nonlinear (logistic function). Todd [11] used a Jordan recurrent network [6] for classical melody learning and generation. In CHIME, a Jordan net is also used, with the addition of the chord as input (Figure 1. 24 of the 26 outputs are notes (2 chromatic octaves), the 25th is a rest, and the 26th indicates a new note. The output with the highest value above a threshold is the next note, including the rest output. The new note output indicates if this is a new note, or if it is the same note being held for another time step ( note resolution). ¥£ ¡ ¦¤¢ The 12 chord inputs (12 notes in a chromatic scale), are 1 or 0. A chord is represented as its first, third, fifth, and seventh notes and it “wraps around” within the 12 inputs. E.g., the Fm7 chord F-Ab-C-Eb is represented as C, Eb, F, Ab or 100101001000. One plan input per song enables distinguishing between songs. The 26 context inputs use eligibility traces, giving the hidden units a decaying history of notes played. CHIME (as did Todd) uses teacher forcing [13], wherein the target outputs for the previous step are used as inputs (so erroneous outputs are not used as inputs). Todd used from 8 to 15 hidden units; CHIME uses 50. The learning rate is 0.075 (Todd used 0.05). The eligibility rate is 0.9 (Todd used 0.8). Differences in values perhaps reflect contrasting styles of the songs and available computing power. Todd used 15 output units and assumed a rest when all note units are “turned off.” CHIME uses 24 output note units (2 octaves). Long rests in the Rollins tunes require a dedicated output unit for a rest. Without it, the note outputs learned to turn off all the time. Below are results of four representative experiments. In all experiments, 15,000 presentations of the songs were made. Each song has 192 16th note events. All songs are played at a fixed tempo. Weights are initialized to small random values. The squared error is the average squared error over one complete presentation of the song. “Finessing” the network may improve these values. The songs are easily recognized however, and an exact match could impair the network’s ability to improvise. Figure 2 shows the results for “Solid.” Experiment 1. Song: Blue Seven. Squared error starts at 185, decreases to 2.67. Experiment 2. Song: Tenor Madness. Squared error starts at 218, decreases to 1.05. Experiment 3. Song: Solid. Squared error starts at 184, decreases to 3.77. Experiment 4. Song: All three songs: Squared error starts at 185, decreases to 36. Figure 1: Jordan recurrent net with addition of chord input 2.2 Phase 1 Human Computer Interaction in Real Time In trading fours with the trained network, human note events are brought in via the MIDI interface [7]. Four bars of human notes are recorded then given, one note event at a time to the context inputs (replacing the recurrent inputs). The plan inputs are all 1. The chord inputs follow the “Solid” form. The machine generates its four bars and they are played in real time. Then the human plays again, etc. An accompaniment (drums, bass, and piano), produced by Band-in-a-Box software (PG Music), keeps the beat and provides chords for the human. Figure 3 shows an interaction. The machine’s improvisations are in the second and fourth lines. In bar 5 the flat 9 of the Eb7 appears; the E. 
This note is used on the Eb7 and Bb7 chords by Rollins in “Blue 7”, as a “passing tone.” D is played in bar 5 on the Eb7. D is the natural 7 over Eb7 (with its flat 7) but is a note that Rollins uses heavily in all three songs, and once over the Eb7. It may be a response to the rest and the Bb played by the human in bar 1. D follows both a rest and a Bb in many places in “Tenor Madness” and “Solid.” In bar 6, the long G and the Ab (the third then fourth of Eb7) figure prominently in “Solid.” At the beginning of bar 7 is the 2-note sequence Ab-E that appears in exactly the same place in the song “Blue 7.” The focus of bars 7 and 8 is jumping between the 3rd and 4th of Bb7. At the end of bar 8 the machine plays the flat 9 (Ab) then the flat 3 (Bb), of G7+9. In bars 13-16 the tones are longer, as are the human’s in bars 9-12. The tones are the 5th, the root, the 3rd, the root, the flat 7, the 3rd, the 7th, and the raised fourth. Except for the last 2, these are chord tones. 3 CHIME Phase 2 In Phase 2, the network is expanded and trained by reinforcement learning to improvise according to the rules of Section 1.2 and using its knowledge of the Sonny Rollins songs. 3.1 The Expanded Network Figure 4 shows the phase 2 network. The same inputs plus 26 human inputs brings the total to 68. The weights obtained in phase 1 initialize this network. The plan and chord weights Figure 2: At left “Solid” played by a human; at right the song reproduced by the ANN. are the same. The weights connecting context units to the hidden layer are halved. The same weights, halved, connect the 26 human inputs to the hidden layer. Each output unit gets the 100 hidden units’ outputs as input. The original 50 weights are halved and used as initial values of the two sets of 50 hidden unit weights to the output unit. 3.2 SSR and Critic Algorithms Using actor-critic reinforcement learning ([2, 10, 13]), the actor chooses the next note to play. The critic receives a “raw” reinforcement signal from the critique made by the . A rules of Section 1.2. For output j, the SSR (actor) computes mean Gaussian distribution with mean and standard deviation chooses the output . is generated, the critic modifies and produces . is further modified by a self-scaling algorithm that tracks, via moving average, the maximum and minimum reinforcement and uses them to scale the signal to produce .
3 0.55511189 1 nips-2001-(Not) Bounding the True Error
Author: John Langford, Rich Caruana
Abstract: We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs a distribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound. In this paper we demonstrate the method on artificial neural networks with results of a order of magnitude improvement vs. the best deterministic neural net bounds. £ ¡ ¤¢
4 0.54085606 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
Author: Bram Bakker
Abstract: This paper presents reinforcement learning with a Long ShortTerm Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1
5 0.53240591 26 nips-2001-Active Portfolio-Management based on Error Correction Neural Networks
Author: Hans-Georg Zimmermann, Ralph Neuneier, Ralph Grothmann
Abstract: This paper deals with a neural network architecture which establishes a portfolio management system similar to the Black / Litterman approach. This allocation scheme distributes funds across various securities or financial markets while simultaneously complying with specific allocation constraints which meet the requirements of an investor. The portfolio optimization algorithm is modeled by a feedforward neural network. The underlying expected return forecasts are based on error correction neural networks (ECNN), which utilize the last model error as an auxiliary input to evaluate their own misspecification. The portfolio optimization is implemented such that (i.) the allocations comply with investor’s constraints and that (ii.) the risk of the portfolio can be controlled. We demonstrate the profitability of our approach by constructing internationally diversified portfolios across different financial markets of the G7 contries. It turns out, that our approach is superior to a preset benchmark portfolio. ¢ £¡ 1 Introduction: Portfolio-Management We integrate the portfolio optimization algorithm suggested by Black / Litterman [1] into a neural network architecture. Combining the mean-variance theory [5] with the capital asset pricing model (CAPM) [7], this approach utilizes excess returns of the CAPM equilibrium to define a neutral, well balanced benchmark portfolio. Deviations from the benchmark allocation are only allowed within preset boundaries. Hence, as an advantage, there are no unrealistic solutions (e. g. large short positions, huge portfolio changes). Moreover, there is no need of formulating return expectations for all assets. In contrast to Black / Litterman, excess return forecasts are estimated by time-delay recurrent error correction neural networks [8]. Investment decisions which comply with given allocation constraints are derived from these predictions. The risk exposure of the portfolio is implicitly controlled by a parameter-optimizing task over time (sec. 3 and 5). Our approach consists of the following three steps: (i.) Construction of forecast models on the basis of error correction neural networks (ECNN) for all assets (sec. 2). ¤§© © © § ¥ ¦¨¦¤ To whom correspondence should be addressed: Georg.Zimmermann@mchp.siemens.de. ¤ ¤ (ii.) Computation of excess returns by a higher-level feedforward network (sec. 3 and 4). By this, the profitability of an asset with respect to all others is measured. on the basis of the excess returns. (iii.) Optimization of the investment proportions Allocation constraints ensure, that the investment proportions may deviate from a given benchmark only within predefined intervals (sec. 3 and 4). £ § ¨¡ ¥ £¡ ¦¤¢ ¡ © ¡ © Finally, we apply our neural network based portfolio management system to an asset allocation problem concerning the G7 countries (sec. 6). 2 Forecasting by Error Correction Neural Networks Most dynamical systems are driven by a superposition of autonomous development and external influences [8]. For discrete time grids, such a dynamics can be described by a recurrent state transition and an output equation (Eq. 1). ¥ § § state transition eq. output eq. (1) $
6 0.51870835 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
7 0.51682907 169 nips-2001-Small-World Phenomena and the Dynamics of Information
8 0.50361383 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
9 0.47527552 119 nips-2001-Means, Correlations and Bounds
10 0.42450163 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
11 0.42312571 173 nips-2001-Speech Recognition with Missing Data using Recurrent Neural Nets
12 0.42137399 76 nips-2001-Fast Parameter Estimation Using Green's Functions
13 0.40777206 177 nips-2001-Switch Packet Arbitration via Queue-Learning
14 0.39711165 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
15 0.3925119 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections
16 0.39167088 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
17 0.38556108 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
18 0.35170031 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation
19 0.3410489 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding
20 0.32966751 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
topicId topicWeight
[(14, 0.056), (17, 0.035), (19, 0.052), (27, 0.097), (30, 0.103), (38, 0.061), (41, 0.268), (59, 0.023), (72, 0.039), (79, 0.057), (83, 0.013), (91, 0.11)]
simIndex simValue paperId paperTitle
same-paper 1 0.79257154 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for real-valued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to.
2 0.59526974 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
Author: Gregor Wenning, Klaus Obermayer
Abstract: Cortical neurons might be considered as threshold elements integrating in parallel many excitatory and inhibitory inputs. Due to the apparent variability of cortical spike trains this yields a strongly fluctuating membrane potential, such that threshold crossings are highly irregular. Here we study how a neuron could maximize its sensitivity w.r.t. a relatively small subset of excitatory input. Weak signals embedded in fluctuations is the natural realm of stochastic resonance. The neuron's response is described in a hazard-function approximation applied to an Ornstein-Uhlenbeck process. We analytically derive an optimality criterium and give a learning rule for the adjustment of the membrane fluctuations, such that the sensitivity is maximal exploiting stochastic resonance. We show that adaptation depends only on quantities that could easily be estimated locally (in space and time) by the neuron. The main results are compared with simulations of a biophysically more realistic neuron model. 1
3 0.58638179 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes
Author: Thomas P. Trappenberg, Edmund T. Rolls, Simon M. Stringer
Abstract: Inferior temporal cortex (IT) neurons have large receptive fields when a single effective object stimulus is shown against a blank background, but have much smaller receptive fields when the object is placed in a natural scene. Thus, translation invariant object recognition is reduced in natural scenes, and this may help object selection. We describe a model which accounts for this by competition within an attractor in which the neurons are tuned to different objects in the scene, and the fovea has a higher cortical magnification factor than the peripheral visual field. Furthermore, we show that top-down object bias can increase the receptive field size, facilitating object search in complex visual scenes, and providing a model of object-based attention. The model leads to the prediction that introduction of a second object into a scene with blank background will reduce the receptive field size to values that depend on the closeness of the second object to the target stimulus. We suggest that mechanisms of this type enable the output of IT to be primarily about one object, so that the areas that receive from IT can select the object as a potential target for action.
4 0.58482093 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
5 0.58149171 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
6 0.57974315 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
7 0.57862455 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
8 0.57792711 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
9 0.57618415 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
10 0.57539415 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
11 0.57352912 56 nips-2001-Convolution Kernels for Natural Language
12 0.57101721 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
13 0.5709846 60 nips-2001-Discriminative Direction for Kernel Classifiers
14 0.5708009 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
15 0.57032245 89 nips-2001-Grouping with Bias
16 0.56994987 182 nips-2001-The Fidelity of Local Ordinal Encoding
17 0.56839257 169 nips-2001-Small-World Phenomena and the Dynamics of Information
19 0.56740087 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
20 0.56678361 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments