nips nips2001 nips2001-6 knowledge-graph by maker-knowledge-mining

6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment


Source: pdf

Author: Christopher Raphael

Abstract: We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. A live demonstration will be provided. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. [sent-3, score-1.807]

2 A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. [sent-4, score-1.197]

3 The network is first constructed using the rhythmic information contained in the musical score. [sent-5, score-0.448]

4 The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. [sent-6, score-0.799]

5 During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. [sent-7, score-1.564]

6 1 Introduction We discuss our continuing work in developing a computer system that plays the role of a musical accompanist in a piece of non-improvisatory music for soloist and accompaniment. [sent-9, score-0.86]

7 The system begins with the musical score to a given piece of music. [sent-10, score-0.442]

8 Then, using training data for the accompaniment part as well as a series of rehearsals, we learn a performer-specific model for the rhythmic interpretation of the composition. [sent-11, score-0.866]

9 In performance, the system takes the acoustic signal of the live player and generates the accompaniment around this signal, in real-time, while respecting the learned model and the constraints imposed by the score. [sent-12, score-0.855]

10 The accompaniment played by our system responds both flexibly and expressively to the soloist's musical interpretation. [sent-13, score-1.084]

11 Listen takes as input the acoustic signal of the soloist and, using a hidden Markov model, performs a real-time analysis of the signal. [sent-15, score-0.419]

12 The output of Listen is essentially a running commentary on the acoustic input which identifies note boundaries in the solo part and communicates these events with variable latency. [sent-16, score-0.491]

13 Thus we can automatically adapt to changes in solo instrument, microphone placement, ambient noise, room acoustics, and the sound of the accompaniment instrument. [sent-19, score-1.025]

14 Fast dynamic programming algorithms provide the computational efficiency necessary to process the soloist's acoustic signal at a rate consistent with the real-time demands of our application. [sent-22, score-0.072]

15 This formulation allows one to compute the probability that an event is in the past. [sent-26, score-0.061]

16 We delay the estimation of the precise location of an event until we are reasonably confident that it is, in fact, past. [sent-27, score-0.061]

17 In this way our system achieves accuracy while retaining the lowest latency possible in the identification of musical events. [sent-28, score-0.316]
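The following minimal sketch (our illustration, not the paper's code) shows how a normalized HMM filtering distribution supports this delayed-decision rule; the interface names and the 0.95 confidence threshold are assumptions.

```python
import numpy as np

def prob_past(alpha, onset_state):
    """P(the soloist has already moved past `onset_state`), read off the
    normalized HMM filtering (forward) distribution `alpha`."""
    return float(np.sum(alpha[onset_state + 1:]))

def confirmed_onsets(filtering_dists, n_states, threshold=0.95):
    """Report each solo note onset only once we are reasonably confident
    it is in the past; latency therefore varies from event to event."""
    pending = 0  # index of the next onset awaiting confirmation
    for frame, alpha in enumerate(filtering_dists):
        while pending < n_states and prob_past(alpha, pending) > threshold:
            yield frame, pending  # confirmed at `frame`, with some delay
            pending += 1
```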

18 The heart of our system, the Play component, develops a Bayesian network consisting of hundreds of Gaussian random variables including both observable quantities, such as note onset times, and unobservable quantities, such as local tempo. [sent-30, score-0.252]

19 The network can be trained during a rehearsal phase to model both the soloist's and accompanist's interpretations of a specific piece of music. [sent-31, score-0.16]

20 2 Knowledge Sources A musical accompaniment requires the synthesis of a number of different knowledge sources. [sent-35, score-0.962]

21 From a modeling perspective, the fundamental challenge of musical accompaniment is to express these disparate knowledge sources in terms of a common denominator. [sent-36, score-0.999]

22 We describe here the three knowledge sources we use. [sent-37, score-0.057]

23 We work with non-improvisatory music so naturally the musical score, which gives the pitches and relative durations of the various notes, as well as points of synchronization between the soloist and accompaniment, must figure prominently in our model. [sent-39, score-0.713]

24 The score should not be viewed as a rigid grid prescribing the precise times at which musical events will occur; rather, the score gives the basic elastic material which will be stretched in various ways to produce the actual performance. [sent-40, score-0.457]

25 The score simply does not address most interpretive aspects of performance. [sent-41, score-0.061]

26 Since our accompanist must follow the soloist, the output of the Listen component, which identifies note boundaries in the solo part, constitutes our second knowledge source. [sent-43, score-0.492]

27 While most musical events, such as changes between neighboring diatonic pitches, can be detected very shortly after the change of note, some events, such as rearticulations and octave slurs, are much less obvious and can only be precisely located with the benefit of longer term hindsight. [sent-44, score-0.312]

28 With this in mind, we feel that any successful accompaniment system cannot synchronize in a purely responsive manner. [sent-45, score-0.692]

29 Rather it must be able to predict the future using the past and base its synchronization on these predictions, as human musicians do. [sent-46, score-0.052]

30 While the same player's performance of a particular piece will vary from rendition to rendition, many aspects of musical interpretation are clearly established with only a few repeated examples. [sent-48, score-0.451]

31 These examples, both of solo performances and human (MIDI) performances of the accompaniment part, constitute the third knowledge source for our system. [sent-49, score-1.17]

32 The solo data is used primarily to teach the system how to predict the future evolution of the solo part. [sent-50, score-0.685]

33 The accompaniment data is used to learn the musicality necessary to bring the accompaniment to life. [sent-51, score-1.367]

34 We have developed a probabilistic model, a Bayesian network, that represents all of these knowledge sources through a jointly Gaussian distribution containing hundreds of random variables. [sent-52, score-0.078]

35 The observable variables in this model are the estimated soloist note onset times produced by Listen and the directly observable times for the accompaniment notes. [sent-53, score-1.283]

36 Between these observable variables lies a layer of hidden variables that describe unobservable quantities such as local tempo, change in tempo, and rhythmic stress. [sent-54, score-0.413]

37 3 A Model for Rhythmic Interpretation We begin by describing a model for the sequence of note onset times generated by a monophonic (single voice) musical instrument playing a known piece of music. [sent-55, score-0.535]

38 t_{n+1} = t_n + l_n s_n + τ_n and s_{n+1} = s_n + σ_n for n = 0, ..., N - 1, where l_n is the musical length of the nth note, in measures, and the {(τ_n, σ_n)^t} and (t_0, s_0)^t are mutually independent Gaussian random vectors. [sent-64, score-0.303]

39 The distributions of the {σ_n} will tend to concentrate around 0, expressing the notion that tempo changes are gradual. [sent-65, score-0.139]

40 The means and variances of the {σ_n} show where the soloist is speeding up (negative mean) or slowing down (positive mean), and tell us if these tempo changes are nearly deterministic (low variance) or quite variable (high variance). [sent-66, score-0.439]

41 The {τ_n} variables describe stretches (positive mean) or compressions (negative mean) in the music that occur without any actual change in tempo, as in a tenuto or agogic accent. [sent-67, score-0.092]

42 The addition of the {τ_n} variables leads to a more musically plausible model, since not all variation in note lengths can be explained through tempo variation. [sent-68, score-0.213]

43 Equally important, however, the {τ_n} variables stabilize the model by not forcing it to explain, and hence respond to, all note-length variation as tempo variation. [sent-69, score-0.18]

44 Collectively, the distributions of the (τ_n, σ_n)^t vectors characterize the solo player's rhythmic interpretation. [sent-70, score-0.494]

45 Both overall tendencies (means) and the repeatability of these tendencies (covariances) are captured by these distributions. [sent-71, score-0.052]
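The generative model above is easy to simulate. Here is a minimal sketch (ours; the noise scales and initial tempo are illustrative defaults, where the trained system would instead use learned per-note means and variances) that samples one rendition of a melody from the time-tempo process.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_onsets(lengths, t0=0.0, s0=2.0, tau_sd=0.02, sigma_sd=0.05):
    """Draw one performance from the time-tempo model
        t_{n+1} = t_n + l_n * s_n + tau_n,   s_{n+1} = s_n + sigma_n,
    with l_n in measures and s_n the local tempo in seconds per measure."""
    t, s = t0, s0
    times = [t0]
    for l in lengths:
        t += l * s + rng.normal(0.0, tau_sd)   # stretch/compression tau_n
        s += rng.normal(0.0, sigma_sd)         # tempo change sigma_n
        times.append(t)
    return np.array(times)

# four quarter notes in 3/4 time, each 1/3 of a measure long
print(sample_onsets([1/3, 1/3, 1/3, 1/3]))
```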

46 3.1 Joint Model of Solo and Accompaniment In modeling the situation of musical accompaniment we begin with our basic rhythm model from Section 3. [sent-73, score-0.987]

47 More precisely, Figure 1 gives a graphical description of the dependency structure of our model; its layers are labeled Listen, Update, Composite, and Accomp. [sent-75, score-0.047]

48 The top layer of the graph corresponds to the solo note onset times detected by Listen. [sent-76, score-0.582]

49 The 2nd layer of the graph describes the (τ_n, σ_n) variables that characterize the rhythmic interpretation. [sent-77, score-0.279]

50 The 3rd layer of the graph is the time-tempo process {(t_n, s_n)^t}. [sent-78, score-0.097]

51 The bottom layer is the observed accompaniment event times. [sent-79, score-0.812]

52 Let m_1^s, ..., m_{N_s}^s and m_1^a, ..., m_{N_a}^a denote the positions, in measures, of the various solo and accompaniment events. [sent-86, score-0.985]

53 For example, a sequence of quarter notes in 3/4 time would lie at measure positions 0, 1/3, 2/3, etc. [sent-87, score-0.084]

54 Let m_0, ..., m_N be the sorted union of these two sets of positions with duplicate times removed; thus m_0 < m_1 < ... < m_N. [sent-91, score-0.112]
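Computing this composite index set is a one-line merge; a minimal sketch under the conventions just described:

```python
def composite_positions(solo_pos, accomp_pos):
    """Sorted union of solo and accompaniment measure positions with
    duplicates removed; this indexes the composite rhythm."""
    return sorted(set(solo_pos) | set(accomp_pos))

# quarter notes in 3/4 against accompaniment on beats 1 and 3:
# -> [0, 1/3, 2/3]
print(composite_positions([0, 1/3, 2/3], [0, 2/3]))
```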

55 In this figure, the layer labeled "Composite" corresponds to the time-tempo variables, (t_n, s_n)^t, for the composite rhythm, while the layer labeled "Update" corresponds to the interpretation variables (τ_n, σ_n)^t. [sent-100, score-0.33]

56 The directed arrows of this graph indicate the conditional dependency structure of our model. [sent-101, score-0.09]

57 Thus, given all variables "upstream" of a variable, x, in the graph, the conditional distribution of x depends only on the parent variables. [sent-102, score-0.084]

58 Recall that the Listen component estimates the times at which solo notes begin. [sent-103, score-0.443]

59 We model the note onset times estimated by Listen as noisy observations of the true positions {t_n}. [sent-105, score-0.16]

60 Thus if m_n is a measure position at which a solo note occurs, then the corresponding estimate from Listen is modeled as a_n = t_n + α_n, where α_n ~ N(0, ν²). [sent-106, score-0.538]

61 Similarly, if m_n is the measure position of an accompaniment event, then we model the observed time at which the event occurs as b_n = t_n + β_n, where β_n ~ N(0, η²). [sent-107, score-1.004]

62 These two collections of observable variables constitute the top layer of our figure, labeled "Listen," and the bottom layer, labeled "Accomp."

63 There are, of course, measure positions at which both solo and accompaniment events should occur. [sent-109, score-1.061]

64 The vectors/variables {(t_0, s_0)^t, (τ_n, σ_n)^t, α_n, β_n} are assumed to be mutually independent. [sent-111, score-0.06]
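A minimal simulation of this observation layer, assuming illustrative noise scales for ν and η (the helper names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(true_times, is_solo, nu=0.03, eta=0.01):
    """Noisy observation layer: a_n = t_n + alpha_n for solo onsets
    (Listen's estimates), b_n = t_n + beta_n for accompaniment events.
    The noise scales nu and eta are illustrative, not from the paper."""
    true_times = np.asarray(true_times, dtype=float)
    sd = np.where(np.asarray(is_solo), nu, eta)
    return true_times + rng.normal(0.0, sd)
```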

65 4 Training the Model Our system learns its rhythmic interpretation by estimating the parameters of the (τ_n, σ_n) variables. [sent-112, score-0.222]

66 We begin with a collection of J performances of the accompaniment part played in isolation. [sent-113, score-0.841]

67 We refer to the model learned from this accompaniment data as the "practice room" distribution since it reflects the way the accompanist plays when the constraint of following the soloist is absent. [sent-114, score-1.083]

68 Figure 2: Conditioning on the observed accompaniment performance (darkened circles), we use the message-passing algorithm to compute the conditional distributions on the unobservable {(τ_n, σ_n)} variables. [sent-115, score-0.861]

69 For each such performance, we treat the sequence of times at which accompaniment events occur as observed variables in our model. [sent-116, score-0.815]

70 These variables are shown with darkened circles in Figure 2. [sent-117, score-0.113]

71 Given an initial assignment of means and covariances to the (τ_n, σ_n) variables, we use the "message passing" algorithm of Bayesian networks [8, 9] to compute the conditional distributions (given the observed performance) of the (τ_n, σ_n) variables. [sent-118, score-0.112]

72 Several such performances lead to several such estimates, enabling us to improve our initial estimates by reestimating the (τ_n, σ_n) parameters from these conditional distributions. [sent-119, score-0.125]

73 The conditional distribution of (τ_n, σ_n) given the jth accompaniment performance, and using the current estimates {μ_n^i, Σ_n^i}, has a N(m_{j,n}^i, S_n^i) distribution, where the m_{j,n}^i and S_n^i parameters are computed using the message-passing algorithm. [sent-122, score-0.768]

74 The means are then reestimated as μ_n^{i+1} = (1/J) Σ_{j=1}^{J} m_{j,n}^i. The conventional wisdom of musicians is that the accompaniment should follow the soloist. [sent-124, score-0.687]
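As a concrete reading of this reestimation step, here is a minimal EM-style sketch. The mean update matches the formula above; the covariance update is our assumption of the standard EM form, not the paper's stated formula.

```python
import numpy as np

def reestimate(cond_means, cond_covs):
    """One reestimation step for a single (tau_n, sigma_n) pair.

    cond_means[j], cond_covs[j] are the conditional mean and covariance
    of (tau_n, sigma_n) given the j-th performance, as produced by
    message passing."""
    J = len(cond_means)
    mu = sum(cond_means) / J                  # matches the mean update above
    # assumed EM covariance: within-performance + between-performance spread
    Sigma = sum(S + np.outer(m - mu, m - mu)
                for m, S in zip(cond_means, cond_covs)) / J
    return mu, Sigma
```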

75 In past versions of our system we have explicitly modeled the asymmetric roles of soloist and accompaniment through a rather complicated graph structure [2-4]. [sent-125, score-1.047]

76 Training using the accompaniment performances allows our model to learn some of the musicality these performances demonstrate. [sent-127, score-0.836]

77 Since the soloist's interpretation must take precedence, we want to use this accompaniment interpretation only to the extent that it does not conflict with that of the soloist. [sent-128, score-0.753]

78 We accomplish this by first beginning with the result of the accompaniment training described above. [sent-129, score-0.659]

79 We use the practice room distributions (the distributions on the {(τ_n, σ_n)} learned from the accompaniment data) as the initial distributions {μ_n^0, Σ_n^0}. [sent-130, score-0.777]

80 We then run the EM algorithm as described above, now treating the currently available collection of solo performances as the observed data. [sent-131, score-0.498]

81 During this phase, only those parameters relevant to the soloist's rhythmic interpretation will be modified significantly. [sent-132, score-0.189]

82 Parameters describing the interpretation of a musical segment in which the soloist is mostly absent will be largely unaffected by the second training pass. [sent-133, score-0.656]

83 Figure 3: At any given point in the performance we will have observed a collection of solo note times estimated by Listen, and the accompaniment event times (the darkened circles). [sent-134, score-1.263]

84 We compute the conditional distribution on the next unplayed accompaniment event, given these observations. [sent-135, score-0.736]

85 This solo training actually happens over the course of a series of rehearsals. [sent-136, score-0.326]

86 We first initialize our model to the practice room distribution by training with the accompaniment data. [sent-137, score-0.699]

87 Then we iterate the process of creating a performance with our system (described in the next section), extracting the sequence of solo note onset times in an off-line estimation process, and then retraining the model using all currently available solo performances. [sent-138, score-0.841]

88 In our experience, only a few such rehearsals (generally fewer than 10) are necessary to train a system that responds gracefully and anticipates the soloist's rhythmic nuance where appropriate. [sent-139, score-0.23]
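The rehearsal protocol can be summarized in a short loop. In this sketch, `perform`, `estimate_onsets`, and `model.fit_em` are hypothetical interfaces standing in for the live rendition, the off-line onset estimation, and the EM training just described.

```python
def rehearse(model, accomp_performances, perform, estimate_onsets,
             n_rehearsals=10):
    """Sketch of the rehearsal protocol described above; all interfaces
    are assumed, not the paper's actual API."""
    model.fit_em(accomp_performances)      # practice-room initialization
    solo_performances = []
    for _ in range(n_rehearsals):
        audio = perform(model)             # live run with current model
        solo_performances.append(estimate_onsets(audio))
        model.fit_em(solo_performances)    # retrain on all solo data so far
    return model
```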

89 5 Real Time Accompaniment The methodological key to our real-time accompaniment algorithm is the computation of (conditional) marginal distributions facilitated by the message-passing machinery of Bayesian networks. [sent-140, score-0.703]

90 At any point during the performance some collection of solo notes and accompaniment notes will have been observed, as in Fig. 3. [sent-141, score-1.118]

91 The real-time computational requirement is limited by passing only the messages necessary to compute the marginal distribution on the pending accompaniment note. [sent-144, score-0.792]

92 Once the conditional marginal distribution of the pending accompaniment note is calculated we schedule the note accordingly. [sent-145, score-0.877]

93 Currently we schedule the note to be played at the conditional mean time, given all observed information; however, other reasonable choices are possible. [sent-146, score-0.169]

94 The initial scheduling of each accompaniment note takes place immediately after the previous accompaniment note is played. [sent-148, score-1.372]

95 It is possible that a solo note will be detected before the pending accompaniment is played; in this event the pending accompaniment event is rescheduled by recomputing its conditional distribution using the newly available information. [sent-149, score-2.082]

96 The pending accompaniment note is rescheduled each time an additional solo note is detected until its currently scheduled time arrives, at which time it is finally played. [sent-150, score-1.223]

97 In this way our accompaniment makes use of all currently available information. [sent-151, score-0.718]
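The scheduling policy of this section amounts to a small event loop: schedule the pending note at its conditional mean, and recondition whenever Listen reports a new solo onset first. A sketch under assumed interfaces (`network`, `solo_detector`, and `play` are ours, not the paper's API):

```python
import time

def play(event):
    """Stub: send the accompaniment note to the synthesizer."""
    print("play", event, "at", time.time())

def accompany(network, accomp_events, solo_detector):
    """Event-loop sketch of the scheduling policy just described."""
    for event in accomp_events:
        due = network.conditional_mean(event)         # initial schedule
        while True:
            wait = max(0.0, due - time.time())
            onset = solo_detector.poll(timeout=wait)  # None on timeout
            if onset is None:                         # its time arrived
                play(event)
                network.observe_accomp(event, time.time())
                break
            network.observe_solo(onset)               # new solo note first:
            due = network.conditional_mean(event)     # recondition, reschedule
```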

98 Does our system pass the musical equivalent of the Turing Test? [sent-152, score-0.316]

99 However, we believe that the level of musicality attained by our system is truly surprising, while the reliability is sufficient for live demonstration. [sent-154, score-0.142]

100 We hope that the interested reader will form an independent opinion, even if different from ours, and to this end we have made musical examples demonstrating our progress available on the web page: http://fafner. [sent-155, score-0.302]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('accompaniment', 0.659), ('soloist', 0.326), ('solo', 0.326), ('musical', 0.283), ('tn', 0.185), ('listen', 0.181), ('rhythmic', 0.142), ('tempo', 0.113), ('accompanist', 0.098), ('raphael', 0.098), ('pending', 0.082), ('lauritzen', 0.071), ('piece', 0.068), ('layer', 0.068), ('composite', 0.065), ('performances', 0.064), ('event', 0.061), ('live', 0.06), ('onset', 0.057), ('notes', 0.054), ('played', 0.054), ('acoustic', 0.053), ('music', 0.052), ('accomp', 0.049), ('darkened', 0.049), ('musicality', 0.049), ('interpretation', 0.047), ('events', 0.046), ('times', 0.046), ('conditional', 0.044), ('unobservable', 0.043), ('observable', 0.041), ('score', 0.041), ('variables', 0.04), ('currently', 0.04), ('room', 0.04), ('sources', 0.037), ('interpretations', 0.036), ('mo', 0.036), ('system', 0.033), ('passing', 0.033), ('expressively', 0.033), ('instrument', 0.033), ('musically', 0.033), ('rehearsal', 0.033), ('rehearsals', 0.033), ('rendition', 0.033), ('rescheduled', 0.033), ('unplayed', 0.033), ('message', 0.032), ('expert', 0.031), ('player', 0.031), ('positions', 0.03), ('graphical', 0.03), ('detected', 0.029), ('graph', 0.029), ('sn', 0.029), ('mn', 0.029), ('bayesian', 0.029), ('amherst', 0.028), ('cowell', 0.028), ('dawid', 0.028), ('musicians', 0.028), ('pitches', 0.028), ('note', 0.027), ('rv', 0.026), ('spiegelhalter', 0.026), ('tendencies', 0.026), ('distributions', 0.026), ('collection', 0.025), ('circles', 0.024), ('synchronization', 0.024), ('rhythm', 0.024), ('observed', 0.024), ('network', 0.023), ('bn', 0.023), ('responds', 0.022), ('hidden', 0.021), ('labeled', 0.021), ('begin', 0.021), ('hundreds', 0.021), ('identifies', 0.021), ('aspects', 0.02), ('mutually', 0.02), ('schedule', 0.02), ('knowledge', 0.02), ('signal', 0.019), ('association', 0.019), ('constitute', 0.019), ('ct', 0.019), ('play', 0.019), ('available', 0.019), ('quantities', 0.018), ('part', 0.018), ('covariances', 0.018), ('marginal', 0.018), ('estimates', 0.017), ('update', 0.017), ('begins', 0.017), ('dependency', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

Author: Christopher Raphael


2 0.17347205 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

Author: Ali Taylan Cemgil, Bert Kappen

Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization) as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. The resulting model is suitable for realtime tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. 1

3 0.066154063 91 nips-2001-Improvisation and Learning

Author: Judy A. Franklin

Abstract: This article presents a 2-phase computational learning model and application. As a demonstration, a system has been built, called CHIME for Computer Human Interacting Musical Entity. In phase 1 of training, recurrent back-propagation trains the machine to reproduce 3 jazz melodies. The recurrent network is expanded and is further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation. After each phase CHIME can interactively improvise with a human in real time. 1 Foundations Jazz improvisation is the creation of a jazz melody in real time. Charlie Parker, Dizzy Gillespie, Miles Davis, John Coltrane, Charles Mingus, Thelonious Monk, and Sonny Rollins et al. were the founders of bebop and post bop jazz [9] where drummers, bassists, and pianists keep the beat and maintain harmonic structure. Other players improvise over this structure and even take turns improvising for 4 bars at a time. This is called trading fours. Meanwhile, artificial neural networks have been used in computer music [4, 12]. In particular, the work of Todd [11] is the basis for phase 1 of CHIME, a novice machine improvisor that learns to trade fours. Firstly, a recurrent network is trained with back-propagation to play three jazz melodies by Sonny Rollins [1], as described in Section 2. Phase 2 uses actor-critic reinforcement learning and is described in Section 3. This section is on jazz basics. 1.1 Basics: Chords, the ii-V-I Chord Progression and Scales The harmonic structure mentioned above is a series of chords that may be repeated and that are often grouped into standard subsequences. A chord is a group of notes played simultaneously. In the chromatic scale, C-Db-D-Eb-E-F-Gb-G-Ab-A-Bb-B-C, notes are separated by a half step. A flat (b) note is a half step below the original note; a sharp (#) is a half above. Two half steps are a whole step. Two whole steps are a major third. Three half steps are a minor third. A major triad (chord) is the first or tonic note, then the note a major third up, then the note a minor third up. When F is the tonic, F major triad is F-A-C. A minor triad (chord) is the tonic, then a minor third, then a major third. F minor triad is F-Ab-C. The diminished triad is the tonic, then a minor third, then a minor third. F diminished triad is F-Ab-Cb. An augmented triad is the tonic, then a major third, then a major third. The F augmented triad is F-A-Db. A third added to the top of a triad forms a seventh chord. A major triad plus a major third is the major seventh chord. F-A-C-E is the F major seventh chord (Fmaj7). A minor triad plus a minor third is a minor seventh chord. For F it is F-Ab-C-Eb (Fm7). A major triad plus a minor third is a dominant seventh chord. For F it is F-A-C-Eb (F7). These three types of chords are used heavily in jazz harmony. Notice that each note in the chromatic scale can be the tonic note for any of these types of chords. A scale, a subset of the chromatic scale, is characterized by note intervals. Let W be a whole step and H be a half. The chromatic scale is HHHHHHHHHHHH. The major scale or ionian mode is WWHWWWH. F major scale is F-G-A-Bb-C-D-E-F. The notes in a scale are degrees; E is the seventh degree of F major. The first, third, fifth, and seventh notes of a major scale are the major seventh chord. The first, third, fifth, and seventh notes of other modes produce the minor seventh and dominant seventh chords.
Roman numerals represent scale degrees and their seventh chords. Upper case implies major or dominant seventh and lower case implies minor seventh [9]. The major seventh chord starting at the scale tonic is the I (one) chord. G is the second degree of F major, and G-Bb-D-F is Gm7, the ii chord, with respect to F. The ii-V-I progression is prevalent in jazz [9], and for F it is Gm7-C7-Fmaj7. The minor ii-V-i progression is obtained using diminished and augmented triads, their seventh chords, and the aeolian mode. Seventh chords can be extended by adding major or minor thirds, e.g. Fmaj9, Fmaj11, Fmaj13, Gm9, Gm11, and Gm13. Any extension can be raised or lowered by 1 step [9] to obtain, e.g. Fmaj7#11, C7#9, C7b9, C7#11. Most jazz compositions are either the 12 bar blues or sectional forms (e.g. ABAB, ABAC, or AABA) [8]. The 3 Rollins songs are 12 bar blues. "Blue 7" has a simple blues form. In "Solid" and "Tenor Madness", Rollins adds bebop variations to the blues form [1]. ii-V-I and VI-II-V-I progressions are added and G7+9 substitutes for the VI and F7+9 for the V (see section 1.2 below); the II-V in the last bar provides the turnaround to the I of the first bar to foster smooth repetition of the form. The result, in chords and in Roman numeral notation, is: Bb7 Bb7 Bb7 Bb7 (I I I I); Eb7 Eb7 Bb7 G7+9 (IV IV I VI); Cm7 F7 Bb7 G7+9 C7 F7+9 (ii V I VI II V). 1.2 Scale Substitutions and Rules for Reinforcement Learning First note that the theory and rules derived in this subsection are used in Phase 2, to be described in Section 3. They are presented here since they derive from the jazz basics immediately preceding. One way a novice improvisor can play is to associate one scale with each chord and choose notes from that scale when the chord is presented in the musical score. Therefore, Rule 1 is that an improvisor may choose notes from a "standard" scale associated with a chord. Next, the 4th degree of the scale is often avoided on a major or dominant seventh chord (Rule 3), unless the player can resolve its dissonance. The major 7th is an avoid note on a dominant seventh chord (Rule 4) since a dominant seventh chord and its scale contain the flat 7th, not the major 7th. Rule 2 contains many notes that can be added. A brief rationale is given next. The C7 in Gm7-C7-Fmaj7 may be replaced by a C7#11, a C7+ chord, or a C7b9b5 or C7alt chord [9]. The scales for C7+ and C7#11 make available the raised fourth (flat 5), and flat 6 (flat 13) for improvising. The C7b9b5 and C7alt (C7+9) chords and their scales make available the flat9, raised 9, flat5 and raised 5 [1]. These substitutions provide the notes of Rule 2. These rules (used in phase 2) are stated below, using for reinforcement values very bad (-1.0), bad (-0.5), a little bad (-0.25), ok (0.25), good (0.5), and very good (1.0). The rules are discussed further in Section 4. The Rule Set: 1) Any note in the scale associated with the chord is ok (except as noted in rule 3). 2) On a dominant seventh, hip notes 9, flat9, #9, #11, 13 or flat13 are very good. One hip note 2 times in a row is a little bad. 2 hip notes more than 2 times in a row is a little bad. 3) If the chord is a dominant seventh chord, a natural 4th note is bad. 4) If the chord is a dominant seventh chord, a natural 7th is very bad. 5) A rest is good unless it is held for more than 2 16th notes and then it is very bad. 6) Any note played longer than 1 beat (4 16th notes) is very bad. 7) If two consecutive notes match the human's, that is good.
2 CHIME Phase 1 In Phase 1, supervised learning is used to train a recurrent network to reproduce the three Sonny Rollins melodies. 2.1 Network Details and Training The recurrent network's output units are linear. The hidden units are nonlinear (logistic function). Todd [11] used a Jordan recurrent network [6] for classical melody learning and generation. In CHIME, a Jordan net is also used, with the addition of the chord as input (Figure 1). 24 of the 26 outputs are notes (2 chromatic octaves), the 25th is a rest, and the 26th indicates a new note. The output with the highest value above a threshold is the next note, including the rest output. The new note output indicates if this is a new note, or if it is the same note being held for another time step (16th-note resolution). The 12 chord inputs (12 notes in a chromatic scale) are 1 or 0. A chord is represented as its first, third, fifth, and seventh notes and it "wraps around" within the 12 inputs. E.g., the Fm7 chord F-Ab-C-Eb is represented as C, Eb, F, Ab or 100101001000. One plan input per song enables distinguishing between songs. The 26 context inputs use eligibility traces, giving the hidden units a decaying history of notes played. CHIME (as did Todd) uses teacher forcing [13], wherein the target outputs for the previous step are used as inputs (so erroneous outputs are not used as inputs). Todd used from 8 to 15 hidden units; CHIME uses 50. The learning rate is 0.075 (Todd used 0.05). The eligibility rate is 0.9 (Todd used 0.8). Differences in values perhaps reflect contrasting styles of the songs and available computing power. Todd used 15 output units and assumed a rest when all note units are "turned off." CHIME uses 24 output note units (2 octaves). Long rests in the Rollins tunes require a dedicated output unit for a rest. Without it, the note outputs learned to turn off all the time. Below are results of four representative experiments. In all experiments, 15,000 presentations of the songs were made. Each song has 192 16th note events. All songs are played at a fixed tempo. Weights are initialized to small random values. The squared error is the average squared error over one complete presentation of the song. "Finessing" the network may improve these values. The songs are easily recognized however, and an exact match could impair the network's ability to improvise. Figure 2 shows the results for "Solid." Experiment 1. Song: Blue Seven. Squared error starts at 185, decreases to 2.67. Experiment 2. Song: Tenor Madness. Squared error starts at 218, decreases to 1.05. Experiment 3. Song: Solid. Squared error starts at 184, decreases to 3.77. Experiment 4. Song: All three songs: Squared error starts at 185, decreases to 36. Figure 1: Jordan recurrent net with addition of chord input 2.2 Phase 1 Human Computer Interaction in Real Time In trading fours with the trained network, human note events are brought in via the MIDI interface [7]. Four bars of human notes are recorded then given, one note event at a time to the context inputs (replacing the recurrent inputs). The plan inputs are all 1. The chord inputs follow the "Solid" form. The machine generates its four bars and they are played in real time. Then the human plays again, etc. An accompaniment (drums, bass, and piano), produced by Band-in-a-Box software (PG Music), keeps the beat and provides chords for the human. Figure 3 shows an interaction. The machine's improvisations are in the second and fourth lines. In bar 5 the flat 9 of the Eb7 appears; the E.
This note is used on the Eb7 and Bb7 chords by Rollins in "Blue 7", as a "passing tone." D is played in bar 5 on the Eb7. D is the natural 7 over Eb7 (with its flat 7) but is a note that Rollins uses heavily in all three songs, and once over the Eb7. It may be a response to the rest and the Bb played by the human in bar 1. D follows both a rest and a Bb in many places in "Tenor Madness" and "Solid." In bar 6, the long G and the Ab (the third then fourth of Eb7) figure prominently in "Solid." At the beginning of bar 7 is the 2-note sequence Ab-E that appears in exactly the same place in the song "Blue 7." The focus of bars 7 and 8 is jumping between the 3rd and 4th of Bb7. At the end of bar 8 the machine plays the flat 9 (Ab) then the flat 3 (Bb), of G7+9. In bars 13-16 the tones are longer, as are the human's in bars 9-12. The tones are the 5th, the root, the 3rd, the root, the flat 7, the 3rd, the 7th, and the raised fourth. Except for the last 2, these are chord tones. 3 CHIME Phase 2 In Phase 2, the network is expanded and trained by reinforcement learning to improvise according to the rules of Section 1.2 and using its knowledge of the Sonny Rollins songs. 3.1 The Expanded Network Figure 4 shows the phase 2 network. The same inputs plus 26 human inputs bring the total to 68. The weights obtained in phase 1 initialize this network. The plan and chord weights are the same (Figure 2: at left, "Solid" played by a human; at right, the song reproduced by the ANN). The weights connecting context units to the hidden layer are halved. The same weights, halved, connect the 26 human inputs to the hidden layer. Each output unit gets the 100 hidden units' outputs as input. The original 50 weights are halved and used as initial values of the two sets of 50 hidden unit weights to the output unit. 3.2 SSR and Critic Algorithms Using actor-critic reinforcement learning ([2, 10, 13]), the actor chooses the next note to play. The critic receives a "raw" reinforcement signal from the critique made by the rules of Section 1.2. For output j, the SSR (actor) computes a Gaussian distribution with a learned mean and standard deviation and chooses the output from it. Once the raw reinforcement is generated, the critic modifies it; the result is further modified by a self-scaling algorithm that tracks, via moving average, the maximum and minimum reinforcement and uses them to scale the signal.

4 0.044320717 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

Author: Andrew D. Brown, Geoffrey E. Hinton

Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1

5 0.039911553 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model

Author: Shimon Edelman, Benjamin P. Hiles, Hwajin Yang, Nathan Intrator

Abstract: To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain's strategies for unsupervised acquisition of structural information in vision. 1 Motivation How does the human visual system decide for which objects it should maintain distinct and persistent internal representations of the kind typically postulated by theories of object recognition? Consider, for example, the image shown in Figure 1, left. This image can be represented as a monolithic hieroglyph, a pair of Chinese characters, a set of strokes, or, trivially, as a collection of pixels. Note that the second option is only available to a system previously exposed to various combinations of Chinese characters. Indeed, a principled decision whether to represent this image as a whole, as a pair of characters, or otherwise can only be made on the basis of prior exposure to related images. According to Barlow's [1] insight, one useful principle is tallying suspicious coincidences: two candidate fragments should be combined into a composite object if the probability of their joint appearance is much higher than the probability expected in the case of their statistical independence. This criterion may be compared to the Minimum Description Length (MDL) principle, which has been previously discussed in the context of object representation [2, 3]. In a simplified form [4], MDL calls for representing a pair explicitly as a whole when the joint probability far exceeds the product of the marginals, just as the principle of suspicious coincidences does. While the Barlow/MDL criterion certainly indicates a suspicious coincidence, there are additional probabilistic considerations that may be used in setting the degree of association between two fragments. One example is the possible perfect predictability of each fragment from the other, as measured by the conditional probabilities. If two fragments are perfectly predictive of each other, they should really be coded by a single symbol, whereas the MDL criterion may suggest merely that some association between their representations be established. In comparison, if the fragments are not perfectly predictive of each other, there is a case to be made in favor of coding them separately to allow for a maximally expressive representation, whereas MDL may actually suggest a high degree of association. In this study we investigated whether the human visual system uses such a predictability criterion alongside MDL while learning (in an unsupervised manner) to represent composite objects.

6 0.03680978 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

7 0.036319688 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models

8 0.035874184 35 nips-2001-Analysis of Sparse Bayesian Learning

9 0.034307163 43 nips-2001-Bayesian time series classification

10 0.028900245 193 nips-2001-Unsupervised Learning of Human Motion Models

11 0.028627735 44 nips-2001-Blind Source Separation via Multinode Sparse Representation

12 0.028531363 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

13 0.026915619 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

14 0.026243707 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing

15 0.02608511 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network

16 0.025615767 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables

17 0.025496524 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

18 0.025029214 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

19 0.024926387 84 nips-2001-Global Coordination of Local Linear Models

20 0.024108151 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.089), (1, -0.027), (2, -0.002), (3, -0.033), (4, -0.075), (5, -0.03), (6, 0.013), (7, 0.02), (8, -0.022), (9, -0.031), (10, 0.006), (11, -0.01), (12, 0.004), (13, -0.011), (14, 0.013), (15, 0.019), (16, 0.031), (17, -0.021), (18, 0.007), (19, 0.026), (20, -0.088), (21, -0.003), (22, -0.056), (23, 0.032), (24, -0.054), (25, 0.024), (26, 0.062), (27, -0.04), (28, -0.244), (29, -0.156), (30, 0.166), (31, -0.146), (32, -0.039), (33, 0.064), (34, 0.099), (35, 0.18), (36, -0.19), (37, -0.127), (38, 0.062), (39, 0.071), (40, -0.095), (41, -0.054), (42, -0.145), (43, 0.016), (44, 0.147), (45, 0.255), (46, 0.111), (47, 0.001), (48, 0.122), (49, 0.092)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95525759 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

Author: Christopher Raphael


2 0.75821906 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

Author: Ali Taylan Cemgil, Bert Kappen


3 0.57318068 91 nips-2001-Improvisation and Learning

Author: Judy A. Franklin


4 0.26972276 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists

Author: John C. Platt, Christopher J. C. Burges, Steven Swenson, Christopher Weare, Alice Zheng

Abstract: This paper presents AutoDJ: a system for automatically generating music playlists based on one or more seed songs selected by a user. AutoDJ uses Gaussian Process Regression to learn a user preference function over songs. This function takes music metadata as inputs. This paper further introduces Kernel Meta-Training, which is a method of learning a Gaussian Process kernel from a distribution of functions that generates the learned function. For playlist generation, AutoDJ learns a kernel from a large set of albums. This learned kernel is shown to be more effective at predicting users’ playlists than a reasonable hand-designed kernel.

5 0.26846266 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

Author: Stuart N. Wrigley, Guy J. Brown

Abstract: A model of auditory grouping is described in which auditory attention plays a key role. The model is based upon an oscillatory correlation framework, in which neural oscillators representing a single perceptual stream are synchronised, and are desynchronised from oscillators representing other streams. The model suggests a mechanism by which attention can be directed to the high or low tones in a repeating sequence of tones with alternating frequencies. In addition, it simulates the perceptual segregation of a mistuned harmonic from a complex tone. 1

6 0.24608487 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

7 0.22842303 75 nips-2001-Fast, Large-Scale Transformation-Invariant Clustering

8 0.2008097 53 nips-2001-Constructing Distributed Representations Using Additive Clustering

9 0.19170612 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation

10 0.17295587 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

11 0.16894971 196 nips-2001-Very loopy belief propagation for unwrapping phase images

12 0.16257635 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

13 0.15028758 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model

14 0.14631069 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

15 0.14615238 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

16 0.14393269 35 nips-2001-Analysis of Sparse Bayesian Learning

17 0.13728949 42 nips-2001-Bayesian morphometry of hippocampal cells suggests same-cell somatodendritic repulsion

18 0.13100225 86 nips-2001-Grammatical Bigrams

19 0.12924504 154 nips-2001-Products of Gaussians

20 0.12746593 30 nips-2001-Agglomerative Multivariate Information Bottleneck


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.012), (17, 0.018), (19, 0.015), (27, 0.084), (30, 0.076), (35, 0.031), (38, 0.013), (47, 0.369), (59, 0.022), (72, 0.051), (79, 0.045), (83, 0.034), (91, 0.119)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77921945 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

Author: Christopher Raphael

Abstract: We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. A live demonstration will be provided. 1

2 0.43013763 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

Author: Andrew D. Brown, Geoffrey E. Hinton

Abstract: Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1

3 0.42829728 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Author: Ran El-Yaniv, Oren Souroujon

Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1

4 0.42820084 161 nips-2001-Reinforcement Learning with Long Short-Term Memory

Author: Bram Bakker

Abstract: This paper presents reinforcement learning with a Long ShortTerm Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1

5 0.42691427 183 nips-2001-The Infinite Hidden Markov Model

Author: Matthew J. Beal, Zoubin Ghahramani, Carl E. Rasmussen

Abstract: We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite— consider, for example, symbols being possible words appearing in English text.

6 0.42667657 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

7 0.42652732 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex

8 0.4257319 56 nips-2001-Convolution Kernels for Natural Language

9 0.42569202 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

10 0.42524201 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

11 0.42460734 149 nips-2001-Probabilistic Abstraction Hierarchies

12 0.42367744 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

13 0.42347044 169 nips-2001-Small-World Phenomena and the Dynamics of Information

14 0.42342222 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

15 0.42328376 13 nips-2001-A Natural Policy Gradient

16 0.42281523 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

17 0.42216772 182 nips-2001-The Fidelity of Local Ordinal Encoding

18 0.42192239 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

19 0.42166018 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

20 0.42137796 132 nips-2001-Novel iteration schemes for the Cluster Variation Method