nips nips2001 nips2001-179 knowledge-graph by maker-knowledge-mining

179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo


Source: pdf

Author: Ali Taylan Cemgil, Bert Kappen

Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. The resulting model is suitable for real-time tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. [sent-3, score-0.296]

2 The structure of the proposed model is equivalent to a switching state space model. [sent-5, score-0.081]

3 We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization) as filtering and maximum a posteriori (MAP) state estimation tasks. [sent-6, score-1.318]

4 The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. [sent-7, score-0.094]

5 For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. [sent-8, score-0.25]

6 The resulting model is suitable for real-time tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. [sent-9, score-1.322]

7 1 Introduction. Automatic music transcription refers to the extraction of a high-level description from a musical performance, for example in the form of music notation. [sent-10, score-0.929]

8 Music notation can be viewed as a list of the pitch levels and corresponding timestamps. [sent-11, score-0.091]

9 Ideally, one would like to recover a score directly from sound. [sent-12, score-0.102]

10 Such a representation of the surface structure of music would be very useful in music information retrieval (Music-IR) and content description of musical material in large audio databases. [sent-13, score-0.822]

11 However, when operating on sampled audio data from polyphonic acoustical signals, extraction of a score-like description is a very challenging auditory scene analysis task [13]. [sent-14, score-0.11]

12 In this paper, we focus on a subproblem in Music-IR, where we assume that exact timing information of notes is available, for example as a stream of MIDI¹. [sent-15, score-0.191]

13 ¹A standard communication protocol especially designed for digital instruments such as keyboards. [sent-17, score-0.123]

14 Each time a key is pressed, a MIDI keyboard generates a short message containing pitch and key velocity. [sent-18, score-0.13]

15 A computer can tag each received message by a timestamp for real-time processing and/or "recording" into a file. [sent-19, score-0.047]
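As a concrete illustration of how such a timestamped onset stream can be collected, here is a minimal sketch (ours, not from the paper); it assumes the third-party mido library and a hypothetical file name:

import mido

onsets = []  # (time in seconds, pitch, velocity) for each performed note onset
t = 0.0
for msg in mido.MidiFile("performance.mid"):
    t += msg.time  # iterating a MidiFile yields tempo-aware delta times in seconds
    if msg.type == "note_on" and msg.velocity > 0:
        onsets.append((t, msg.note, msg.velocity))

The onset times alone form the observation sequence y_{1:K} used by the model in Section 2.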

16 A model for tempo tracking and transcription is useful in a broad spectrum of applications. [sent-21, score-0.755]

17 One example is automatic score typesetting, the musical analog of word processing. [sent-22, score-0.266]

18 Almost all score typesetting applications provide a means of automatic generation of a conventional music notation from MIDI data. [sent-23, score-0.62]

19 In conventional music notation, onset time of each note is implicitly represented by the cumulative sum of durations of previous notes. [sent-24, score-0.47]

20 quarter note, eighth note); consequently, all events in music are placed on a discrete grid. [sent-27, score-0.479]

21 So the basic task in MIDI transcription is to associate discrete grid locations with onsets, i.e. [sent-28, score-0.316]
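To make the grid-association task concrete, the following sketch (our own illustration with made-up numbers) computes score onsets as cumulative sums of notated durations and applies the naive nearest-grid quantizer:

import numpy as np

durations = np.array([1.0, 0.5, 0.5, 1.0])            # notated durations, in beats
score_onsets = np.concatenate(([0.0], np.cumsum(durations)[:-1]))

grid = 0.25                                           # sixteenth-note grid, in beats
performed = np.array([0.03, 1.08, 1.46, 2.11])        # noisy performed onsets, in beats
quantized = np.round(performed / grid) * grid         # round each onset independently

Per-onset rounding of this kind ignores tempo fluctuation entirely, which is exactly why the identification problem described next becomes difficult.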

22 However, unless the music is performed with mechanical precision, identification of the correct association becomes difficult. [sent-30, score-0.361]

23 This is due to the fact that musicians introduce intentional (and unintentional) deviations from a mechanical prescription. [sent-32, score-0.242]

24 For example, the timing of events can be deliberately delayed or pushed. [sent-33, score-0.191]

25 Moreover, the tempo can fluctuate by slowing down or accelerating. [sent-34, score-0.428]

26 In fact, such deviations are natural aspects of expressive performance; in the absence of these, music tends to sound rather dull. [sent-35, score-0.505]

27 Robust and fast quantization and tempo tracking are also important requirements in interactive performance systems. [sent-36, score-0.644]

28 These are emerging applications that "listen" to the performance for generating an accompaniment or improvisation in real time [10, 12]. [sent-37, score-0.118]

29 Finally, such models are also useful in musicology for the systematic study and characterization of expressive timing by principled analysis of existing performance data. [sent-38, score-0.132]

30 2 Model: Consider the following generative model for timing deviations in music:

    c_k = c_{k-1} + γ_{k-1}                          (1)
    ω_k = ω_{k-1} + ζ_k                              (2)
    τ_k = τ_{k-1} + 2^{ω_{k-1}} (c_k - c_{k-1})      (3)
    y_k = τ_k + ε_k                                  (4)

[sent-39, score-0.546]

31 In Eq. 1, c_k denotes the grid location of the k'th onset in a score. [sent-40, score-0.151]

32 The interval between two consecutive onsets in the score is denoted by γ_{k-1}. [sent-41, score-0.313]

33 For example, consider the notation of a quarter note followed by two eighth notes, which encodes γ_{1:3} = [1 0.5 0.5]. [sent-43, score-0.039]

34 We note that the c_k are drawn from an infinite (but discrete) set and are increasing in k, i.e. [sent-51, score-0.038]

35 To allow for different time signatures and alternative rhythmic subdivisions, one can introduce additional hidden variables [1], but this is not addressed in this paper. [sent-53, score-0.105]

36 For example, if the tempo is 60 beats per minute (bpm), ω = log_2 1 sec = 0. [sent-57, score-0.459]
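As a worked example of this representation (our numbers, assuming the base-2 logarithm implied by the 2^ω factor in Eq. 3): at 120 bpm the beat period is 0.5 sec, so ω = log_2 0.5 = -1, while at 30 bpm ω = log_2 2 = +1; doubling or halving the tempo shifts ω by exactly one unit.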

37 Since tempo appears as a scale variable in mapping grid locations on a score to the actual performance time, we have chosen to represent it in the logarithmic scale (alternatively, a gamma distribution could also be used). [sent-58, score-0.642]

38 This representation is both perceptually plausible and mathematically convenient, since a symmetric noise model on ω assigns equal probabilities to equal relative changes in tempo. [sent-59, score-0.031]

39 Depending upon the interval between consecutive onsets, the model scales the noise covariance; longer jumps in the score allow for more freedom in fluctuating the tempo. [sent-61, score-0.165]

40 Eq. 3 defines a model of noiseless onsets with variable tempo. [sent-63, score-0.224]

41 We will denote the pair of hidden continuous variables by z_k = (τ_k, ω_k). [sent-64, score-0.074]

42 Here y_k is the observed onset time of the k'th onset in the performance. [sent-67, score-0.164]

43 The noise term ε_k models small-scale expressive deviations in the timing of individual notes and has a Gaussian distribution parameterized by N(μ(γ_{k-1}), Σ(γ_{k-1})). [sent-68, score-0.453]
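The four equations can be simulated directly. The sketch below is our own illustration with assumed noise variances; the paper's actual parameter values are not reproduced here:

import numpy as np

rng = np.random.default_rng(0)

gamma = [1.0, 0.5, 0.5, 1.0]      # notated intervals γ_{k-1}, in beats
q_omega, r_y = 0.01, 0.005        # assumed tempo and onset-noise variances

c, omega, tau = 0.0, 0.0, 0.0     # grid position, log2 beat period (60 bpm), time
y = []
for g in gamma:
    c = c + g                                         # Eq. 1
    tau = tau + (2.0 ** omega) * g                    # Eq. 3, with the previous tempo
    y.append(tau + rng.normal(0.0, np.sqrt(r_y)))     # Eq. 4
    omega += rng.normal(0.0, np.sqrt(q_omega * g))    # Eq. 2, noise scaled by the interval

Inference reverses this simulation: given y, recover the c (transcription) and ω (tempo track).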

44 Such a parameterization is useful for appropriate quantization of phrases (short sequences of notes) that are shifted or delayed as a whole [1]. [sent-69, score-0.194]

45 In reality, a random walk model for tempo such as in Eq. 2 [sent-70, score-0.492]

46 In the dynamical model framework, such smooth deviations can be allowed by increasing the dimensionality of ω by including higher-order "inertia" variables [2]. [sent-73, score-0.227]

47 The model is similar to a switching state space model that has recently been applied in the context of music transcription [11]. [sent-82, score-0.564]

48 The differences are in parameterization and more importantly in the inference method. [sent-83, score-0.038]

49 The pair of continuous hidden variables (τ_k, ω_k) is denoted by z_k. [sent-85, score-0.074]

50 Both C and Z are hidden; only the onsets Y are observed. [sent-86, score-0.176]

51 We define tempo tracking as a filtering problem:

    z_k* = argmax_{z_k} Σ_{c_k} p(c_k, z_k | y_{1:k})                     (5)

and rhythm transcription as a MAP state estimation problem:

    c_{1:K}* = argmax_{c_{1:K}} p(c_{1:K} | y_{1:K})                      (6)

    p(c_{1:K} | y_{1:K}) = ∫ dz_{1:K} p(c_{1:K}, z_{1:K} | y_{1:K})       (7)

The exact computation of the quantities in Eq. [sent-87, score-1.06]

52 5 is intractable due to the explosion in the number of mixture components required to represent the exact posterior at each step k. [sent-89, score-0.084]
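An illustrative count (ours, not the paper's): if each onset admits even M = 10 plausible grid locations, the exact posterior after K onsets is a Gaussian mixture with up to M^K components, already 10^20 for a 20-note phrase, so pruning or Monte Carlo approximation is unavoidable.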

53 3 Sequential Monte Carlo Sampling: Sequential Monte Carlo sampling (a.k.a. [sent-91, score-0.059]

54 particle filtering) is an integration method especially powerful for inference in dynamical systems. [sent-94, score-0.205]

55 See [4] for a detailed review of state of the art. [sent-95, score-0.038]

56 Particles at step k are evolved to k + 1 by sequential importance sampling and resampling methods [6]. [sent-97, score-0.155]
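In outline, one such evolution step might look as follows; this is a generic textbook sketch rather than the paper's exact algorithm, and propose and loglik stand for a model-specific proposal and likelihood:

import numpy as np

def smc_step(particles, logw, y, propose, loglik, rng):
    # importance sampling: move each particle with the proposal distribution
    particles = [propose(p, y, rng) for p in particles]
    # reweight by the (incremental) likelihood of the new observation
    logw = logw + np.array([loglik(p, y) for p in particles])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if 1.0 / np.sum(w ** 2) < 0.5 * len(w):   # effective sample size has degenerated
        idx = rng.choice(len(w), size=len(w), p=w)
        particles = [particles[i] for i in idx]
        logw = np.zeros(len(w))
    else:
        logw = np.log(w)
    return particles, logw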

57 Once a set of discrete sample points is obtained during the forward phase by sampling, particle approximations to quantities such as the smoothed marginal posterior p(x_k | y_{1:K}) or the maximum a posteriori state sequence (Viterbi path) x_{1:K}* can be obtained efficiently. [sent-98, score-0.452]

58 Due to the discrete nature of the approximate representation, resulting algorithms are closely related to standard smoothing and Viterbi algorithms in Hidden Markov models [9, 7, 6]. [sent-99, score-0.033]

59 Unfortunately, if the hidden state space is of high dimensionality, sampling can be inefficient. [sent-100, score-0.253]

60 Hence increasingly many particles are needed to accurately represent the posterior. [sent-101, score-0.073]

61 Consequently, the estimation of "off-line" quantities such as p(x_k | y_{1:K}) and x_{1:K}* becomes very costly since one has to store all past trajectories. [sent-102, score-0.039]

62 For some models, including the one proposed here, one can identify substructures where integrations, conditioned on certain nodes, can be computed analytically [5]. [sent-103, score-0.06]

63 In this case the joint marginal posterior is represented as a mixture:

    p(c_k, z_k | y_{1:k}) ≈ Σ_{i=1}^{N} w_k^{(i)} p(z_k | c_{1:k}^{(i)}, y_{1:k}) δ(c_k - c_k^{(i)})     (9)

[sent-105, score-0.084]

The particular case of Gaussian p(z_k | c_{1:k}^{(i)}, y_{1:k}) is extensively used in diverse applications [8] and reported to give superior results when compared to standard particle filtering [3, 6]. [sent-108, score-0.285]
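In code, the representation in Eq. 9 amounts to each particle carrying a sampled discrete trajectory together with the Kalman sufficient statistics of the conditionally Gaussian z_k. A minimal sketch, with names of our own choosing:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class RBParticle:
    c_traj: list = field(default_factory=lambda: [0.0])            # sampled c_1..c_k
    mean: np.ndarray = field(default_factory=lambda: np.zeros(2))  # E[z_k | c^(i), y_{1:k}], z = (τ, ω)
    cov: np.ndarray = field(default_factory=lambda: np.eye(2))     # Cov[z_k | c^(i), y_{1:k}]
    logw: float = 0.0                                              # log mixture weight w_k^(i)

Only the discrete grid locations are sampled; the continuous pair is integrated out analytically, which is what makes this a Rao-Blackwellised particle filter.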

3.1 Particle Filtering. We assume that we have obtained a set of particles from the filtered posterior p(c_k | y_{1:k}). [sent-110, score-0.18]

66 Due to lack of space we do not give the details of the particle filtering algorithm but refer the reader to [6]. [sent-111, score-0.285]

One important point to note is that we have to use the optimal proposal distribution, given as

    p(c_k | c_{1:k-1}^{(i)}, y_{1:k}) ∝ ∫ dz_{k-1:k} p(y_k | z_k, c_k, c_{1:k-1}^{(i)}) p(z_k, c_k | z_{k-1}, c_{1:k-1}^{(i)}) p(z_{k-1} | c_{1:k-1}^{(i)}, y_{1:k-1})     (10)

Since the state-space of c_k is effectively infinite, this step is crucial for efficiency. [sent-112, score-0.048]

Evaluation of the proposal distribution amounts to looking forward and selecting a set of high-probability candidate grid locations for quantization. [sent-113, score-0.19]

Once the c_k^{(i)} are obtained, we can use standard Kalman filtering algorithms to update the Gaussian potentials p(z_k | c_{1:k}^{(i)}, y_{1:k}). [sent-114, score-0.141]
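A sketch of how one such step might look, combining candidate enumeration with a linearised (extended) Kalman update; the candidate set, noise variances, and all names are our assumptions, and p is an RBParticle as sketched above:

import numpy as np

LOG2 = np.log(2.0)

def ekf_step(mean, cov, g, y, q_omega, r_y):
    # propagate z = (tau, omega) over score interval g (Eqs. 2-3),
    # linearising 2**omega around the current mean
    tau, om = mean
    mean_p = np.array([tau + (2.0 ** om) * g, om])
    F = np.array([[1.0, LOG2 * (2.0 ** om) * g],
                  [0.0, 1.0]])
    P = F @ cov @ F.T + np.diag([0.0, q_omega * g])
    # absorb the observation y_k = tau_k + eps (Eq. 4), i.e. H = [1, 0]
    S = P[0, 0] + r_y
    loglik = -0.5 * (np.log(2.0 * np.pi * S) + (y - mean_p[0]) ** 2 / S)
    K = P[:, 0] / S
    return mean_p + K * (y - mean_p[0]), P - np.outer(K, P[0, :]), loglik

def rb_step(p, y, candidates, q_omega, r_y, rng):
    # score every candidate grid location by its predictive likelihood of y
    outs = [ekf_step(p.mean, p.cov, c - p.c_traj[-1], y, q_omega, r_y)
            for c in candidates]
    ll = np.array([o[2] for o in outs])
    probs = np.exp(ll - ll.max())
    probs /= probs.sum()
    i = rng.choice(len(candidates), p=probs)   # sample c_k from the proposal
    p.c_traj.append(candidates[i])
    p.mean, p.cov = outs[i][0], outs[i][1]
    p.logw += ll.max() + np.log(np.sum(np.exp(ll - ll.max())))  # weight by the normalizer
    return p

Sampling c_k in proportion to the predictive likelihoods and incrementing the weight by their sum is what makes the proposal optimal in the sense of Eq. 10.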

²We linearize the nonlinear observation model 2^{ω_k}(c_k - c_{k-1}) around the mean of ω_k. [sent-117, score-0.031]
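Under our reading of this footnote, the linearisation is the first-order expansion of the map from log-tempo to performance time: with h(ω) = τ_{k-1} + 2^ω (c_k - c_{k-1}), the derivative is ∂h/∂ω = ln 2 · 2^ω (c_k - c_{k-1}), evaluated at the current mean of ω; this is the Jacobian used in the extended Kalman update sketched above.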

3.2 Modified Viterbi algorithm. The quantization problem in Eq. 6 [sent-119, score-0.089]

Since z is integrated over, in general all c_k become coupled and the Markov property is lost, i.e. [sent-121, score-0.032]

One possible approximation, which we also adopt here, is to assume that smoothed estimates are not much different from filtered estimates [8], i.e. [sent-124, score-0.11]
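Under that approximation, the MAP trajectory can be found by an ordinary max-product recursion restricted to the sampled particle values. The following sketch is our paraphrase of the idea, not the paper's pseudocode:

import numpy as np

def particle_viterbi(log_obs, log_trans):
    # log_obs[k][i]: score of particle i at step k
    # log_trans[k][i, j]: score of moving from particle j at k-1 to particle i at k
    delta, back = log_obs[0], []
    for k in range(1, len(log_obs)):
        s = log_trans[k] + delta[None, :]          # combine with all predecessors
        back.append(np.argmax(s, axis=1))          # best predecessor per particle
        delta = log_obs[k] + np.max(s, axis=1)
    path = [int(np.argmax(delta))]
    for bp in reversed(back):                      # backtrack
        path.append(int(bp[path[-1]]))
    return path[::-1]                              # Viterbi particle index per step

Because the recursion only visits particle positions produced in the forward pass, its cost per step is quadratic in the number of particles rather than exponential in the sequence length.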


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ck', 0.442), ('tempo', 0.428), ('music', 0.312), ('onsets', 0.176), ('transcription', 0.171), ('wk', 0.163), ('particle', 0.144), ('filtering', 0.141), ('cl', 0.139), ('zk', 0.134), ('deviations', 0.131), ('tracking', 0.127), ('midi', 0.112), ('tk', 0.107), ('timing', 0.103), ('score', 0.102), ('viterbi', 0.098), ('musical', 0.098), ('quantization', 0.089), ('xk', 0.088), ('onset', 0.082), ('monte', 0.08), ('rhythm', 0.078), ('hidden', 0.074), ('particles', 0.073), ('iyl', 0.07), ('typesetting', 0.07), ('grid', 0.069), ('automatic', 0.066), ('sequential', 0.065), ('ill', 0.064), ('carlo', 0.064), ('expressive', 0.062), ('digital', 0.062), ('kp', 0.061), ('instruments', 0.061), ('filtered', 0.061), ('sampling', 0.059), ('consequently', 0.056), ('accompaniment', 0.056), ('nijmegen', 0.056), ('pitch', 0.052), ('events', 0.05), ('notes', 0.05), ('smoothed', 0.049), ('mechanical', 0.049), ('proposal', 0.048), ('defines', 0.048), ('kalman', 0.047), ('message', 0.047), ('posterior', 0.046), ('durations', 0.045), ('locations', 0.043), ('switching', 0.043), ('audio', 0.043), ('wi', 0.041), ('quantities', 0.039), ('notation', 0.039), ('delayed', 0.038), ('yl', 0.038), ('infinite', 0.038), ('parameterization', 0.038), ('marginal', 0.038), ('exact', 0.038), ('state', 0.038), ('extraction', 0.036), ('posteriori', 0.035), ('consecutive', 0.035), ('discrete', 0.033), ('dynamical', 0.032), ('integrated', 0.032), ('snn', 0.031), ('musicians', 0.031), ('rhythmic', 0.031), ('linearize', 0.031), ('ici', 0.031), ('acoustical', 0.031), ('keyboard', 0.031), ('pressed', 0.031), ('emerging', 0.031), ('improvisation', 0.031), ('argmaxp', 0.031), ('bert', 0.031), ('minute', 0.031), ('perceptually', 0.031), ('evolved', 0.031), ('intentional', 0.031), ('realtime', 0.031), ('substructures', 0.031), ('conventional', 0.031), ('forward', 0.03), ('integration', 0.029), ('useful', 0.029), ('conditioned', 0.029), ('listen', 0.028), ('material', 0.028), ('quarter', 0.028), ('ali', 0.028), ('fluctuating', 0.028), ('ez', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

Author: Ali Taylan Cemgil, Bert Kappen

Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. The resulting model is suitable for real-time tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. 1

2 0.17347205 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

Author: Christopher Raphael

Abstract: We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. A live demonstration will be provided. 1

3 0.15504558 163 nips-2001-Risk Sensitive Particle Filters

Author: Sebastian Thrun, John Langford, Vandi Verma

Abstract: We propose a new particle filter that incorporates a model of costs when generating particles. The approach is motivated by the observation that the costs of accidentally not tracking hypotheses might be significant in some areas of state space, and next to irrelevant in others. By incorporating a cost model into particle filtering, states that are more critical to the system performance are more likely to be tracked. Automatic calculation of the cost model is implemented using an MDP value function calculation that estimates the value of tracking a particular state. Experiments in two mobile robot domains illustrate the appropriateness of the approach.

4 0.14944419 75 nips-2001-Fast, Large-Scale Transformation-Invariant Clustering

Author: Brendan J. Frey, Nebojsa Jojic

Abstract: In previous work on “transformed mixtures of Gaussians” and “transformed hidden Markov models”, we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speed-up is achieved through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For N × N images, learning C clusters under N rotations, N scales, N x-translations and N y-translations takes only (C + 2 log N)N^2 scalar operations per iteration. In contrast, the original algorithm takes CN^6 operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320 × 240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm. 1

5 0.12842476 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation

Author: Christophe Andrieu, Nando D. Freitas, Arnaud Doucet

Abstract: In this paper, we extend the Rao-Blackwellised particle filtering method to more complex hybrid models consisting of Gaussian latent variables and discrete observations. This is accomplished by augmenting the models with artificial variables that enable us to apply Rao-Blackwellisation. Other improvements include the design of an optimal importance proposal distribution and being able to swap the sampling and selection steps to handle outliers. We focus on sequential binary classifiers that consist of linear combinations of basis functions, whose coefficients evolve according to a Gaussian smoothness prior. Our results show significant improvements. 1

6 0.10565155 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

7 0.075827226 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

8 0.072354361 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists

9 0.067536503 43 nips-2001-Bayesian time series classification

10 0.065911785 183 nips-2001-The Infinite Hidden Markov Model

11 0.065334395 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines

12 0.055721607 155 nips-2001-Quantizing Density Estimators

13 0.055691279 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex

14 0.053862177 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models

15 0.050246716 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

16 0.05016087 108 nips-2001-Learning Body Pose via Specialized Maps

17 0.04585034 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data

18 0.045554515 91 nips-2001-Improvisation and Learning

19 0.0446443 84 nips-2001-Global Coordination of Local Linear Models

20 0.043944333 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.146), (1, -0.031), (2, 0.005), (3, -0.088), (4, -0.155), (5, -0.081), (6, 0.109), (7, 0.193), (8, 0.011), (9, 0.064), (10, 0.066), (11, -0.039), (12, 0.096), (13, -0.022), (14, 0.057), (15, 0.038), (16, 0.065), (17, -0.001), (18, -0.026), (19, 0.029), (20, -0.088), (21, -0.02), (22, -0.023), (23, -0.014), (24, -0.072), (25, -0.033), (26, 0.108), (27, 0.02), (28, -0.247), (29, -0.122), (30, 0.17), (31, -0.18), (32, -0.079), (33, 0.12), (34, 0.075), (35, 0.137), (36, -0.21), (37, -0.126), (38, 0.118), (39, 0.032), (40, -0.09), (41, -0.008), (42, -0.173), (43, 0.035), (44, 0.067), (45, 0.145), (46, 0.092), (47, -0.009), (48, 0.006), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9605754 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

Author: Ali Taylan Cemgil, Bert Kappen

Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. The resulting model is suitable for real-time tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. 1

2 0.83380049 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

Author: Christopher Raphael

Abstract: We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. A live demonstration will be provided. 1

3 0.47491598 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation

Author: Christophe Andrieu, Nando D. Freitas, Arnaud Doucet

Abstract: In this paper, we extend the Rao-Blackwellised particle filtering method to more complex hybrid models consisting of Gaussian latent variables and discrete observations. This is accomplished by augmenting the models with artificial variables that enable us to apply Rao-Blackwellisation. Other improvements include the design of an optimal importance proposal distribution and being able to swap the sampling and selection steps to handle outliers. We focus on sequential binary classifiers that consist of linear combinations of basis functions, whose coefficients evolve according to a Gaussian smoothness prior. Our results show significant improvements. 1

4 0.47192156 91 nips-2001-Improvisation and Learning

Author: Judy A. Franklin

Abstract: This article presents a 2-phase computational learning model and application. As a demonstration, a system has been built, called CHIME for Computer Human Interacting Musical Entity. In phase 1 of training, recurrent back-propagation trains the machine to reproduce 3 jazz melodies. The recurrent network is expanded and is further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation. After each phase CHIME can interactively improvise with a human in real time. 1 Foundations Jazz improvisation is the creation of a jazz melody in real time. Charlie Parker, Dizzy Gillespie, Miles Davis, John Coltrane, Charles Mingus, Thelonious Monk, and Sonny Rollins et al. were the founders of bebop and post bop jazz [9] where drummers, bassists, and pianists keep the beat and maintain harmonic structure. Other players improvise over this structure and even take turns improvising for 4 bars at a time. This is called trading fours. Meanwhile, artificial neural networks have been used in computer music [4, 12]. In particular, the work of (Todd [11]) is the basis for phase 1 of CHIME, a novice machine improvisor that learns to trade fours. Firstly, a recurrent network is trained with back-propagation to play three jazz melodies by Sonny Rollins [1], as described in Section 2. Phase 2 uses actor-critic reinforcement learning and is described in Section 3. This section is on jazz basics. 1.1 Basics: Chords, the ii-V-I Chord Progression and Scales The harmonic structure mentioned above is a series of chords that may be repeated and that are often grouped into standard subsequences. A chord is a group of notes played simultaneously. In the chromatic scale, C-Db-D-Eb-E-F-Gb-G-Ab-A-Bb-B-C, notes are separated by a half step. A flat (b) note is a half step below the original note; a sharp (#) is a half above. Two half steps are a whole step. Two whole steps are a major third. Three half steps are a minor third. A major triad (chord) is the first or tonic note, then the note a major third up, then the note a minor third up. When F is the tonic, F major triad is F-A-C. A minor triad (chord) is the tonic, then a minor third, then a major third. F minor triad is F-Ab-C. The diminished triad is the tonic, then a minor third, then a minor third. F diminished triad is F-Ab-Cb. An augmented triad is the tonic, then a major third, then a major third. The F augmented triad is F-A-Db. A third added to the top of a triad forms a seventh chord. A major triad plus a major third is the major seventh chord. F-A-C-E is the F major seventh chord (Fmaj7). A minor triad plus a minor third is a minor seventh chord. For F it is F-Ab-C-Eb (Fm7). A major triad plus a minor third is a dominant seventh chord. For F it is F-A-C-Eb (F7). These three types of chords are used heavily in jazz harmony. Notice that each note in the chromatic scales can be the tonic note for any of these types of chords. A scale, a subset of the chromatic scale, is characterized by note intervals. Let W be a whole step and H be a half. The chromatic scale is HHHHHHHHHHHH. The major scale or ionian mode is WWHWWWH. F major scale is F-G-A-Bb-C-D-E-F. The notes in a scale are degrees; E is the seventh degree of F major. The first, third, fifth, and seventh notes of a major scale are the major seventh chord. The first, third, fifth, and seventh notes of other modes produce the minor seventh and dominant seventh chords. 
Roman numerals represent scale degrees and their seventh chords. Upper case implies major or dominant seventh and lower case implies minor seventh [9]. The major seventh chord starting at the scale tonic is the I (one) chord. G is the second degree of F major, and G-Bb-D-F is Gm7, the ii chord, with respect to F. The ii-V-I progression is prevalent in jazz [9], and for F it is Gm7-C7-Fmaj7. The minor ii-V-i progression is obtained using diminished and augmented triads, their seventh chords, and the aeolian mode. Seventh chords can be extended by adding major or minor thirds, e.g. Fmaj9, Fmaj11, Fmaj13, Gm9, Gm11, and Gm13. Any extension can be raised or lowered by 1 step [9] to obtain, e.g. Fmaj7#11, C7#9, C7b9, C7#11. Most jazz compositions are either the 12 bar blues or sectional forms (e.g. ABAB, ABAC, or AABA) [8]. The 3 Rollins songs are 12 bar blues. “Blue 7” has a simple blues form. In “Solid” and “Tenor Madness”, Rollins adds bebop variations to the blues form [1]. ii-V-I and VI-II-V-I progressions are added and G7+9 substitutes for the VI and F7+9 for the V (see section 1.2 below); the II-V in the last bar provides the turnaround to the I of the first bar to foster smooth repetition of the form. The result is at left and in Roman numeral notation Bb7 Bb7 Bb7 Bb7 I I I I Eb7 Eb7 Bb7 G7+9 IV IV I VI at right: Cm7 F7 Bb7 G7+9 C7 F7+9 ii V I VI II V 1.2 Scale Substitutions and Rules for Reinforcement Learning First note that the theory and rules derived in this subsection are used in Phase 2, to be described in Section 3. They are presented here since they derive from the jazz basics immediately preceding. One way a novice improvisor can play is to associate one scale with each chord and choose notes from that scale when the chord is presented in the musical score. Therefore, Rule 1 is that an improvisor may choose notes from a “standard” scale associated with a chord. Next, the 4th degree of the scale is often avoided on a major or dominant seventh chord (Rule 3), unless the player can resolve its dissonance. The major 7th is an avoid note on a dominant seventh chord (Rule 4) since a dominant seventh chord and its scale contain the flat 7th, not the major 7th. Rule 2 contains many notes that can be added. A brief rationale is given next. The C7 in Gm7-C7-Fmaj7 may be replaced by a C7#11, a C7+ chord, or a C7b9b5 or C7alt chord [9]. The scales for C7+ and C7#11 make available the raised fourth (flat 5), and flat 6 (flat 13) for improvising. The C7b9b5 and C7alt (C7+9) chords and their scales make available the flat9, raised 9, flat5 and raised 5 [1]. These substitutions provide the notes of Rule 2. These rules (used in phase 2) are stated below, using for reinforcement values very bad (-1.0), bad (-0.5), a little bad (-0.25), ok (0.25), good (0.5), and very good (1.0). The rules are discussed further in Section 4. The Rule Set: 1) Any note in the scale associated with the chord is ok (except as noted in rule 3). 2) On a dominant seventh, hip notes 9, flat9, #9, #11, 13 or flat13 are very good. One hip note 2 times in a row is a little bad. 2 hip notes more than 2 times in a row is a little bad. 3) If the chord is a dominant seventh chord, a natural 4th note is bad. 4) If the chord is a dominant seventh chord, a natural 7th is very bad. 5) A rest is good unless it is held for more than 2 16th notes and then it is very bad. 6) Any note played longer than 1 beat (4 16th notes) is very bad. 7) If two consecutive notes match the human’s, that is good. 
2 CHIME Phase 1 In Phase 1, supervised learning is used to train a recurrent network to reproduce the three Sonny Rollins melodies. 2.1 Network Details and Training The recurrent network's output units are linear. The hidden units are nonlinear (logistic function). Todd [11] used a Jordan recurrent network [6] for classical melody learning and generation. In CHIME, a Jordan net is also used, with the addition of the chord as input (Figure 1). 24 of the 26 outputs are notes (2 chromatic octaves), the 25th is a rest, and the 26th indicates a new note. The output with the highest value above a threshold is the next note, including the rest output. The new note output indicates if this is a new note, or if it is the same note being held for another time step (16th-note resolution). The 12 chord inputs (12 notes in a chromatic scale) are 1 or 0. A chord is represented as its first, third, fifth, and seventh notes and it "wraps around" within the 12 inputs. E.g., the Fm7 chord F-Ab-C-Eb is represented as C, Eb, F, Ab or 100101001000. One plan input per song enables distinguishing between songs. The 26 context inputs use eligibility traces, giving the hidden units a decaying history of notes played. CHIME (as did Todd) uses teacher forcing [13], wherein the target outputs for the previous step are used as inputs (so erroneous outputs are not used as inputs). Todd used from 8 to 15 hidden units; CHIME uses 50. The learning rate is 0.075 (Todd used 0.05). The eligibility rate is 0.9 (Todd used 0.8). Differences in values perhaps reflect contrasting styles of the songs and available computing power. Todd used 15 output units and assumed a rest when all note units are "turned off." CHIME uses 24 output note units (2 octaves). Long rests in the Rollins tunes require a dedicated output unit for a rest. Without it, the note outputs learned to turn off all the time. Below are results of four representative experiments. In all experiments, 15,000 presentations of the songs were made. Each song has 192 16th note events. All songs are played at a fixed tempo. Weights are initialized to small random values. The squared error is the average squared error over one complete presentation of the song. "Finessing" the network may improve these values. The songs are easily recognized, however, and an exact match could impair the network's ability to improvise. Figure 2 shows the results for "Solid." Experiment 1. Song: Blue Seven. Squared error starts at 185, decreases to 2.67. Experiment 2. Song: Tenor Madness. Squared error starts at 218, decreases to 1.05. Experiment 3. Song: Solid. Squared error starts at 184, decreases to 3.77. Experiment 4. Song: All three songs: Squared error starts at 185, decreases to 36. [Figure 1: Jordan recurrent net with the addition of chord input.] 2.2 Phase 1 Human Computer Interaction in Real Time In trading fours with the trained network, human note events are brought in via the MIDI interface [7]. Four bars of human notes are recorded, then given one note event at a time to the context inputs (replacing the recurrent inputs). The plan inputs are all 1. The chord inputs follow the "Solid" form. The machine generates its four bars and they are played in real time. Then the human plays again, etc. An accompaniment (drums, bass, and piano), produced by Band-in-a-Box software (PG Music), keeps the beat and provides chords for the human. Figure 3 shows an interaction. The machine's improvisations are in the second and fourth lines. In bar 5 the flat 9 of the Eb7 appears; the E. 
This note is used on the Eb7 and Bb7 chords by Rollins in "Blue 7", as a "passing tone." D is played in bar 5 on the Eb7. D is the natural 7 over Eb7 (with its flat 7) but is a note that Rollins uses heavily in all three songs, and once over the Eb7. It may be a response to the rest and the Bb played by the human in bar 1. D follows both a rest and a Bb in many places in "Tenor Madness" and "Solid." In bar 6, the long G and the Ab (the third then fourth of Eb7) figure prominently in "Solid." At the beginning of bar 7 is the 2-note sequence Ab-E that appears in exactly the same place in the song "Blue 7." The focus of bars 7 and 8 is jumping between the 3rd and 4th of Bb7. At the end of bar 8 the machine plays the flat 9 (Ab) then the flat 3 (Bb), of G7+9. In bars 13-16 the tones are longer, as are the human's in bars 9-12. The tones are the 5th, the root, the 3rd, the root, the flat 7, the 3rd, the 7th, and the raised fourth. Except for the last 2, these are chord tones. 3 CHIME Phase 2 In Phase 2, the network is expanded and trained by reinforcement learning to improvise according to the rules of Section 1.2 and using its knowledge of the Sonny Rollins songs. 3.1 The Expanded Network Figure 4 shows the phase 2 network. The same inputs plus 26 human inputs bring the total to 68. The weights obtained in phase 1 initialize this network. The plan and chord weights are the same. [Figure 2: at left, "Solid" played by a human; at right, the song reproduced by the ANN.] The weights connecting context units to the hidden layer are halved. The same weights, halved, connect the 26 human inputs to the hidden layer. Each output unit gets the 100 hidden units' outputs as input. The original 50 weights are halved and used as initial values of the two sets of 50 hidden unit weights to the output unit. 3.2 SSR and Critic Algorithms Using actor-critic reinforcement learning ([2, 10, 13]), the actor chooses the next note to play. The critic receives a "raw" reinforcement signal from the critique made by the rules of Section 1.2. For output j, the SSR (actor) computes a Gaussian distribution with mean and standard deviation and chooses the output from it. Once the raw reinforcement is generated, the critic modifies it and produces a refined signal, which is further modified by a self-scaling algorithm that tracks, via moving average, the maximum and minimum reinforcement and uses them to scale the signal.

5 0.43620536 75 nips-2001-Fast, Large-Scale Transformation-Invariant Clustering

Author: Brendan J. Frey, Nebojsa Jojic

Abstract: In previous work on “transformed mixtures of Gaussians” and “transformed hidden Markov models”, we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speed-up is achieved through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For N × N images, learning C clusters under N rotations, N scales, N x-translations and N y-translations takes only (C + 2 log N)N^2 scalar operations per iteration. In contrast, the original algorithm takes CN^6 operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320 × 240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm. 1

6 0.33027849 163 nips-2001-Risk Sensitive Particle Filters

7 0.29924962 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

8 0.29098079 113 nips-2001-Learning a Gaussian Process Prior for Automatically Generating Music Playlists

9 0.25135952 14 nips-2001-A Neural Oscillator Model of Auditory Selective Attention

10 0.23310514 43 nips-2001-Bayesian time series classification

11 0.22899865 183 nips-2001-The Infinite Hidden Markov Model

12 0.22875135 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables

13 0.21391192 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

14 0.20522414 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

15 0.20503326 79 nips-2001-Gaussian Process Regression with Mismatched Models

16 0.20368919 122 nips-2001-Model Based Population Tracking and Automatic Detection of Distribution Changes

17 0.20227686 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

18 0.20043866 3 nips-2001-ACh, Uncertainty, and Cortical Inference

19 0.19864652 84 nips-2001-Global Coordination of Local Linear Models

20 0.18381883 107 nips-2001-Latent Dirichlet Allocation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.043), (17, 0.028), (19, 0.023), (27, 0.087), (30, 0.1), (35, 0.297), (38, 0.02), (47, 0.018), (59, 0.06), (72, 0.032), (79, 0.044), (83, 0.027), (88, 0.022), (91, 0.108)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82495993 179 nips-2001-Tempo tracking and rhythm quantization by sequential Monte Carlo

Author: Ali Taylan Cemgil, Bert Kappen

Abstract: We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out. The resulting model is suitable for real-time tempo tracking and transcription and hence useful in a number of music applications such as adaptive automatic accompaniment and score typesetting. 1

2 0.53024286 102 nips-2001-KLD-Sampling: Adaptive Particle Filters

Author: Dieter Fox

Abstract: Over the last years, particle filters have been applied with great success to a variety of state estimation problems. We present a statistical approach to increasing the efficiency of particle filters by adapting the size of sample sets on-the-fly. The key idea of the KLD-sampling method is to bound the approximation error introduced by the sample-based representation of the particle filter. The name KLD-sampling is due to the fact that we measure the approximation error by the Kullback-Leibler distance. Our adaptation approach chooses a small number of samples if the density is focused on a small part of the state space, and it chooses a large number of samples if the state uncertainty is high. Both the implementation and computation overhead of this approach are small. Extensive experiments using mobile robot localization as a test application show that our approach yields drastic improvements over particle filters with fixed sample set sizes and over a previously introduced adaptation technique.

3 0.51705635 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks

Author: M. Schmitt

Abstract: Recurrent neural networks of analog units are computers for real-valued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to. 1

4 0.51610041 149 nips-2001-Probabilistic Abstraction Hierarchies

Author: Eran Segal, Daphne Koller, Dirk Ormoneit

Abstract: Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in “nearby” classes in the taxonomy are similar. In this paper, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a probabilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clusters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchical agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data. We present experimental results on synthetic data, and on real data in the domains of gene expression and text.

5 0.51544011 46 nips-2001-Categorization by Learning and Combining Object Parts

Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio

Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at a significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.

6 0.5146389 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

7 0.51454329 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

8 0.51416141 56 nips-2001-Convolution Kernels for Natural Language

9 0.51394176 190 nips-2001-Thin Junction Trees

10 0.51367861 6 nips-2001-A Bayesian Network for Real-Time Musical Accompaniment

11 0.51302809 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

12 0.5126816 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

13 0.51094753 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's

14 0.50998408 161 nips-2001-Reinforcement Learning with Long Short-Term Memory

15 0.50887311 74 nips-2001-Face Recognition Using Kernel Methods

16 0.50703061 13 nips-2001-A Natural Policy Gradient

17 0.50540936 169 nips-2001-Small-World Phenomena and the Dynamics of Information

18 0.50496697 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines

19 0.50444841 60 nips-2001-Discriminative Direction for Kernel Classifiers

20 0.5043813 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes