MORE THAN A FEELING—SOME TECHNICAL DETAILS OF
SWING RHYTHM IN MUSIC
Kenneth A. Lindsay
tlafx, Ashland, Oregon 97520
and
Peter R. Nordquist
Southern Oregon University, Ashland, Oregon 97520
Introduction
If you ask a musician what makes music swing, he will reply that Swing is a feeling, and may mention counting or subdividing the beat. Commonly, in classic Jazz for example, triplet note subdivision is a feature in Swing music, but this is not the entire story. Otherwise a waltz (3/4 meter), or a 6/8 or 12/8 meter piece, would inherently swing. Some pieces do, and some do not. There are also musical examples that one knows intuitively have Swing, but that on close analysis do not appear to have triplet subdivision as either the main or the only feature that contributes to the Swing. In this article, the authors presume that accent (differences in loudness between note events) also contributes to Swing, but thus far our research has focused solely on the timing aspects of swing rhythm. One aspect of Swing is interpreted to be the changes in the rhythmic structure around a solid and precise beat. It is the variations in that structure that are swinging.
Since classic Jazz is not the only representative of Swing, the authors want to extend the definition of Swing to include all musical styles that might be considered to “swing” by some valid metric, e.g., the musicians or dancers think the music is swinging. An ad hoc cultural definition rather than a technical definition is used to describe Swing: it is a property of music as played which causes listeners to dance or otherwise move their bodies in a cyclical, energetic, rhythmic manner. This definition allows consideration of a broader range of music than most prior research into Swing rhythm, and it distinguishes between Swing and other types of rhythmic expression.
Rhythmic expression is the parent category of Swing, and includes many examples of differences in music as played compared to the strict metronomic timing that is specified in the written form of music. This rigid structural framework is referred to as Mozart-Bach or MB notation due to its historical origins. It is not implied that European Classical music is only played strictly by the metronome; however, this mindset is quite common in the training of musicians in the Western academic tradition. The real world of music as performed is more complex and interesting than the mechanistic world of sheet music, just as a movie or stage play performance has more depth and expression than is apparent by reading the script. This article will provide a short summary of prior computer science research into Swing rhythm, and the analysis methods used will be briefly described. Finally, the fun stuff: a detailed technical analysis of the timing variation for a variety of styles of Swing music will be given.
Prior research
Cholakis (1995) cataloged an extensive set of Jazz drummers and analyzed the statistical nature of how each musician swung the beat in a different style by extracting the ratio of temporal intervals for notes as played. He claimed that this analysis allowed MIDI sequencer music with a more “human” feel to be produced. Gabriellson (1987 and 2000) observed that rhythmic variation is almost universal in music performance and reported that listeners generally prefer music played with rhythmic expression to music played strictly by the metronome. This phenomenon applies to popular music, European Classical music, and non-European traditions, such as African and Middle Eastern music. Waadelund (2004) has linked swing style to body movement, and used video recordings to study the body English of drummers in order to correlate their movements to the rhythmic style being played. Friberg and Sundstrom (1999 and 2002) extended Cholakis’ swing ratio work. Guoyon (2005) developed computational signal processing techniques to change the swing feel in a music sample. Hamer (2000) puts a cultural slant on Friberg’s and others’ research, as does Birch (2003). Several software companies have products aimed at training musicians to understand and play various types of Swing.
In our research, extensive use was made of the standard spectrogram, i.e., the short time Fourier transform (STFT), for extracting the rhythms of different instruments. Fulop and Fitz (2006) describe a newly rediscovered form of the spectrogram which we consider to be a major advance in this information processing approach. The new spectrogram allows better time and frequency resolution for a given data set, and makes available information which is ignored in traditional uses of fast Fourier transforms (FFTs) and STFTs, such as instantaneous phase and frequency.
Many prior researchers have analyzed rhythm by using statistical analysis of note events in musical samples. This can be a useful technique, but we assert that the performance of music, whether by human or computer, is not a statistical process. Rather, each note event relates to other note events in very specific ways, and metaphors other than statistical analysis, such as symbolic relationships or local measures of specific timing between note events, should be used as appropriate. An obvious example is the hierarchical timing relations between repetitive groups of note events at different time scales. This gives rise to common musical features such as meter, beat, and subdivision. Statistical analysis can be
Technical Analysis of Swing Music 31
used for measuring the change of tempo or simple swing ratio, but it is a mistake to think that the meaning of the music is statistical. Rather, the meaning is in the specific details of the many complex forms of the Swing.
Analysis methods
The first objective in the analysis was to identify and separate the various instruments that combined to play the musical selection. The basic tool that was used was the STFT, a standard digital signal processing (DSP) tool. First, an audio sample was divided into short time slices of a few milliseconds each. Then a window function (e.g., Hamming window) was applied to each slice to reduce spectral leakage at the slice edges. Finally, an FFT was used to obtain the frequency spectrum of each time slice. The choice of overlap between time slices determined the temporal resolution. This process yielded a view of how the sample's frequency content changed with time, which was plotted as a spectrogram. Judicious selection of various frequency bands in the spectrogram distinguished one instrument's note events from those of the other instruments. In this way the rhythm and the Swing for each instrument were extracted.
Figure 1 illustrates a typical spectrogram image. The musical sample is the first 19 seconds of the piece It Don't Mean a Thing (if it ain't got that swing) that was recorded by Duke Ellington and Louis Armstrong in 1962. The first 4.5 seconds of the 19 second sample are dominated by a series of thin yellow/red spikes that are produced by the sound of the hi-hat cymbal. The remaining 14 seconds are dominated by Armstrong's trumpet solo. The introduction is expanded in Fig. 1a. In the low frequency portion of Fig. 1a there is a dense concentration of red that is produced by the piano and bass. Further expansion of the low frequencies (Fig. 1b) shows more details. To analyze the timing details, a high frequency band (7500 to 22,000 Hz) was chosen to isolate the hi-hat cymbal note events, along with several low frequency bands that contain the piano (850 to 1020 Hz and 240 to 850 Hz) and bass (20 to 240 Hz). The objective was to identify and separate musical note events for each instrument, and to extract the relative timing details so that the rhythm could be specified directly from the recording, rather than approaching from the perspective of sheet music.
Fig 1. Spectrogram of the introduction to “It don’t mean a thing (if it ain’t got that swing).”
Fig 1a. Close-up of Fig. 1 showing the first 4.5 seconds (piano, bass and hi-hat cymbal) of the 19 second sample.
Fig 1b. Close-up showing piano and bass portion.
After selecting useful frequency bands in the spectrogram, time series graphs of the changing amplitude of each frequency band were created. Figure 1c illustrates the process of creating time series power graphs from the spectrogram. This was accomplished by adding the values for each separate time slice in each frequency band. The sum of a time slice in a frequency band was plotted as the Y value of the time series power graph for that frequency band at that point in time. The X value for the graph was set as the elapsed time (in the original recording) that corresponded to the time slice in the spectrogram. Values of the amplitude were obtained from the color in the spectrogram. Each color represented a value in the spectral data set, specifically, the amplitude coefficient of the Fourier component for a frequency. This process allowed the amplitude of each instrument in the ensemble to be isolated as a function of time.
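The two steps described in this section, windowed short-time spectra followed by summing one frequency band per time slice, can be sketched in a few lines. This is a minimal illustration in Python, not the authors' MATLAB code. The function names, the frame length and the hop size are assumptions, and a naive DFT stands in for an FFT routine so that the sketch needs only the standard library.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_magnitudes(samples, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len//2) per time slice."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT bin; an FFT library would be used on real recordings.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def band_series(frames, sample_rate, frame_len, hop, f_lo, f_hi):
    """Sum the magnitudes between f_lo and f_hi (Hz) for every time slice.

    Returns (times, values): the X and Y values of one band's time series.
    """
    k_lo = math.ceil(f_lo * frame_len / sample_rate)
    k_hi = math.floor(f_hi * frame_len / sample_rate)
    times = [i * hop / sample_rate for i in range(len(frames))]
    values = [sum(spectrum[k_lo:k_hi + 1]) for spectrum in frames]
    return times, values
```

With the band limits quoted above, the hi-hat series would correspond to a call such as `band_series(frames, 44100, 2048, hop, 7500, 22000)`, where the hop size is again an assumption. The naive DFT is far too slow for frames of that length, so this sketch is only practical on short test signals.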
32 Acoustics Today, July 2007
Fig 1c. Construction of the CHKDOT plot formed from the frequency bands containing the piano (upper notes and lower notes) and the bass.
The next objective was to establish the elapsed time in the original recording for a “note event” (the “beat” when the note is played), i.e., the loudest time point in the vicinity of a sound pulse. The number of values in a time/frequency tile (one time slice for a frequency range) is typically between a few dozen and a few hundred, depending on the size of the frequency range. The set of summed totals in each frequency band for all time slices was used to create the time series amplitude graph for that frequency range. There were the same number of points in the time series as there were time slices in the spectrogram, simplifying time comparisons between the different types of plots. The several time series, as played by each instrument or group of instruments, that were generated from the chosen set of frequency bands were then stack plotted from low to high frequency by a MATLAB script developed for this work. The plot and the program are called CHKDOT. The plot (see Fig. 1d) shows time aligned musical events for all frequency bands, as played by each instrument.
Fig 1d. CHKDOT plot of the Introduction to “It don’t mean….”
The computer code searched the CHKDOT waveforms for peaks representing the note events (peak amplitude in the frequency band). The time locations of these “note events” were extracted automatically by an algorithm that chose the point where the graph first turns back downwards immediately following a sharp vertical rise above some predetermined threshold level. This is illustrated in Fig. 1d where the note event in each peak is plotted as a red diamond. Although the algorithm is adequate, it is not ideal since in real music there may be numerous artifacts that can “mislead” the algorithm. This is due to the fact that not all note events are clear, sharp and precise. The notes played by the hi-hat in Fig. 1d can be seen to be precise, but the notes played by the piano are not. In the second and third graphs from the bottom there are collections of two, three or four small ripples at the top of some of the peaks. These may be caused by two, three or four fingers hitting the piano keys that are not precisely synchronized. Meticulous listening to the original recording can reveal the multiplicity of key note events in this frequency range. The first event in such a cluster was chosen as the key note event. Addressing the question of what was the musician's intention, or whether the choice of note event time
location for this study is identical to the perceived time location by a listener calls for further research. In this article, the focus is only on characterizing rhythmic timing and it is believed that our choice, while slightly arbitrary and ambiguous in some cases, is nonetheless reasonable for the current context.
The musical meter and subdivision were marked in a straightforward way on the CHKDOT plots. The black and green vertical lines delineate the start or downbeat of each musical measure. There are eight measures in the introduction. It can be seen that Armstrong, on trumpet, picked up his solo on the eighth measure. Figure 1d shows the breakdown for the hi-hat cymbal and piano/bass parts in It Don’t Mean a Thing. The note events, marked by red diamonds, are placed along the invisible line in the center of the horizontal frequency band that contains the time series graph. In Fig. 1d there is one set of note events for the low frequency band (850 to 1020 Hz) and a second set of note events for the high frequency band (7500 to 22,000 Hz). Notice that some note events are sharp and distinct, e.g., the upper waveform (the hi-hat cymbal), while other time series waveforms have many jagged sections where the precise time of a note event becomes ambiguous, e.g., the bottom three time series (piano upper, piano lower, and bass).
The less distinct waveforms, especially the bass, are spread out more in time than the sharp events, indicating that the attack envelope of the sound is slower for these events. The actual time locations of these note events may also be imprecise. Often, the piano and bass sounds obscure each other. Separating these overlapping note events would need more sophisticated signal processing techniques than are currently used in this study. Nonetheless, it is fairly easy to identify enough note events to specify the rhythmic timing details since there is much redundancy in the rhythm. These details are enough to reveal the Swing.
Fig 1e. Construction of the DIFFDOT plot formed from the note event time deltas.
Fig 1f. DIFFDOT plot of the Introduction to “It don’t mean….”
To mark the musical subdivision, note events that represent a pulse are first selected to use for the basic beat in the musical sample, such as the downbeat in a musical measure. This main beat can be subdivided in any convenient way, depending on the rhythm to be measured. Because triplets are a common timing feature in Swing, it was decided to subdivide the main beat by six. These subdivisions are marked with pink lines that provide for the location of a backbeat on the third pink line and triplets on the other pink lines in the same measure.
While triplets can be and often are marked in sheet music, the standard subdivision of “Mozart-Bach” (MB) notation is by factors of 2. This is one reason why notating Swing music is somewhat difficult: triplets do not fit naturally into a “subdivide by 2” metaphor. It will be demonstrated later that Swing can also contain subdivisions that are neither factors of 2 nor 3. The approach used in this study avoided the limitations of subdivision that are inherent to MB notation.
The actual note events in the recording are used to determine the musical meter and subdivision of the beat in the CHKDOT diagram (Fig. 1d). Essentially the reverse of playing a tune by reading sheet music, note information was extracted from the recording which could be used to generate sheet music. The pulse in Fig. 1d is marked by green and black vertical lines, which correspond to the downbeat of the measure in MB notation (a two measure phrase, one green and one black). Each musical measure was subdivided by six, looking for triplet notes of the classic Jazz Swing pattern, and this subdivision was marked by using six pink lines in the CHKDOT diagram. The pink line exactly in the middle between a black and green line represents the time location of the backbeat of the rhythm. Thus it is observed that the
piano/bass peaks are on the downbeat and backbeat, with diamond markers on certain backbeats in the third time series up from the bottom of the chart. These events were used to mark the pulse. The hi-hat cymbal note events in the time series at the top occur on the downbeat, backbeat and triplet pickup to the downbeat and backbeat. The triplet timing is indicated by note events on a pink line just ahead of a black or green line.
To analyze Swing rhythm it must be known when the beat occurred, and how far the beats of each instrument deviate from their mean and from each other. This is performed in MATLAB by a second program called DIFFDOT which extracts the time differences (time delta) between note events. Delta corresponds to the length of a musical note in MB notation: 1/4 note, 1/2 note, 1/8 note, etc. The pulse is used for the master time clock (whole note), and a time delta with length 1/2 of the pulse would be a half-note in MB notation, 1/4 of the pulse length would be a quarter-note and so on. Because the beat can be subdivided by any number that makes sense for a musical sample, triplets (divide by 3) or any other note time duration can be easily accommodated. Since the pulse note event timings have some variation, the minimum, maximum, and mean of the time differences are noted, and the mean value is used as the canonical pulse time to subdivide the beat.
Figure 1e shows the mapping process from a CHKDOT plot to a DIFFDOT plot. The two time series in Fig. 1d that were marked with note events, hi-hat cymbal and piano (upper), are superimposed over the DIFFDOT plot, Fig. 1e, for the same time range. The elapsed time on the X axis is the same for both forms. A red diamond on the CHKDOT plot maps to a circle on the DIFFDOT plot: red circles for the hi-hat, and green circles for the piano. The X position of matching diamond/circle pairs is the same. The Y position of the circles indicates the time from that note event until the next note event in the set. Thus longer notes, such as the pulse, are at the top of the DIFFDOT plot, and shorter notes are in the lower half of the plot.
In Figs. 1e and 1f, the red circles are the hi-hat note events. Notice the first three red circles are fairly evenly timed on the backbeat (1/2) of the pulse. These three time deltas correspond to the first four diamonds in the corresponding time series graph. After four note events, the hi-hat starts to play triplet notes, clearly visible on the pink subdivision lines in the CHKDOT diagram (Fig. 1d) and transferred to the DIFFDOT diagram (Fig. 1f) onto the 1/3 and 1/6 lines (lying between 1/6 and 1/8, really). These note events on the 1/3 and 1/6 lines of Figs. 1e and 1f are the time deltas between the swung notes and the beats immediately before and after: i.e., 1/2 - 1/3 = 1/6. The slight imprecision of the note timings in this example indicates a somewhat loose rhythmic style for this recording. Later a recording which has a very tight rhythmic style will be analyzed. This is another aspect of the music performance that can be read directly from the DIFFDOT diagram.
Note events are essentially transferred one for one from the CHKDOT to the DIFFDOT plots. CHKDOT plots are more intuitive to read since they parallel standard musical notation. DIFFDOT plots may require careful inspection, but by looking at the spatial patterns it is possible to get an intuitive sense of how the Swing works.
In addition to the time differences between note events, the DIFFDOT plots can also show the variations in time locations of repetitive musical events extracted from the CHKDOT plots, such as pulse, backbeat and swung notes. This is not a feature which can be written in MB notation. The DIFFDOT plots also clearly show how on some beats two instruments may not be precisely synchronized: in some cases, the hi-hat plays slightly before the piano note event, and in other cases, the reverse is true. This can be read directly by looking to see whether the green line is to the left or right of the red line for that particular time location. Only the beat in the center of the graph is exactly synchronized.
Note that the CHKDOT diagrams are a direct representation of standard MB subdivision and counting, albeit with more fine grained timing information included, whereas the DIFFDOT plots are a novel view of the same information, essentially looking at the “first difference” form of the original timing information.
To process each musical sample into a spectrogram, a short audio clip that is typically ten to twenty seconds long was used. These clips are edited to be played with seamless looping, such as in a QuickTime player, so that the rhythm can be listened to very carefully for extended periods of time. While this is not strictly needed for the analysis, it was found that it can greatly enhance both enjoyment and understanding of the rhythms. Anomalies as short as five or ten milliseconds are
sufficient to be perceptible as a break in the rhythmic flow, distinguishing them from editing artifacts that may cause an unnatural transition in the audio waveform, like a click or pop. For these reasons editing at zero crossing points in the audio waveform is desired, although it may not be sufficient to avoid all artifacts that can be perceived either explicitly or intuitively by a well trained human ear.
Choice of frequency resolution and short time Fourier transform window overlap was constant for each processing run, but may differ for different samples. Sometimes a single sample was processed repeatedly, using several different choices of parameters. These results provided an interesting insight into the “Heisenberg Uncertainty” aspect of the time/frequency tradeoff that is inherent to Fourier analysis. A 2048 point fast Fourier transform (FFT) and three to ten millisecond time slice overlap are well suited to many samples. In some cases a time resolution as short as 0.5 milliseconds was used. Visual inspection of the spectrogram allowed a choice of the frequency bands most likely to distinguish musical notes played by various instruments. Sets of the possibly overlapping frequency bands are summed to obtain time series plots of the audio power in the several bands.
Musical samples
Analysis results for several Swing tunes are included in this article: It Don’t Mean a Thing (if it ain’t got that Swing) by Duke Ellington and Irving Mills, performed by Duke Ellington and Louis Armstrong (1962); Graceland by Paul Simon (1986); Fever by Eddie Cooley and John Davenport, performed by Ray Charles and Natalie Cole (2004); examples of Brazilian Samba batucada music from the CDs Grupo Batuque Samba de Futebol (2004) and Os Ritmistas Brasileiros Batucada Fantastica (1963/1998) by Luciano Perrone and Nilo Sergio.
It Don’t Mean a Thing (if it ain’t got that Swing) (Duke Ellington and Louis Armstrong 1962)
Figures 1, 1a and 1b show spectrograms of the introduction to It Don’t Mean a Thing (if it ain’t got that Swing). Figure 1 is the overview of the 19.3 second sample, showing the entire Fourier spectrum up to 22,050 Hz (half of the CD sampling rate of 44,100 samples/sec). The main feature of the first few seconds of the spectrogram is the high frequencies produced by the hi-hat cymbal playing the classic “tchzzz-tch-ta-tchzzz-tch-ta-tchzzz…” Jazz swing rhythm. The low frequency bass and piano parts are shown in the lower portion of the plot as a thick red swath. Louis Armstrong’s trumpet dominates the remainder of the sample, clearly revealing the harmonic structure, timing and pitches of the notes. Figures 1a and 1b show close-up views of the cymbal and piano/bass section. The inherent technical limitations of Fourier analysis are clear in the coarse low frequency resolution of the piano and bass data. Fulop and Fitz’s (2006) reassigned spectrogram technique would have revealed much more useful information that is obscure in the current figures.
The rhythms played by the bass/piano and hi-hat are plotted in their corresponding frequency bands as shown in Fig. 1d as a time series. Note events were marked at the power peaks of the waveforms, and their temporal locations were collected. This sample was analyzed using five millisecond temporal resolution that was sufficiently fine grained to measure accurately the timing of note events in this song.
The time deltas are plotted in Figs. 1e and 1f with longer times at the top and shorter times at the bottom. The musical pulse note events, played by piano/bass, appear at the top, and the hi-hat syncopation is in the lower part of the figure. Notice in particular that the pulse is not uniform. Rather, it alternates between slight “pushes” and slight “pulls” on the beat, i.e., the notes are intentionally not played in a strict mechanical metronomic style. The longest, shortest and average pulse time deltas are marked with blue horizontal lines. The average delta time has been used as the canonical pulse clock tick. The backbeat (1/2 of the pulse) and swung note deltas (1/3 and 1/6 of the pulse) are more uniform than the pulse, indicating that these syncopated notes follow more closely the uniform timing paradigm of MB meter, although departing from the “divide by 2” metaphor. Keep in mind that the DIFFDOT plot is the time difference between notes, and should not be interpreted as mirroring standard musical tablature form. The CHKDOT plot does correspond to the subdivision representation of tablature.
Figure 2. Spectrogram of “Graceland” intro.
Fig 2a. Close-up showing bass and drums.
Fig 2b. Ten frequency band note events.
Since this song must be regarded as one of the most fundamental Swing tunes of all time, we conclude that the triplet subdivision which is clearly shown in Figs. 1d and 1f is an important feature of Swing style. What is new is the evidence of intentional time variation played in the basic pulse of the rhythm. This feature is examined more closely in subsequent examples. While a triplet subdivision can be reasonably written in MB notation, we are unaware of any similar notational device for indicating the variation of pulse timing.
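The chain of steps recapped in this section (marking note events at the peaks, differencing their times, and reading each delta as a fraction of the average pulse) can be sketched as follows. This is an illustrative Python sketch and not the authors' MATLAB CHKDOT or DIFFDOT code. The function names, the `min_rise` parameter, the rule that the signal must fall back to the threshold before another event is accepted, and the list of candidate subdivisions are all assumptions.

```python
from fractions import Fraction


def pick_note_events(times, values, threshold, min_rise):
    """Return the time of the first downturn after each sharp rise above threshold.

    min_rise is the smallest step between neighbouring samples that counts
    as a "sharp rise" (an assumed definition).
    """
    events = []
    ready = True    # signal has been at or below the threshold since the last event
    armed = False   # a sharp rise above the threshold has just been seen
    for i in range(1, len(values) - 1):
        if values[i] <= threshold:
            ready = True
            continue
        if ready and not armed and values[i] - values[i - 1] >= min_rise:
            armed = True
        if armed and values[i + 1] < values[i]:
            events.append(times[i])   # first point where the curve turns down
            armed = False
            ready = False             # later ripples on the same peak are ignored
    return events


def time_deltas(event_times):
    """Time from each note event to the next one (the DIFFDOT Y values)."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def pulse_stats(pulse_times):
    """Shortest, longest and mean pulse delta; the mean is the canonical pulse."""
    deltas = time_deltas(pulse_times)
    return min(deltas), max(deltas), sum(deltas) / len(deltas)


# Candidate subdivision lines, as fractions of the pulse (an assumed set).
SUBDIVISIONS = [Fraction(1, d) for d in (1, 2, 3, 4, 6, 8)]


def nearest_subdivision(delta, mean_pulse):
    """Subdivision line closest to a delta expressed as a fraction of the pulse."""
    ratio = delta / mean_pulse
    return min(SUBDIVISIONS, key=lambda f: abs(float(f) - ratio))


# Hypothetical band amplitudes, one value per time slice (times counted in slices).
values = [0, 0, 5, 9, 8, 8.5, 7, 1, 0, 0, 6, 10, 4, 0]
print(pick_note_events(list(range(len(values))), values, threshold=2, min_rise=3))
# prints [3, 11]: the ripple at slice 5 is not counted as a second event
```

Choosing the first downturn matches the choice described earlier of taking the first event in a cluster of ripples. Whether that point coincides with the perceived onset remains the open question the authors raise.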
Graceland (Paul Simon 1986)
Graceland by Paul Simon is a pop tune that mimics the feeling of riding on a railroad. A prominent rhythmic feature is the song’s strong backbeat, but without any great sense of the classical Jazz Swing feel. Nonetheless Graceland elicits a very bouncy bodily response. Figure 2 shows a spectrogram of the full audio sample, while Fig. 2a shows a close-up of the bass and drum parts. To the experienced eye, the backbeat rhythm in the low frequencies is clearer in this sample than it is in the selection It Don’t Mean a Thing.
Figure 2b shows the time series plots of note events for ten frequency bands. The bass drum part marks the pulse in the bottom time series, including both downbeat and backbeat. The secondary note events are extracted from the high frequencies of the attack envelope of the electric guitar strumming. A triplet subdivision in the CHKDOT plots was used to look for Swing. Surprisingly, all note events were represented better by a divide by two subdivision scheme; hence half of the electric guitar notes fall between the triplet subdivision lines. The DIFFDOT plot, Fig. 2c, revealed the Swing feel for this song. Both the pulse and the rhythm guitar show a repetitive pattern of pushing and pulling the time locations of their note events. There is a substantial amount of variance to the time variations, especially in the beginning of the pulse, which indicates a short term tempo fluctuation. The rhythm guitar is much more consistent in the short/long variations of note timing, similar to the pulse of It Don’t Mean a Thing. There is no evidence of any triplet subdivision in note timing variations. The variance of time deltas gives this song a fairly loose feel, but no sense of rhythmic sloppiness, due to the consistent repetitive pattern of time variations.
Fig 2c. DIFFDOT plot showing time variations of drum/bass pulse and rhythm guitar.
Fig 3. Spectrogram for “Fever.”
Fig 3a. Time series event plot for “Fever.”
Fever (Ray Charles 2004)
Fever is a classic Rhythm and Blues (R&B) song with a strong backbeat. Ray Charles’ 2004 version is played in a very tight, straight rhythmic style. Despite almost clockwork precision, this song is never boring and led to a secondary defining feature of Swing (beyond inducing body movement). A 14 second loop made from this recording
could play endlessly and after more than an hour, it still sounded incredibly fresh. A sample which becomes perceptually tedious after only a few repetitions almost certainly does not Swing.
Beneath the excellent musicianship, there exists a strong triplet element to the rhythm. The conga drum plays around the backbeat which is marked precisely by Ray Charles’ finger snaps. About half of the conga note events are on triplet pickup beats before either the downbeat or the backbeat, with a few on triplets following these main beats. Unlike Graceland or It Don’t Mean a Thing, this sample shows virtually no rhythmic looseness. The conga, drums, finger snaps, and bass guitar are synchronized with each other to a precision of better than 15 milliseconds in almost all cases. Contrast this precision with Graceland’s consistent variations of 50 to 80 milliseconds, and It Don’t Mean a Thing’s somewhat random looking variations in the 30 to 40 millisecond range. These are details that distinguish between loose and tight rhythmic styles.
Figures 3, 3a and 3b show the familiar set of spectrogram, time series event plots, and DIFFDOT diagram. Subdivision of the meter in the time series plot is a four beat pulse phrase with six subdivisions of the pulse. Thus, the downbeat, backbeat and triplet temporal locations are marked by vertical lines. Finger snaps and conga drum beats land exactly on these time ticks. The precise clusters of note events on the pulse, backbeat, and triplet time lines in the DIFFDOT plot are evident. There is a general absence of note events on the quarter note line, just as there was in It Don’t Mean a Thing.
Fig 3b. DIFFDOT plot for “Fever.”
Fig 3c. Close-up DIFFDOT plot for the pulse note events in “Fever.”
Fig 4. Spectrogram of pandeiro batida.
Fig 4a. Time series plot of pandeiro batida.
A very remarkable aspect of this recording can be seen in the close-up DIFFDOT plot (Fig. 3c) showing only the pulse of Ray Charles snapping his fingers on the backbeat. It is obvious from the normal DIFFDOT plots of Graceland and It Don’t Mean a Thing that the variations in the pulse event time deltas are much greater on those two samples than on Fever. The close-up shows that the variation in Ray Charles’ finger snap time deltas is less than 5 milliseconds. This means that the deviation from the canonical MB metronome beat times is less than +/- 2.5 milliseconds. Given the tight rhythmic style of this recording, and the fact that Ray Charles was one of the 20th century’s best musicians, we believe this DIFFDOT plot represents an important data point regarding the limits of human time perception and physical action.
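The arithmetic behind the tightness figures quoted for Fever can be sketched as below. The sketch reads the 5 millisecond figure as the spread between the longest and shortest pulse delta, which is an assumption about the authors' meaning, so the +/- 2.5 millisecond bound is half that spread. The function names and all event times are hypothetical and chosen only to illustrate the calculation.

```python
def time_deltas(event_times):
    """Time from each note event to the next one."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def delta_spread(event_times):
    """Return (spread, half_spread) of the successive time deltas."""
    deltas = time_deltas(event_times)
    spread = max(deltas) - min(deltas)
    return spread, spread / 2


def max_offset(times_a, times_b):
    """Largest gap between an event in times_a and its nearest event in times_b."""
    return max(min(abs(a - b) for b in times_b) for a in times_a)


# Made-up finger-snap and conga times in milliseconds, for illustration only.
snaps_ms = [0, 600, 1202, 1800, 2402]      # deltas: 600, 602, 598, 602
conga_ms = [598, 1210]

spread, half = delta_spread(snaps_ms)
print(spread, half)                        # prints 4 2.0
print(max_offset(conga_ms, snaps_ms))      # prints 8
```

A tight recording in this sense has a small delta spread on the pulse and small offsets between instruments. A loose one, such as the Graceland and It Don’t Mean a Thing samples, would show tens of milliseconds on the same measures.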
repetitions of the basic batida, indicating two sides of the larger phrase. The pattern of these time variations is consistent, since the DIFFDOT plot (Fig. 4b) clearly shows the Swing pattern as a repeating waveform with variation, rather than some kind of random pattern. The plot shows a complete absence of a backbeat (1/2 of pulse) and the consistent presence of a note time interval of 1/4 of the pulse. This is the time delta between the “ee” and “and” notes, and would be a standard quarter note in MB notation if the time location of the note events were on the canonical quarter note subdivision of the meter, which is not the case. An accurate rendering of this rhythm (as played) in MB notation would need a convoluted pattern of multiple rest and note glyphs of various lengths (e.g., 1/4 plus 1/32 plus 1/64, or 1/2 minus 1/3, etc., all very problematic for the music reader) to capture the actual timing of the notes as played. The DIFFDOT diagram shows these non-standard note time durations in a very natural fashion as subdivisions of the pulse. The DIFFDOT pulse shows the familiar push/pull on the canonical downbeat time locations, although in the time series plot, this is a subtle feature.
There are a variety of swinghee styles used to play the basic pandeiro batida. As in American music, there are probably as many styles of Swing as there are drummers or pandeiro players. Brazilian swinghee clearly has a very different feature set than American Swing, even in this simple example.
Fig 4b. DIFFDOT plot of pandeiro batida.
Brazilian swinghee
Swing may include complex rhythmic patterns, but it can also be found in very simple rhythms. This is well illustrated in a basic Brazilian rhythm, the “pandeiro batida,” literally “beating pattern of the pandeiro.” The pandeiro is the national instrument of Brazil and is approximately the same as a tambourine in American music. The tambourine is also found in many other musical traditions, but the Brazilian pandeiro has several playing styles that are unique. The basic pandeiro batida is a simple 1-2-3-4 pattern played continuously with slight temporal and accent variations that denote which phrase of a larger pattern is being played. This pandeiro batida is invariably taught as straight time: one-ee-and-uh played with thumb (one), fingertips (ee), palm heel (and), fingertips (uh), over and over. This batida is both taught and written as a succession of evenly spaced quarter or eighth notes, but playing in Brazilian swing style (called
swinghee, or balance in Portuguese) is
far removed from even-spacing.
The spectrogram in Fig. 4 clearly
shows the basic simplicity of this
rhythm, and also illustrates how the
beats are not played with even timing
despite being written as equal notes.
The time series plot in Fig. 4a has the
pulse in the lower frequency band
which is the thumb hitting the pan-
deiro skin causing a low thump. All
four notes appear in the upper fre-
quencies which are caused by the
metallic jingles of the pandeiro. The
Fig 5. Spectrogram of shuffle rhythm.
“uh” note is consistently played on a
nearly exact triplet pickup to the
pulse. This classic Jazz feature is cer-
tainly part of the swinghee feeling. A
four beat pulse with three subdivisions
per pulse, that gives vertical lines on
the exact triplet note time locations is
illustrated in Fig. 4a.
The second and third notes (ee and
and) are played in two very odd loca-
tions in the first half of each measure.
Neither of these is played on a triplet,
quarter or eighth note location, and
there are slight time variations between Fig 5a. Close-up of low frequencies.
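
The triplet-pickup observation for the pandeiro's “uh” note can be checked numerically once pulse and note times are available. The Python sketch below is one possible way to do so and is not the authors' DIFFDOT software: the helper names `triplet_grid` and `phase_in_pulse`, the 0.02 tolerance, and all time values are assumptions made for illustration. It assumes that one pulse spans a full one-ee-and-uh group, so a straight “uh” would fall 3/4 of the way through the pulse and a triplet pickup 2/3 of the way through.

```python
def triplet_grid(pulse_times):
    """Triplet subdivision times between consecutive pulses, pulses included."""
    grid = []
    for start, end in zip(pulse_times, pulse_times[1:]):
        step = (end - start) / 3.0
        grid.extend([start, start + step, start + 2.0 * step])
    grid.append(pulse_times[-1])
    return grid


def phase_in_pulse(note_time, pulse_times):
    """Position of a note as a fraction (0 to 1) of the pulse interval containing it."""
    for start, end in zip(pulse_times, pulse_times[1:]):
        if start <= note_time < end:
            return (note_time - start) / (end - start)
    raise ValueError("note lies outside the pulse track")


# Hypothetical times in ms, invented for illustration only.
pulses = [0.0, 600.0, 1200.0]   # thumb ("one") notes
uh_notes = [402.0, 1001.0]      # fingertip ("uh") notes

phases = [phase_in_pulse(t, pulses) for t in uh_notes]

# Straight time would put "uh" at 0.75; a triplet pickup sits at 2/3.
is_triplet_pickup = [abs(p - 2.0 / 3.0) < 0.02 for p in phases]

print(triplet_grid(pulses))
print(phases, is_triplet_pickup)
```

Expressing each note as a fraction of its own pulse interval, rather than in milliseconds, keeps the comparison independent of tempo and of small variations in pulse length.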
Technical Anaylsis of Swing Music 39
Brazilian swinghee 2: Shuffle

Many percussion and drum note events have a very sharp and precise onset, making them easy to identify by our approach, while others have a much less precise sound. The term “shuffle” is used to describe a wide range of Swing rhythms played in this style. Shakers, brushes on a snare drum or hi-hat cymbal, afoxe and guiro are all examples of shuffle instruments. Single note events from these instruments can be identified, but overall there is a feeling of blurring and blending of each note event into the next. The meter and subdivision of the rhythm are defined by the loudness peaks, which are identifiable but somewhat temporally ambiguous events. Shuffle is an odd combination of vagueness and precision, difficult to describe with language.

Identification of note events is more difficult for these less precise musical events, and marking the onset time locations precisely can be subject to interpretation of how the rhythm feels. The peak power was chosen to be used as the location of the note event, although perceptually there is some activity happening before the peak, unlike most percussion sounds with their fast onset. The standard Brazilian ganza (shaker) rhythm usually has a noticeable snap that precedes the downbeat and this snap is fairly sharp, but the remaining notes are more blurry. The snap gives a precise anchor to the rhythm which makes the blurry parts sound well integrated to the ensemble Swing, rather than sounding as if played carelessly. Overall, this contributes to the flowing feel that many Brazilian songs have.

The spectrogram in Figs. 5 and 5a shows a shuffle pattern played by a surdo (Brazilian bass drum) and an afoxe (gourd instrument with a stick scraping across it). The audio spectrum is quite diffuse, although note events can be identified. The time series plot, Fig. 5b, shows considerable complexity of the waveforms in all frequency bands due to the instrument’s timbre: a stick scraping across the grooves in the gourd produces many correlated, closely spaced clicking sounds which spread across many frequencies in a fairly non-harmonic fashion. The pulse in the low frequency is played by the surdo. The DIFFDOT plot, Fig. 5c, shows swinghee timing variations in both the pulse and secondary events tracks. Like Graceland, the two rhythms are closely connected but playing the Swing in different ways.

Fig 5b. Time series of shuffle rhythm.
Fig 5c. DIFFDOT plot of shuffle rhythm.

Ensemble swing in Brazilian swinghee

In this section an example of complex interaction between two instruments will be examined in some detail. The pandeiro plays a duet with a tamborim, a small Brazilian hand drum generally hit with a stick. The tamborim plays many of the most complex rhythms in a Samba. The basic rhythms are often difficult, and the interpretive timing is very fine-grained and precise, typically 10 to 20 millisecond excursions from canonical beat locations.

Fig 6. Pandeiro pulse and tamborim desinha.

Figure 6 shows the pulse played by the pandeiro, and the desinha (design, a Brazilian term for complex rhythmic ornament) played by the tamborim. In the upper plot when the tamborim starts playing, it is not at the standard beginning of the batida. Instead the drummer plays a variation on a portion of the second half of the entire tamborim phrase, which leads into the downbeat. The downbeat is indicated by the green marker at time location 1700, except that there is a further variation: it is not the primary downbeat but the offbeat, so the tamborim is playing on the opposite side from the pandeiro. It is very common in Brazilian music for some
two phrase batidas to be played with the two phrases swapped. This is analogous to the 3-2 clave and 2-3 clave style in Cuban music. Swapping the sides gives a different feel, usually more syncopated if the unfamiliar variant is played.

The tamborim batida is very syncopated even when played straight. The “standard” place to start the basic tamborim batida is at note event #6 in Figs. 6 and 6a at time location 1700, very slightly ahead of the beat. Many batidas have beats played ahead of the standard subdivision beat, and/or also slightly ahead of, or behind the note events of other instruments. In this example, the pandeiro plays about 30 milliseconds ahead of the standard downbeat at this temporal location, and the tamborim plays about 15 milliseconds ahead of the pandeiro. This technique is used to give a push to the feeling of the rhythm by both instruments. A few beats on either side of the 1700 point, both instruments play notes exactly on a standard subdivision. The feeling of this pattern is consistent throughout the sample, which is several minutes long. Figure 6a shows a closer view of the micro-timing.

Fig 6a. Close-up showing micro-timing.

The 15 to 30 millisecond time variations are on the order of 1/64 or 1/32 notes at the typical quick Samba tempo of 140 beats per minute. We believe from our experience with Brazilian music that the musicians are playing these timing variations entirely by intuition and experience, rather than explicitly subdividing the beat in the moment, i.e., by feeling rather than analysis. We have found that analytical understanding has substantially improved our ability to play and hear these rhythms, but that in the performance too much analysis actually impedes our ability to play the groove well.

Looking at the two sets of three evenly spaced notes starting at 1700 and 2000, observe that the first and third beats are slightly ahead of the standard subdivision. These beats push the rhythm slightly and give a somewhat more energetic feeling to the music than if they are played straight. In this case, these two tamborim note events are also accented, further emphasizing the push to the rhythm at these two time points. The combination of time push and accent is caused by the tamborim player putting a little extra “juice” into the rhythm for these note events. Waadelund (2004) has studied the relation between this type of “body english” and the rhythms played by drummers on drum kits. The investigation of the relation between motion and rhythm started in the early 20th century. Seashore (1938) and Gabrielsson (1987) both include a variety of reports, insights and opinions about this phenomenon. In this example, the tamborim plays the first beat right on top of the pandeiro on the “real” downbeat, instead of playing at the “standard” temporal location for the note. This portion of the batida starts its repetition at the ninth event location (time 2000, triplet pickup to downbeat), just before the main downbeat, marked by the black line at time 2050. You can see that the first beat ordinarily is on the triplet pickup to the downbeat, and the next two beats are almost exactly evenly spaced on the subsequent triplet time points. The slight variations from playing exactly on temporal locations that correspond to a standard subdivision are part of the swinghee style. While one might think that this is rhythmic looseness similar to the Graceland example, generally Brazilians play these slight temporal variations quite precisely, consistently and intentionally.

Conclusions and future work

Swing is a far more complex part of the musical landscape than reported previously in the academic computer science literature. The authors have analyzed Swing rhythms in American, Jamaican, and Brazilian music. Some of these are simple enough to allow a complete assessment of the musical features that give rise to Swing feeling. Others point in the direction of subtle complexities that require improvements to the pattern recognition and signal processing techniques to characterize fully the Swing details described in this article. There are many other musical styles which have Swing characteristics including Cuban, Middle Eastern, African, Funk, and Hip Hop. Our analysis results clearly point to a basic inadequacy of standard Euro-American musical notation to annotate swing rhythm styles. Comments and observations from professional musicians agree with this notational limitation. For the purposes of musical analysis in the context of music information retrieval (MIR), the authors feel that it is more fruitful to omit most attempts to render a musical performance as tablature. It would be more practical and accurate to maintain the information in a form which is close to the actual audio data, and the information features that can be extracted from such [Link]

References for further reading:

Birch, Alisdair MacRae (2003). “It Don’t Mean a Thing If It Ain’t Got that Swing,” Just Jazz Guitar Magazine, August 2003. Online: [Link]

Cholakis, Ernest (1995). Jazz Swing Drummers Groove Analysis. Numerical Sound. Online: [Link]

Friberg, A., and Sundstrom, J. (1999). “Jazz drummers’ swing ratio in relation to tempo,” J. Acoust. Soc. Am. 105, 1330(A) (1999).
Friberg, A., and Sundstrom, J. (2002). “Swing ratios and ensemble timing in jazz performance: Evidence for a common rhythmic pattern,” Music Perception 19(3), 333-349.

Fulop, Sean A., and Fitz, Kelly (2006). “A Spectrogram for the twenty-first century,” Acoustics Today 2(3) 26–33.

Fulop, Sean A., and Fitz, Kelly (2006). “Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications,” J. Acoust. Soc. Am. 119(1), 360–371.

Gabrielsson, A., ed. (1987). Action and Perception in Rhythm and Music. Papers given at a Symposium in the Third International Conference on Event Perception and Action. Royal Swedish Academy of Music, #55. Stockholm, Sweden.

Gabrielsson, A. (2000). “Timing in Music Performance and its Relation to Music Experience,” in Generative Process in Music edited by J. A. Sloboda (Clarendon Press, Oxford).

Guoyon, F. (2005). “A Computational Approach to Rhythm Description,” Ph.D. thesis. University of Barcelona, Spain.

Hamer, M. (2000). “It don't mean a thing if it ain't got that swing. But what is swing?,” New Scientist 2270, 48.

Seashore, C. E. (1938/1967) Psychology of Music (Dover Publications, Inc. New York).

Sloboda, J. A.(ed.). (2000) Generative Process in Music. (Clarendon Press, Oxford, UK).

Waadelund, C. H. (2004). Spectral Properties of Rhythm Performance (Norwegian University of Science and Technology, Trondheim).

Online resources:
[Link] (original maker of Darbuka and Latigo software, now owned by [Link])

Discography:
Louis Armstrong and Duke Ellington - Louis Armstrong meets Duke Ellington (1962)
Paul Simon - Graceland (1986)
Ray Charles - Genius Loves Company (2004)
Bob Marley - Legend (1990)
Grupo Batuque - Samba de Futebol (2004)
Luciano Perrone e Nilo Sergio - Os Ritmistas Brasileiros Batucada Fantastica (1963/1998)
Ken Lindsay is an Information Science researcher with tlafx in Ashland, Oregon. His current work includes extracting previously unseen information from music and biophysical signals. Previously he worked for 8 years at NASA Ames Research Center in the Neuro Engineering Lab, developing real-time 3D graphics for flight simulators. In other incarnations he has worked in hardware and software engineering, theatre, film and radio. A serious student of Brazilian music for over 10 years, he has performed in the San Francisco Bay area, New Orleans, and Rio de Janeiro. He holds an MS in Math and Computer Science from Southern Oregon University.

Pete Nordquist is an assistant professor in the Department of Computer Science at Southern Oregon University (SOU). He holds an MS in Computer Science and Engineering from the Oregon Graduate Institute and an MM in Choral Conducting from the University of Missouri Kansas City Conservatory of Music. He worked for 14 years at Intel Corporation in various software development groups and has taught computer science at SOU for the past five years. His musical involvement includes having sung with the Kansas City Chorale and the Oregon Repertory Singers. He currently sings with the SOU Chamber Choir, the Rogue Valley Chorale, and serves as rehearsal assistant for the Rogue Valley Youth Ensemble.