Chord Context Function White Quinn
This article investigates several questions of harmonic function using aggressively data-driven
approaches. We apply Hidden Markov Modeling—a technique used to identify contextual regulari-
ties within streams of data—to the Kostka-Payne, McGill Billboard, and Bach chorale corpora.
Keywords: corpus analysis, function, tonality, Riemann, computational modeling, popular music,
harmony, Kostka and Payne, McGill Billboard, Bach chorale, machine learning, music cognition,
Hidden Markov models.
Since its articulation by Riemann (1893) in his Vereinfachte Harmonielehre of 1893, the concept of harmonic function has provided music theorists with remarkable explanatory power.1 Function theory explains the basic intuition that certain sequences of chords are syntactical and others unexpected. What distinguishes function theory from other syntactic theories of harmony (e.g., Stufentheorie) is that its small number of categories yields a very efficient system of rules for harmonic progression.

But while harmonic function remains a powerful tool within contemporary music theory, with few exceptions, the concept has remained relatively unchanged since Riemann. In this article, we challenge commonly held assumptions regarding harmonic functions within music-theoretical discourse: first, while tonic and dominant are generally accepted as the basic duality of any system of functions, the number, identity, and behavior of additional functions is not necessarily clear; second, we question whether a chord’s functional identity is determined by its scale-degree content, its root, its local context, or by some interaction between these factors.

Our approach is fundamentally data-driven, creating models based on large musical corpora. After reviewing the basic definitional issues concerning harmonic function, we will describe the technology used to model our data. This model will approach the idea of harmonic function from an aggressively naïve and skeptical standpoint: we will implement a model that groups chords into categories only by the chords they tend to precede and succeed. In doing so, we will be able to discuss certain statistical properties of the corpora themselves, thereafter relating these properties to larger discursive and theoretical issues. (We further discuss some underpinnings of our corpus methods below, along with our modeling techniques.)

We will investigate three corpora using these methods: the Kostka-Payne corpus, the McGill Billboard corpus, and the widely studied corpus of Bach chorales.2 After describing our findings, we will return to an examination of broader topics, including some generalizations that seem to hold across our datasets that problematize music theory’s standard three-function model. We will demonstrate that certain characteristics generalize across these three repertoires, but these similarities are outweighed by both stylistic differences between the repertoires and the differences in representation (e.g., Roman numerals versus lead-sheet symbols) that our models use. Our findings also connect to larger theoretical topics surrounding harmonic function: we will show evidence that surface events, such as passing tones and neighbor chords, exhibit syntactic regularities similar to consonant harmonies, and argue that the perceived hierarchical superservience of certain functions may be a simple matter of chord frequency. (Importantly, as we make clear in the ensuing prose, our goal is not to prove or solve any of these points but to provide novel observations in order to complicate and add subtleties to our understanding of harmonic function.)
chord context and function 315
associated with the IV chord, though when Rameau first used the term in his Nouveau Système, it referred to what we would now call ii6/5.3 The latter term derives from the concept of dominant preparation articulated by Allen Forte, under the influence of Schenker.4 Some theorists conflate these usages: Deborah Stein, for example, writes that “the subdominant functioned either as preparation for the dominant or as a neighboring harmony that prolonged the tonic chord.”5 Other theorists seek to tease these actions apart by arguing for two distinct functions.

virtue of their formal position and their relationship to other chords rather than through any internal characteristics of the chords themselves. Thus, a given V chord might function as the dominant in a phrase, not because it contains the leading tone but because it resolves to the tonic and forms an authentic cadence.”9 Under this definition, functions gain their identity by their syntactic position, or the context in which the constituent chords occur.

Much of contemporary music theoretical discourse, how-
things that we could not see with purely human capacities. From this standpoint, our analyses offer observations into the way that chord-to-chord progressions might group together, because they do so with a precision, insight, or impartiality impossible to human perceptions.

The ability to strictly isolate the contextual parameter in order to create broader contextual categories is arguably unique to the domain of computation, and therefore more of a “microscope” approach. But, regardless of how exactly our ap-

former and domesticates the latter.18 Other computational models of harmonic function will produce different results, given their contrasting inner workings. In what follows, we will endeavor to make clear which features of our results are more interesting for music theorists and which are side effects of our reliance on HMMs.

COMPOSITE HIDDEN MARKOV MODELS
prototype theory, though lexical probabilities can strongly favor a particular realization of a given hidden state. For instance, in the case of T, the most likely observed chord is I. (At this point, it is important to remember that, even though we might use symbols that denote chord content to a human, these characteristics are unavailable to the underlying algorithm. To the computer, “I” and “vi” are arbitrary symbols, and it has no way to know that chords designated by these symbols share two common tones. Rather, the algorithm is only aware of how these random symbols are positioned in relation to one another in the analyzed corpus. It is in this sense that our model is “purely contextual.”)

If lexical and transition probabilities are known, the model can be used to analyze a series of observations, assigning the most likely hidden state to each chord. This task is comparable to what we teach our undergraduate theory students to do when undertaking functional analysis: to reconcile an observed musical surface with an abstract knowledge of the rules of functional progression. For example, knowing whether to call a given vi chord tonic or predominant depends both on what the next chord is and what the acceptable functional progressions are.

Techniques of machine learning make it possible to estimate transition and lexical probabilities directly from a given series of observations. An iterative process called the Baum-Welch algorithm finds a locally optimal fit between the hidden states and the observations. Baum-Welch begins with a random set of transition and lexical probabilities, and each iteration of the algorithm consists of two steps known as expectation and maximization. In the expectation step, these (initially random) probabilities are used to decode or analyze the observation sequence, determining the most likely hidden state for each observation given the current model. The model can then assign an overall probability to the sequence. Initially this probability will be low, since the model’s parameters (the transition and lexical probabilities) are set randomly. In the maximization step, the algorithm adjusts the parameters in an attempt to improve the probability of the training data given the model. The expectation-maximization process is iterated until the improvements reach a point of diminishing return. A more detailed discussion of HMMs and examples of expectation maximization are presented in Appendix A.21

While the Baum-Welch algorithm runs largely unsupervised, it requires an important decision to be made in advance: the number of hidden states (k). While there is no universally accepted method to determine the ideal number of states, we introduce a novel approach in this essay. Our method exploits a property of expectation-maximization algorithms, like Baum-Welch: in part because of the random initialization of the parameters, models estimated from the same training data in repeated trials will not replicate each other consistently if the chosen value of k does not fit the data well. Example 4 depicts the procedure; we will begin our discussion in medias res, at what is labeled as the Cardinality Loop. For each value of k under consideration, we use Baum-Welch to train a bank of 300 HMMs on the same training data. We then use a standard method (the Viterbi algorithm) to have each HMM decode a test dataset that was not included in the training data. Every observation (chord) in the test data is then associated with a vector of 300 hidden-state labels, each assigned by one of the HMMs.

The novelty of our procedure lies in the next step, where we attempt to create a composite of the 300 HMMs. We use a different expectation-maximization algorithm, known as k-medoids, to classify the set of observations in the test data into k categories (where k is the number of hidden states in each HMM) based on a simple dissimilarity measure equal to the square of the number of HMMs that decoded the two observations in question into the same hidden state. The k-medoids algorithm, as it is implemented in the R programming language, yields a quantity called silhouette width, a measure of the ease of the algorithm’s task given the data. A low silhouette width indicates overlapping and interpenetrating categories. The higher the silhouette width, the clearer the boundaries between categories are, and the higher the degree of consistency between the individual HMMs. A high silhouette width allows us to treat the categories learned by the k-medoids algorithm as hidden states of a composite of all the HMMs.

21 See the online version of this article.
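The decoding task described in this section (assigning the most likely hidden state to each chord once lexical and transition probabilities are known) can be illustrated with a minimal Viterbi sketch. Everything below is an assumption made for illustration: the three state names, the five-chord vocabulary, and all probability values are invented toy numbers, not estimates from any of the corpora, and the function name `viterbi` is ours.

```python
import math

# Toy, hand-set parameters (hypothetical values, not corpus estimates).
STATES = ["T", "P", "D"]
START = {"T": 0.8, "P": 0.1, "D": 0.1}
# Transition probabilities: TRANS[a][b] = P(next state b | current state a).
TRANS = {
    "T": {"T": 0.3, "P": 0.4, "D": 0.3},
    "P": {"T": 0.1, "P": 0.2, "D": 0.7},
    "D": {"T": 0.8, "P": 0.05, "D": 0.15},
}
# Lexical probabilities: LEX[s][chord] = P(chord | state s).
LEX = {
    "T": {"I": 0.7, "vi": 0.2, "IV": 0.05, "ii": 0.02, "V": 0.03},
    "P": {"I": 0.05, "vi": 0.25, "IV": 0.35, "ii": 0.3, "V": 0.05},
    "D": {"I": 0.05, "vi": 0.02, "IV": 0.03, "ii": 0.05, "V": 0.85},
}


def viterbi(chords):
    """Return the most probable hidden-state path for a chord sequence."""
    # best[t][s] = log-probability of the best path ending in state s at time t.
    best = [{s: math.log(START[s]) + math.log(LEX[s][chords[0]]) for s in STATES}]
    back = []
    for chord in chords[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s at this time step.
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(LEX[s][chord]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["I", "vi", "V", "I"]))
print(viterbi(["I", "vi", "I"]))
```

With these toy numbers the same vi symbol is decoded as P when V follows and as T when I follows, which is the kind of context dependence the text describes for a vi chord. The decoder never inspects the chord symbols beyond using them as dictionary keys.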
We prefer values of k for which the HMMs replicate each other more consistently (indicated by a higher silhouette width) than for neighboring values of k. A higher silhouette width indicates more consistent solutions within the hundreds of models using that k value: if that value produces a peak compared to the surrounding values, that means k hidden states organize the space better than both k − 1 functions and k + 1 functions. In this study, we prefer values of k exhibiting the most dramatic peaks. All silhouette-width figures are shown in Appendix A.22

There are two approaches to validating an HMM. One way is to use it to decode some test data and compare the results to a ground truth derived from human analyses of the same data. If the model reproduces the insights of the human analyst, the correlation suggests that the model is producing some salient result. We are, however, less interested in how a modern listener might analyze a corpus than we are in identifying consistent and reproducible properties of corpora. To this end, we also use cross validation, in which we create a model using only some fraction of the dataset and then assess how well the model conforms to the remaining fraction. If one iterates this process with different divisions and the same results hold, one can be relatively certain that the model is capturing some generalizable and robust property (rather than an artifact of random variation). The following analyses vary the types of validation used: since the Kostka-Payne corpus is derived from a textbook that addresses issues of harmonic function, we compare our results to the authors’ own statements on this topic. Given that we mean to be skeptical of how notions of harmonic function manifest on musical surfaces, the Bach chorale and popular music corpora are both sufficiently large to engage in the latter cross-validation tactics.

Such a model of harmonic function takes only the context of chords, and not their content, into consideration. If the model puts two chords into the same functional category, it is on the basis of their behavior alone and not any shared scale-degree content. In what follows, we apply this modeling procedure to several corpora, beginning with a relatively straightforward dataset: the Kostka-Payne corpus.

22 See online version of this article.
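The composite step and the silhouette-based choice of k can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' implementation (the article names the k-medoids routine in R). We assume the dissimilarity between two observations is the squared number of HMMs that decoded them into different states, so that larger values mean "less alike". The clustering is a simplified alternating k-medoids with evenly spaced seeds rather than R's routine. All function names, the three label vectors, and the silhouette widths passed to `most_dramatic_peak` are hypothetical.

```python
from itertools import combinations


def dissimilarity(labels):
    """labels[h][i] is the state HMM h assigned to observation i.

    Assumption: d[i][j] is the squared number of HMMs that put i and j
    into different states, so that larger values mean "less alike".
    """
    n = len(labels[0])
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        disagree = sum(1 for run in labels if run[i] != run[j])
        d[i][j] = d[j][i] = float(disagree ** 2)
    return d


def k_medoids(d, k, max_iter=100):
    """Simplified alternating k-medoids with evenly spaced seeds."""
    n = len(d)
    medoids = [c * n // k for c in range(k)]
    assign = []
    for _ in range(max_iter):
        # Assign every observation to its nearest medoid.
        assign = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total distance.
        new = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the medoid of an empty cluster as is
                new.append(medoids[c])
            else:
                new.append(min(members, key=lambda m: sum(d[m][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return assign


def mean_silhouette(d, assign):
    """Average silhouette width of a clustering (singletons score 0)."""
    n = len(d)
    clusters = sorted(set(assign))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if assign[j] == assign[i] and j != i]
        if not own:
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if assign[j] == c) / assign.count(c)
            for c in clusters if c != assign[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def most_dramatic_peak(widths):
    """Return the k whose width most exceeds both neighbours (k-1 and k+1)."""
    inner = [k for k in widths if k - 1 in widths and k + 1 in widths]
    return max(inner, key=lambda k: widths[k] - max(widths[k - 1], widths[k + 1]))


# Toy data: 3 HMM runs labelling 6 observations. State names are arbitrary
# per run (run 2 swaps 0 and 1), which counting co-assignments ignores.
runs = [[0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 1]]
d = dissimilarity(runs)
groups = k_medoids(d, 2)
width = mean_silhouette(d, groups)
print(groups, round(width, 3))
```

Working from pairwise agreement rather than from the state labels themselves means the composite does not depend on how each individual HMM happens to number its hidden states.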
320 MUSIC THEORY SPECTRUM 40 (2018)
MODELING THE KOSTKA-PAYNE CORPUS WITH FOUR FUNCTIONS

The Kostka-Payne corpus comprises the analytic annotations to the musical examples provided in the instructors’ edition of the eponymous harmony textbook. This corpus records the root of each chord in both absolute (pitch) and relative (scale degree) terms, along with metadata about the composer, title, and mode (major or minor) of each excerpt. Relative-pitch chord roots are represented as integers modulo 12, correspond-

distributed. In particular, to make sense of IV–I progressions, the model places IV chords into the “dominant” category, essentially conflating S and D functions under Example 1(b) into a single “pretonic” category. Chords with a root of ^6 have also been removed from the tonic category in this version: while this simplifies the lexical probabilities by placing all such chords into the predominant category, it makes deceptive cadences very unlikely events. Because of these difficulties, this model also assigns relatively low probabilities to the observa-
But this model also reflects the outlook of other work linking harmonic function to corpus analysis. Allen Irvine McHose, an early advocate of a corpus-based approach to tonal harmony, uses a model strikingly similar to our four-state model in his 1947 textbook, The Contrapuntal-Harmonic Technique of the 18th Century. Undertaking an exhaustive quantification of the root motions in “Bach, Graun, Handel, and Other Contemporaries,” McHose groups chords into four “classifications,” shown in Example 9.27 The classifications schematize a series of falling fifths into a final tonic, creating five equivalence classes reminiscent of the four-function model under Example 7.

The most peculiar property of our model, which sets it apart from McHose and nearly every other textbook, is that of “dominant” IV chords. The model conflates dominant and subdominant functions, creating a “pretonic” hidden state. This conflation is a consequence of the model’s exclusive focus on two-chord context: without knowledge of scale-degree content or common tones, the model unifies the dominant and subdominant functions, recognizing that they tend to occur in the same contexts. (Notably, k = 5 solutions neither

Scholars disagree about some very basic definitions of harmonic function in popular music, ranging from what constitutes dominant function to whether traditional harmonic function has any relevance to pop music at all. On the one hand, scholars like Allan Moore, Ken Stephenson, David Temperley, and Trevor de Clercq argue for a fundamental difference between popular music and common-practice harmony.28 Temperley and de Clercq, in a corpus study of rock harmony, emphasize many aspects of this difference: rock harmony does not have strong unidirectional tendencies (e.g., V progresses to IV as much as IV progresses to V), and, in many cases, IV (rather than V) functions as the primary nontonic triad. On the other hand, several analysts have attempted to theorize pop/rock harmonic syntax as an extension of common-practice norms. Nicole Biamonte and Chris Doll, for instance, argue for including modal harmonies into functional models, with ♭VII functioning as dominant (Doll’s “rogue dominant”) or as IV/IV (Biamonte’s “Double Plagal” progression).29 Going even further, Drew Nobile entirely dissociates traditional harmonic functions from the scale-degree content of chords.30 In Nobile’s formalization, almost any chord can function as a tonic, dominant, or predominant: “a chord’s function is given more by formal considerations, i.e., what role it plays within the form, than by its internal structure or any specific voice-leading motion.”31 Nobile allows for predominant V chords, dominant IV chords, and so on.

MATERIALS AND METHODS FOR THE MCGILL BILLBOARD CORPUS

The main circuit contains two primary poles, T and S, letters chosen because of the similarity between these states and the traditional tonic and subdominant functions. T is most frequently represented by a I chord, and S most often produces IV chords. These two chords are the most frequent in the corpus (23.5% and 21.7%, respectively). Furthermore, 22.3% of all transitions in the corpus are between T and S. In this sense, S is what we call the antitonic function: like the D/T– function in the Kostka-Payne model, S provides the most frequent
the model recognizes that II occurs within the same context, between chords of the T1 and S1 functions.38) The progression itself bears a resemblance to the four-state model of the Kostka-Payne corpus: four functions accommodate a series of falling fifths. However, the fact that this progression expands a central T–S axis rather than a dominant–tonic pair is significant. It shows the crucial distinction between the pop/rock and the Kostka-Payne functional paradigms: while the dominant function acts both as pretonic and antitonic in common-practice music, these positions are distinct in the pop/rock corpus, with S as antitonic and either S or S1 acting as pretonic.

38 These categorizations reflect the general tendencies of groups of chords: the broader the grouping, the less the model captures the behaviors of individual chord types. The grouping of IV and II is an example of this: taken on its own, the II chord primarily moves to chords in the S1 function, while the IV chord is more equally disposed to move toward T and S1. But regardless of these differences, the model recognizes the chords’ similarity and groups them together into one function.

This paradigm then allows for cases such as Example 13. This song exhibits the same functional progression, but in a realization foreign to common-practice norms. The post-antitonic S1 function becomes ♭VII rather than V.39 The four-function circuit interprets both progressions as exemplars of a more general case: the expansion of the primary T–S pair with T1 and S1 functions. Unlike the T1-functioning ♭VII chord within “Paradise by the Dashboard Light,” the ♭VII in “I Want You to Want Me” seems to substitute for a V chord or, more precisely, appears in a place where the model recognizes that V chords often appear.40

39 Note that ♭VII is a very improbable, but still possible, chord to be output by the S1 function, coming under “others” in Example 10.
40 The former functions akin to Biamonte’s “double plagal” progression, and the latter as Biamonte and Doll’s “dominant” ♭VII chords.

The peripheral functional pairs, Q/P and X/W, arise from portions of the corpus that differ in harmonic vocabulary from the major-mode, triadic music we have examined so far. The Q/P pair tends to include chords involving sevenths and ninths (perhaps indicating harmonic languages more influenced by jazz). Example 14 shows the model’s analysis of the Little River Band’s “We Two.” The alternation between I9 and ii7 chords shuttles back and forth between the Q and P functions. To a human trained in music theory, the scale-degree content of these chords would indicate equivalencies between I9 and I chords and ii7 and ii (or IV); with such equivalencies in hand, we might imagine this passage as going between the T and S functions. However, with no such equivalencies available in our strictly context-oriented approach, the model recognizes seventh and ninth chords as completely different objects from their triadic counterparts. Since fewer chords in this corpus have sevenths and ninths, and these nontriadic chords tend to progress to other nontriadic chords, the model relegates them to their own peripheral syntax, where it simply groups chords into two categories that follow one another. The tonic-antitonic polarity does not appear to hold on the periphery of the model: note, for example, that chords with roots of I, IV, and V appear in both P and Q.

The other significant lexical minority in the corpus consists of minor-mode chords, which the model relegates to the X and W functions. Example 15 shows an excerpt from “Funky Nassau,” a 1971 song by The Beginning of the End that is one of a small number of minor-key songs in the corpus. Here, X and W appear to function as tonic and antitonic categories, respectively. In general, however, these two functions do not map onto tonic/antitonic categories; rather, as is the case for P and Q, each state simply comprises chords that tend to progress to chords in the other state. For instance, major IV and minor iv tend to appear in different categories, with iv appearing opposite the tonic and IV appearing in the same category as i but opposite ♭VII.

Several interesting topics arise from these examples, particularly when comparing them to our earlier Kostka-Payne functions. Notably, because the pop-music model’s parameters were learned directly from the corpus rather than adapted from a preconceived three-function model, it yields a functional system with characteristics very different from those of the common practice. The most obvious difference is the identity of the antitonic: while V functions as the most frequent nontonic chord in the common-practice Kostka-Payne corpus, IV fills that role in the pop McGill-Billboard corpus. This is a direct consequence of the impact that chord frequency has on the functional categories the HMM learns from the corpus. The most frequent chords in the Kostka-Payne corpus are I, V, and ii, each of which becomes the nucleus of a function. In the McGill-Billboard corpus, the most frequent chords are I, IV, and V. In both cases, the second most frequent chord takes on the antitonic role, while the behavior of the remaining chords defines the dynamics surrounding the main tonic/antitonic polarity, exerting influence relative to their frequency. In the Kostka-Payne corpus, ii’s frequent preparation of the dominant creates the predominant category, while in the McGill-Billboard corpus, V’s role both before and after IV produces the T1 and S1 functions.

This distribution of V over two functional categories brings us to the second major difference between the two corpora: where common-practice syntax contains a “dominant” category that acts as both antitonic and pretonic, these two roles are differentiated in pop syntax, where the pretonic position can be occupied either by an S chord, with antitonic and pretonic functions coinciding, or by an S1 chord that acts as pretonic and usually follows an antitonic S.

Finally, our modeling method uses only contextual information with no knowledge of scale-degree overlap or voice-leading similarity. The fact that purely contextual data can produce a workable functional model is notable in and of itself. It is not at all obvious that a content-blind method would be able to reproduce workable categories at all. For instance, it was noted that the method models the Kostka-Payne corpus with functional categories very similar to how the authors themselves describe tonal dynamics; similarly, the pop/rock results reproduce observations made by several theorists of pop-music function.

This investigation did, however, show the difficulties that arise when analyzing a corpus containing multiple styles and practices, resulting in suboptimal peripheral functions. These cases reveal a significant weakness of the HMM approach: its underlying assumption is that a single set of probabilistic rules governs the entire training set. In this case, different segments of the corpus seem to be governed by different syntactic principles, and the resulting model is tailored only to that segment with the greatest representation. Minority segments are relegated to the periphery. We hypothesize that training separate HMMs on individual subdivisions would yield more robust syntactic models for minor-key music, jazz-influenced music, and other distinct styles. In what follows, we examine these issues in a completely uniform corpus, that of the Bach Chorales.

MODELING THE BACH CHORALE CORPUS: THIRTEEN FUNCTIONS, AND SYNTACTIC DISSONANCE

Consider Examples 16(a) and (b), two excerpts from BWV 146 and BWV 402, transposed to C major for ease of comparison. The first excerpt will give a listener familiar with Bach’s chorale style momentary pause: something about the initial chord progression seems unusual. In fact, in our corpus (described below), only 21 of Bach’s 2,130 V7 chords are immediately preceded by vi chords. In contrast, V7 is immediately preceded by IV 177 times: Example 16(b) therefore seems somewhat more idiomatic. Similarly, the tonic expansion in Example 16(c) seems extremely idiomatic, with the chord marked with an X, {^1, ^2, ^4, ^5}, prolonging tonic. Indeed, this sonority occurs between two I chords 206 out of the 392 times this sonority appears. In contrast, {^1, ^2, ^4, ^5} provides a contrapuntal prolongation of a ii7–V7 progression only five times, one instance of which (from BWV 244) is shown in Example 16(d).

Notably, these constraints in Bach’s choices are not en-

then determined by comparing the eight windows containing that chord (excluding ambiguous windows). The key of the window having the highest confidence value was taken as the key of the chord. Stretches of chords in a single key were used as the observations for the HMMs.

Unlike the constrained vocabularies of the earlier examples (the Kostka-Payne corpus uses 12 chords, the McGill-Billboard uses 68), this method produced a vocabulary of 329 distinct salami-slice types. Relative to the corpus size, this vo-
(b) illustrate further peculiarities of the model. In Example The three-function solution, then, produces a model with
19(a), apparent fourth-inversion I9 chords—verticalities that traditional tonal relationships along with some nontraditional
pass between two tonic triads—function as D, since these instantiations of those relationships. The asymmetry of the
sorts of chords always progress to T chords. Notice that even transitions between functions is familiar: the P–D transition is
though the second of these I9 chords does not proceed to I, it unidirectional, while T–P and T–D transitions are basically
does go to vi, a chord that may function as T. Similarly, the bidirectional, with a disposition for T–P–D–T motion.46
penultimate verticality of Example 19(b), {^1, ^4, ^6, ^7}, always However, the model sometimes applies these functions and
proceeds to I or vi chords and is, therefore, analyzed as D.
Both excerpts also include “predominant” V chords: the
model recognizes that V often precedes chords that them-
46 This results in a very messy predominant function. While 70% of tonic
selves precede T chords. In both examples, the V chord is chords are I triads and more than 35% of dominant chords are V triads,
followed by the addition of ^4 to the texture, creating the viio the most frequent predominant chord, IV, only instantiates that function
and V7 slices, respectively. around 16% of the time.
chord context and function 329
transitions to slices that we would not ordinarily consider as Example 24 shows the models in analytical action. The
function-bearing chords. three-state analysis assigns hidden states in a way that conforms
A distinct advantage of the thirteen-function solution (see to our intuitions, moving through two complete P–D–T cycles.
Ex. 20) is that it adds new functional categories that accom- Functions are sometimes instantiated by more than one chord,
modate such slices. This model retains the basic tonic-centric and the tonic of the first cycle is expanded by a passing domi-
circular flow we have seen in many of our HMM solutions, nant. The thirteen-state analysis distinguishes between the non-
but adds parallel pathways and detours. Example 21 illustrates tonic functions of the first measure and those of the second.
some idiomatic progressions of these pathways. The top of The initial measure uses the weak predominant (p) and domi-
Example 20 shows T (tonic), a function which contains mostly nant (d) functions, and both p and T are prolonged by their
I triads along with several vi and iii chords. Tx (tonic expansion) corresponding expanding functions. The cadential part of the
is composed of slices resulting from passing and neighbor mo- phrase uses the strong predominant (P) and dominant (D)
tion between two T slices, including the nontriadic {^ 1, ^ 4, ^
2, ^ 5} functions. The phrase, on the whole, moves first through the in-
of Examples 16(c) and 21(a). Example 21(b) shows the pri- ner loop of Example 20, and then through the outer loop.
mary pathway of “strong” functions around the outside of the Example 25 investigates several other functions within the
diagram, so named because of their phrase-ending cadential thirteen-state model. The phrase as a whole begins in E major
function. Here, T first moves to P (strong predominant), to and leads to what turns out to be a fleeting tonicization of the
D—the strong dominant—and then to D1, the late dominants, relative minor. Our model’s thirteen-state analysis shows three
comprised mostly of V7). The cadential progression I–ii–V8–7 functions (T, p, and R), each prolonged by their expanding
would traverse this outer pathway, as would the V-chord sus- functions. The tonicization of the submediant is treated as a
pension of Example 21(b). within-key phenomenon.
The inner cycles show progressions that tend to precede ca- Example 26 highlights the remaining functions available to
dential progressions, or the “weak” functions. Example 21(c) the thirteen-state system. The first measure involves a tonic
shows such a progression. First, the I42 chord functions as a late
tonic T1, a passing function that progresses from T to p, the
middle-of-phrase weak predominant. The px function (predom- 47 There are two further important characteristics concerning the x/ii func-
tion. First, in the thirteen-state model, this function involves both ii and
inant expansion) then prolongs the p function with its passing
its applied chords, while a fourteen-state model divides these two catego-
chord. Finally, the model involves three tonicization detours. ries (just like R and Rx in the thirteen-state model). This functional divi-
Example 22 shows how V, vi, and ii can receive their own ton- sion is the primary difference between thirteen- and fourteen-state
icization functions, and Example 23 summarizes the thirteen models. Second, it is notable that the models distinguish between ii as a
functions.47 tonal area and ii as a predominant chord.
330 MUSIC THEORY SPECTRUM 40 (2018)
example 21. Some representative instantiations of the thirteen functions: (a), (b), and (c)
expansion using first a late tonic followed by a p-functioning iv7 chord.48 The second measure involves a dominant
48 Note that the following passing I chord is labeled as a T: this is because the model labels all I chords as T due to the overwhelming frequency of I chords occurring in the early tonic context. Another notable issue of the model struggling to analyze a I chord involves cadential 6/4 chords. On one hand, the fact that the model does not recognize bass notes might account for the model’s apparent difficulty with I chords preceding cadences. But, if this progression happened often enough, we would imagine that the HMM would categorize those I chords that follow ii and IV chords and precede V7 chords as D or P functions. However, Bach simply does not use enough cadential 6/4s for the model to incorporate this behavior. Only 30 of Bach’s 455 I–V progressions are in the proper inversion to be considered cadential candidates.
chord context and function 331
expansion and late dominant function, and the final eighth note of that measure shows the “early tonic” capacity of the Tx function, intervening between the cadential V7 (D+) and the cadential tonic (T). The second half of the third measure briefly tonicizes the minor supertonic, something the thirteen-state model labels as a patch of sx states.49 This model treats tonicizations of ii differently from tonicizations of V and vi. Each of the latter two Stufen has separate states for the tonicizer and the tonicized, but tonicizations of ii are rare enough that the model only learns a single state for all chords involved in tonicizations.
49 Recall that sx also involves ii itself. In the fourteen-state model, sx is divided into two states, one of which is solely ii, while the other comprises applied chords that move to ii.
COMPARING AND CONTRASTING THE MODELS: THREE GENERALIZATIONS
The differences between the models make it difficult to generalize about context-oriented harmonic function, and the fact that we present only three corpora makes generalizations about musical syntax even more suspect. Although our discussion so far has primarily emphasized the differences between the contextual regularities of each corpus, our models have some commonalities that are shaped both by the properties of the corpora and by the HMM method itself. The expectation-maximization procedure for training the HMMs produces a model that assigns a high probability to observation sequences in the training set, and this preference favors models that make strong predictions about the corpus’s most frequent chords. High lexical probabilities associated with common chords and high transition probabilities associated with common hidden-state progressions will result in high probability estimates for observation sequences. In other words, because a hidden state that is equally likely to produce any of a handful of observable chords will result in low probability estimates for any of those chords, the Baum-Welch algorithm prefers high lexical and transition probabilities associated with the most frequent hidden states. This corresponds to a minimization of lexical diversity in the most common hidden states: note that the larger pie charts in the examples tend to have fewer, larger slices.
However, several characteristics arise neither because of the properties of HMMs nor the Baum-Welch algorithm, but because of the properties of the corpus. These recurrences are notable not only because they arise within all the corpora investigated, but because there is no reason to expect our methods to produce models with these characteristics.
First, each model’s top two most frequently occurring chords create two functions that strongly associate with one another in some way (e.g., the I and V chords in the Kostka-Payne and Bach corpora, and the I and IV chords in the
McGill Billboard corpus). In each case, the most frequent chord, I, creates a recognizable tonic function and, as such, creates an important pillar in the functional system. The second most frequent chord, then, provides a secondary pillar, and its relationship to tonic creates one of the most defining transitions of the model. We will refer to this general property as the tonic/antitonic dichotomy: each corpus has two hidden states built around the two most frequent chords in the corpus.
Second, each corpus has a unidirectional relationship between tonic and a state or states that move into tonic but to which tonic progresses less frequently. We could imagine this as the “cadential progression”; however, not wanting to import the meanings inherent in that term, we will adopt the term pretonic/tonic relationship. In the Kostka-Payne and Bach corpora, this dynamic involves the dominant states progressing to the tonic states, and in the popular music corpus the S1–T progression instantiates this relationship. Notably, while these dynamics overlap with the tonic/antitonic dichotomy for the Bach and Kostka-Payne corpora (the dominant-tonic “cadence”), the two relationships do not overlap in the pop-music corpus: the unidirectionality of the S1–T “cadence” is not the same as the S–T bidirectional tonic/not-tonic dichotomy.
Finally, each model contains less-probable pathways that, rather than constituting their own systems, expand the primary tonic–antitonic or tonic–pretonic pair. These pathways comprise the prolongational networks that either act as precursors or successors to the model’s most frequent pathways. For instance, adding S1 between S and T in the McGill-Billboard model prolongs the tonic–antitonic dichotomy similar to the way that the P and P– prolong the T–D/T– transitions of the Kostka-Payne model. On a larger scale, we could even imagine the “weak” inner pathways of the thirteen-state Bach model prolonging the cadential “strong” progressions of the outer pathways that provide the model’s antitonic/pretonic function.
COMMON TONES, FREQUENCY, AND WHAT IT MEANS TO BE A FUNCTION
Example 27 shows several pieces of information concerning the scale-degree content of each corpus’s hidden states. The pie slices show the proportional frequency with which each scale degree occurs in each function: pies with fewer, larger slices are dominated by fewer scale degrees. The pies are ordered from left to right by their ascending common-tone scores, indicated on the y axis. The score is a measure of the degree to which chords in the function share scale degrees. It is not a raw count of common tones, but a scaled and normalized measure: a score of zero corresponds to the average number of common tones between any two chords randomly chosen from the corpus. A positive score for a function means that chords within that function have a greater than average number of common tones; a negative score means a lower than average number of common tones. The score is scaled in terms of standard deviations of the distribution over the corpus as a whole (what statisticians call a z-score).50
Nearly every function in each corpus has a positive common-tone score, indicating that a small number of scale degrees dominates each function. In the Kostka-Payne corpus, for instance, the T hidden state is dominated by ^1, ^3, and ^5, while the P hidden state uses primarily ^1, ^2, ^4, and ^6. At first glance this is not a surprising finding; however, recall that our models are strictly context-based and thus entirely blind to scale-degree content. In fact, a closer look at the distributions suggests that certain scale-degree combinations connote certain functions. In the Kostka-Payne corpus, for instance, ^4 prominently occurs in two functions; however, when it occurs alongside ^6 it would likely appear in the P function, while if it appeared with ^7, that sonority would likely be classified in the D function.51 Despite this limitation, our models reveal a deep connection in our corpora between a chord’s scale-degree content and its contextual tendencies. Based on these results, the basic principle of harmonic function that identifies function with scale-degree content seems to hold in most cases.
50 The process did not compare identical chords to one another. Our test is designed to ask the question, “How much overlap exists between nonidentical chords within a function?”, removing any effect that a single chord’s overwhelming frequency within a particular function might have. We have represented this property in terms of standard deviations (a z-score) in order to use the same scale between corpora.
51 This topic is taken up in more depth in Quinn (2017).
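The common-tone score described above can be illustrated with a small sketch. It is not the procedure used for Example 27: the chord spellings, the toy corpus, and the function names (`pairwise_common_tones`, `common_tone_score`) are hypothetical, every pair of distinct chords is weighted equally, and the population standard deviation is assumed as the scaling unit. The sketch only shows the general shape of a z-score built from common tones between nonidentical chords.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_common_tones(chords):
    """Common-tone counts for every pair of non-identical chords."""
    distinct = set(chords)  # identical chords are never compared with themselves
    return [len(a & b) for a, b in combinations(distinct, 2)]


def common_tone_score(function_chords, corpus_chords):
    """Mean common tones within a function, as a z-score against the corpus."""
    baseline = pairwise_common_tones(corpus_chords)
    within = pairwise_common_tones(function_chords)
    return (mean(within) - mean(baseline)) / pstdev(baseline)


# Hypothetical toy corpus: chords spelled as sets of scale degrees.
I, ii, IV = frozenset({1, 3, 5}), frozenset({2, 4, 6}), frozenset({4, 6, 1})
V, V7, vi = frozenset({5, 7, 2}), frozenset({5, 7, 2, 4}), frozenset({6, 1, 3})
corpus = [I, ii, IV, V, V7, vi]

predominant_score = common_tone_score([ii, IV], corpus)  # ii and IV share ^4 and ^6
dominant_score = common_tone_score([V, V7], corpus)      # V and V7 share ^5, ^7, ^2
mixed_score = common_tone_score([I, ii], corpus)         # I and ii share nothing
```

In this toy setting a function whose chords overlap heavily scores above zero, and a function that groups unrelated chords scores below zero, which is the reading of the y axis given in the text.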
But content and context do not always coexist peacefully. The D/T– category in the Kostka-Payne corpus has a lower-than-average common-tone score: chords that progress to T can have various scale-degree structures. Nowhere is this more evident than in the “dominant” IV of the Kostka-Payne corpus: our models’ analysis in this case suggests that a completely contextual definition of function might conflate traditional subdominant and dominant functions. The same logic applies to the T1 function in the popular music corpus, and the sx and R functions in Example 20: in all these cases, the diversity of chords within these categories creates a disjunction between a content-oriented theory and our context-oriented model. These cases are, however, in the minority: in fact, context- and content-oriented functional theories seem to overlap more than we might at first expect.
But what of the hierarchies and qualities we usually associate with functions? For instance, we usually couple tonic with the quality of resolution and accept it as the deepest structural harmony. Our results support previous research that connects these sorts of qualities and hierarchies with the frequency with which we hear stimuli. Over the past several decades, the work of many researchers in the field of music cognition, most notably that of Krumhansl, Huron, and Aarden,52 has made a compelling case that our perceived hierarchy of scale degrees is
due to the frequency with which we hear them in music, especially at the ends of phrases. Their research has shown that the most frequent scale degree, tonic, is imparted with a feeling of resolution and hierarchical superservience; in contrast, chromatic scale degrees occur very infrequently and feel hierarchically subservient, unsettled and unresolved.53 It follows that the same might be true for harmonic functions. Our model’s tonic categories may be more fundamental or structural simply because they are more frequent. Antitonic chords may be viewed as hierarchically secondary inasmuch as they participate in the second-most-frequent function. Similarly, the pretonic functions are only hierarchically subservient or points of highest tension because they precede the most-frequent function. In sum, relative frequency produces a sense of relative stability.
CULTURAL AND PEDAGOGICAL IMPLICATIONS
Importantly, we are advocating for a theoretical framework rather than any particular instantiation of that theory. We believe that contextual modeling can provide insights into the concept of harmonic function but remain open to revisions, caveats, and addenda to our specific models. Our findings therefore apply more to how we approach teaching or discussing harmonic function than to precisely what to teach or discuss. For instance, we readily acknowledge that any classroom-oriented theory of harmony should involve bass lines and melodic progressions, both absent in the current models. However, we will close by offering two recommendations concerning the general discourse surrounding harmonic function.
First, our models and analyses have shown degrees of subtlety, stylistic regularities, and syntactic categories that are not captured by traditional three-function harmonic theory. This can stem from the decoupling of pretonic and antitonic roles (as in the popular music repertoire). It can also result in the division of predominant into more than one syntactic category (as in the P– chords in the Kostka-Payne corpus), the articulation of unique passing and neighboring functions (as in the Bach corpus), or even some other contextual regularity not addressed in the current work.
Second, we suggest that discussions of harmonic function be moored to particular repertoires and be sensitive to differences between corpora. As theorists and teachers, we should emphasize the syntactic norms of a corpus rather than universal rules. Instead of a tonic–subdominant–dominant paradigm that reaches from the Baroque to the Beatles, we might emphasize the constrained categories of chords that Bach deploys at phrase endings, versus the recurrent riffs in rock music. Assuming the three Riemannian functions as harmonic universals is at best an oversimplification and at worst culturally hegemonic. Importing a model associated with the German-language common practice inspired by Hegelian dualism onto other culturally specific repertoires problematically asserts the power of one culture over another. By deriving our models from preexisting and wide-ranging corpora, our methods provide a way to sidestep the difficult implications of using a “common practice” as the main driver of theory and as a yardstick for tonal norms.
.9). The reader is encouraged to experiment with the model, proposing observations and adducing the hidden states.
HMMs can also derive their parameters from a body of data. A model can compile lexical and transition probabilities by tallying how frequently observations accompany hidden states and how often the hidden states follow one another. Given a body of data connecting air conditioner sales with temperature or typos with intended spellings, a programmer could model those connections using an HMM.
However, in the current article, we begin only with a series of observations (chords) and attempt to find a model to assess the unobserved hidden states (functions), not even knowing how many hidden states underlie the observations. To do this, we use an iterative process of trial and error to determine which models best explain the observation series, called the training corpus. One could imagine, for instance, that if we tried to explain an observation sequence that only ever moved from x to y to z back to x, the model in Example A2 would perform poorly. The (admittedly trivial) model of Example A3 would return higher probabilities, namely always 100%. This process of tweaking the transition and lexical probabilities within a model to increase the fit between the model and the observations is called expectation maximization. In this paper, we use the Baum-Welch expectation maximization algorithm, which cycles through possible lexical and transition settings, trying to improve the probability of the analysis given the model for each iteration. The algorithm returns a final model either after a certain number of cycles, or after it starts returning diminishing improvements (this being a parameter that can be set by the analyst).
After training, the HMM can then be used to analyze a test corpus. During the testing phase, an HMM assigns the most probable hidden states to the test corpus’ observation string. In order to claim that a model is generalizable and not overfitted to the peculiarities of a particular observation string, the training and test corpus should not be the same data set. For the current article, we use the Viterbi decoding algorithm for this task (again, formalizations appear in Jurafsky and Martin 2008).
As an example, consider the “good” two-state HMM and toy corpus in Example A4. The example includes (a) the lexical probability table, (b) the transition probability matrix, and (c) the probabilities assigned by the model to the observation string. Here, the model includes a category A that has a 100% probability of manifesting I chords, a 50% probability of transitioning to itself, and a 50% probability of transitioning to state B. B, on the other hand, shows V half the time and V7 the other half of the time; B transitions to A 100% of the time. The probability for the sequence would be the product of each lexical and each transition probability. (Note that V and V7 are grouped into the same hidden state not because of their common tones, but because they occur in the same contexts.) Compare this model to Example A5, the “poor” two-state HMM. The toy solution now groups V7 with I in state A, and both states move between one another with equal probability. The transition and lexical probabilities of observed and hidden states given the model are lower (worse) than those in the good two-state solution. Because the good solution has a higher probability than the poor solution, an expectation maximization process would discard the latter in favor of the former. In our modeling, we will prefer solutions like Example A4 over Example A5 for this reason.
While the comparison between the good and poor two-state models illustrates how the algorithm chooses between different possible solutions, the process also must choose between different numbers of hidden states. That is, how would
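(a) Lexical probabilities
                      State
Observed chord      A       B
I                 100%      0%
V                   0%     50%
V7                  0%     50%
(b) Transition probabilities (row: current state; column: next state)
          A       B
A        50%     50%
B       100%      0%
(c) Probabilities assigned to the observation string
Observations:       I   V7  I   I   V   I   I   I   V7  I   I   V   I   I   I   V7  I
Good 2-State:       A   B   A   A   B   A   A   A   B   A   A   B   A   A   A   B   A
Lexical Probs:      1   .5  1   1   .5  1   1   1   .5  1   1   .5  1   1   1   .5  1    = .03125
Transition Probs:     .5  1   .5  .5  1   .5  .5  .5  1   .5  .5  1   .5  .5  .5  1      = .00049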
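The arithmetic behind the good two-state example can be checked with a short sketch. The emission and transition values are those given in the prose description of Example A4; the uniform starting distribution `START`, the helper names, and the plain-probability (rather than log-probability) Viterbi routine are assumptions made for illustration, not the implementation used in the article.

```python
from math import prod

STATES = ("A", "B")
EMIT = {"A": {"I": 1.0}, "B": {"V": 0.5, "V7": 0.5}}     # lexical probabilities
TRANS = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 1.0}}      # transition probabilities
START = {"A": 0.5, "B": 0.5}                              # assumed, not from the article
OBS = "I V7 I I V I I I V7 I I V I I I V7 I".split()


def path_products(obs, path):
    """Product of lexical and of transition probabilities along one state path."""
    lexical = prod(EMIT[s].get(o, 0.0) for s, o in zip(path, obs))
    transition = prod(TRANS[p].get(q, 0.0) for p, q in zip(path, path[1:]))
    return lexical, transition


def viterbi(obs):
    """Most probable hidden-state path for obs under the model above."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (START[s] * EMIT[s].get(obs[0], 0.0), [s]) for s in STATES}
    for o in obs[1:]:
        step = {}
        for s in STATES:
            prob, path = max(
                ((best[p][0] * TRANS[p].get(s, 0.0), best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s].get(o, 0.0), path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


decoded = viterbi(OBS)
lexical, transition = path_products(OBS, decoded)
```

Because state A can only emit I and state B can only emit V or V7, the decoded path is forced, and the two products are 0.5 raised to the number of B states and 0.5 raised to the number of moves out of A, respectively.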
we know that two states optimally underpin the observation string, as opposed to three, four, or more?
Consider the two hypothetical Good four-state analyses in Example A6. Both distinguish between I, V7, and V as different functional entities (states A, B, and C), and both identify the fourth state, D, as one which outputs I chords that occur in particular relationships to states B and C. In the first model, D occurs before B or C, and in the second it occurs after. If the lexical and transition probabilities were constant between both models and the latter only differed from the former in whether D proceeds to or from B and C, the probability of the sequence would be identical between both models. It would not be clear which model represented the better option for an expectation maximization procedure. Because there is no obvious solution, the algorithm would sometimes produce the first solution, and sometimes the second. This lack of consistency suggests that four is not an ideal number of states to produce this observation sequence. In our modeling, we will prefer solutions like Example A4 over Example A6 for this reason.
To quantify this, we use a measurement called silhouette width, a calculation used in cluster analyses. The silhouette width of point i can be found first by computing a(i), the average distance between i and each other point in its cluster. The a(i) therefore tells us how well i is matched with other points in its own cluster. We then compute b(i), the average distance between i and the points in its closest neighboring cluster. The b(i) therefore tells us how distinct i is from its closest cluster. The silhouette width then subtracts a(i) from b(i) and divides by the larger of the two. Assuming the point is situated in the best cluster, the number will always be between zero and one (if the point is in the wrong cluster the number will be below zero). The closer the value is to one, the better the fit to its cluster. For an overall measure of an entire clustering solution, the average silhouette width simply averages the silhouette widths of all the individual points.
The current article uses each model’s analyses of the hidden states of the test corpus to create our cluster analyses. For each chord in the test corpus, every HMM with a particular cardinality of hidden states assesses the chord’s state, and
State A B
V 0% 80%
A B
A 50% 50%
B 50% 50%
Observations: I V7 I I V I I I V7 I I V I I I V7 I
Poor 2-State: A A ABBAAB B ABB AA B A A
Lexical Probs: .8 .2 .8 .8 .2 .8 .8 .2 .8 .8 .2 .8 .8 .8 .2 .2 .8 = .0000055
Transition Probs: .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 .5 = .0000153
Observations: I V7 I I V I I I V7 I I V I I I V7 I
Good 4-State #1: A C ADBAADC ADBA AD C A
Good 4-State #2: A C AABDAAC DABD AA C D
example a6. Toy Corpus no. 1 analyzed by two Good four-State HMMs
that sequence of assessments is used to produce a dissimilarity matrix, which is then in turn used for a k-means cluster analysis where k equals the number of hidden states used in the HMM. This process allows us to measure consistency in a situation where “the first hidden state” might mean something different for each training/test session. For instance, if we were modeling five runs of the HMM process using four hidden states, a possible vector might be [ADBCA], suggesting that the first HMM assigned the slice to state A, the second to state D, and so on. If this were a I chord, one might imagine that most other I chords would receive the same or similar analysis vector. Again, note that the state assignments are arbitrary: “A” might be “tonic” in one HMM, while “D” could be tonic in another. A “dominant” slice would then have a completely different vector, say, [BACBB]. A particular iii chord in the test corpus might have a vector more similar to tonic, while another might be more similar to the dominant. Each observation’s vector (each chord’s string of hidden states) within the test corpus is placed in a matrix, where the dissimilarity between each pair of observations is calculated. The dissimilarity between [ADBCA] and [BACBB] would be five, since there is no overlap. The dissimilarity matrix now captures units of difference between how the models analyze every observation in the test corpus.54 The cluster analysis is then performed using this matrix, dividing the observations into the number of hidden states within the constituent HMMs. The result is an aggregated picture of which chords tend to be analyzed by which hidden states.
54 The current work squares the dissimilarities, simply to make the differences more pronounced, which in turn makes the data easier to work with.
Example A7 shows the silhouette widths that result from the article’s Kostka-Payne HMMs. The widths capture the amount of agreement (the tightness of the clustering) of the 300 models produced for each number of hidden states. Generally, the widths need to be read as relative to those around them since fewer clusters will tend to produce higher overall values. Therefore, we look for peaks within the contour: a peak will mean that the models improved their consistency compared to the predicted decline when adding clusters. (NB: there is no standard way of quantifying these peaks.) In Example A7, the widths peak at four clusters/states, indicating this provides the most consistent group of models, and therefore the preferred number of states for this corpus is four. Throughout the article, the models we diagram and report are summations of the agreement between the 300 individual models, and, since the clusters capture the emergent properties of the constituent HMMs, we create our final models by treating the clusters like composite functions. Since our clustering involves actual chords within our test set, we treat each cluster of chords as exemplifying its own function. The transitions between the clusters’ constituent chords constitute the transition probabilities, and the chords associated with each cluster become the emission probabilities.
Examples A8 and A9 show the silhouette widths of the five different test/training pairs for the popular music and Bach chorale studies. If peaks are reproduced over several trials, we consider the clustering robust. Note that Example A8 has consistent peaks at k = 8, while A9 has a consistent peak at k = 3, and a more subtle peak around k = 13 or 14.
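The two measurements used in this appendix, the dissimilarity between analysis vectors and the silhouette width, can be sketched as follows. The sketch assumes that dissimilarity is a count of mismatched positions (which reproduces the value of five given for [ADBCA] and [BACBB]), squares it as the footnote describes, and takes the cluster labels as given; the clustering step itself is not implemented. The function names and the six toy vectors are hypothetical, and singleton clusters are assigned a width of zero by convention.

```python
def dissimilarity(u, v, squared=False):
    """Number of runs in which two chords received different state labels."""
    d = sum(a != b for a, b in zip(u, v))
    return d * d if squared else d


def dissimilarity_matrix(vectors, squared=True):
    """Pairwise dissimilarities between all analysis vectors."""
    return [[dissimilarity(u, v, squared) for v in vectors] for u in vectors]


def mean_distance(matrix, labels, i, cluster):
    """Average distance from point i to the other members of a cluster."""
    dists = [matrix[i][j] for j, lab in enumerate(labels) if lab == cluster and j != i]
    return sum(dists) / len(dists) if dists else None


def silhouette_widths(matrix, labels):
    """Silhouette width (b - a) / max(a, b) for every point."""
    widths = []
    for i, own in enumerate(labels):
        a = mean_distance(matrix, labels, i, own)
        others = [mean_distance(matrix, labels, i, c) for c in set(labels) if c != own]
        others = [d for d in others if d is not None]
        if a is None or not others or max(a, min(others)) == 0:
            widths.append(0.0)  # assumed convention for degenerate cases
            continue
        b = min(others)  # closest neighboring cluster
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical analysis vectors: three "tonic-like" and three "dominant-like" chords.
vectors = ["ADBCA", "ADBCA", "ADBCB", "BACBB", "BACBB", "BACBA"]
matrix = dissimilarity_matrix(vectors)
good = silhouette_widths(matrix, [0, 0, 0, 1, 1, 1])
average_width = sum(good) / len(good)
misplaced = silhouette_widths(matrix, [1, 0, 0, 1, 1, 1])
```

With the two groups labeled correctly the average width is close to one; moving the first vector into the wrong cluster drives its own width below zero, as the prose definition predicts.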
APPENDIX B
PROBABILITY TABLES
D/T– T P- P End
example b1. Transitions between hidden states of Kostka-Payne four-function model. Table sums to 100%, background colors are graded
to reflect increasing probabilities.
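A table of this kind, in which all cells together sum to 100%, can be produced by tallying adjacent pairs of hidden states and dividing by the total count. The sketch below is one possible reading of that normalization, not the authors' code: the labeled phrases are invented, the "End" marker is assumed to be appended to each phrase, and `row_conditionals` is a hypothetical helper that converts the joint table into per-state transition probabilities.

```python
from collections import Counter


def transition_table(state_sequences):
    """Joint probabilities of (current state, next state); all cells sum to 1."""
    counts = Counter()
    for seq in state_sequences:
        # Pair each state with its successor; the last state is followed by "End".
        for pair in zip(seq, list(seq[1:]) + ["End"]):
            counts[pair] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def row_conditionals(table):
    """Rescale each row so that it sums to 1 (probability of next given current)."""
    row_totals = Counter()
    for (src, _), p in table.items():
        row_totals[src] += p
    return {(src, dst): p / row_totals[src] for (src, dst), p in table.items()}


# Hypothetical decoded phrases.
phrases = [["T", "P", "D", "T"], ["T", "D", "T"]]
joint = transition_table(phrases)
conditional = row_conditionals(joint)
```

The joint form makes frequent pathways visually prominent, while the conditional form is the one an HMM uses when scoring a sequence.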
Function
chord root      D        T       P–       P
#^1           0.48%    0.00%   10.71%    0.00%
^2            1.90%    0.00%    4.76%   65.00%
b^3/#^2       0.00%    0.00%    0.00%    1.67%
#^4           1.43%    0.00%    1.19%    6.67%
^5           70.00%    0.00%    0.00%    0.00%
b^6/#^5       1.43%    1.53%    3.57%    3.33%
^6            1.43%    1.53%   51.19%    0.00%
^7           12.38%    0.51%    1.19%    0.00%
example b2. Roots in Kostka-Payne associated with each hidden state (lexical probabilities). Columns sum to 100%.
P Q R T S U X W End
example b3. Transitions between hidden states of McGill-Billboard eight-function model. Table sums to 100%, background colors are
graded to reflect increasing probabilities.
Functions
chords P Q R T S U X W
example b4. Chords in McGill-Billboard associated with each hidden state (lexical probabilities). Columns sum to 100%. Only chords that
occur thirty times or more within the corpus are shown (>.05% of the corpus’s chords).
{^1, ^2, b^7} 0.0% 0.0% 0.0% 1.2% 0.0% 0.2% 0.0% 4.0%
D P T End
example b5. Transitions between hidden states of Bach Chorale three-function model. Table sums to 100%, background colors are graded
to reflect increasing probabilities.
Functions
iii 1.9% 3.0% 4.0%
{^1, ^2, ^5} 1.2% 3.5% 0.0%
vi 7 0.0% 4.1% 0.2%
{^1, ^4, ^5} 1.2% 1.7% 0.0%
{^1, ^5, ^6} 0.0% 0.5% 1.5%
{^2, ^4, ^5} 1.6% 0.0% 0.0%
{^1, ^4, ^6, ^7} 1.4% 0.2% 0.0%
example b6. Chords in Bach Chorales associated with the three hidden states (lexical probabilities). Columns sum to 100%. Only chords
that occur nineteen times or more within the corpus are shown (>.05% of the corpus’s chords).
T+ R Px p d T Tx P D sx Rx D+ Dx End
T+ 0.2% 0.5% 0.3% 3.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.3% 0.0% 0.0% 0.3% 0.0%
R 0.1% 0.0% 0.9% 0.4% 0.1% 0.6% 0.0% 0.2% 0.5% 0.2% 1.1% 0.0% 0.2% 0.1%
p 0.0% 0.0% 2.6% 0.0% 3.6% 1.2% 0.0% 0.5% 1.0% 0.1% 0.0% 1.0% 0.1% 0.4%
d 0.0% 0.2% 0.1% 0.1% 0.0% 3.7% 0.1% 0.2% 0.0% 0.2% 0.1% 0.1% 0.0% 0.0%
T 3.4% 1.0% 0.1% 3.9% 0.1% 0.0% 3.2% 4.6% 3.2% 0.6% 0.3% 0.5% 0.1% 1.1%
Tx 0.1% 0.0% 0.0% 0.0% 0.0% 4.0% 0.2% 0.2% 0.1% 0.0% 0.0% 0.1% 0.0% 0.0%
P 0.0% 0.0% 0.0% 0.0% 0.1% 0.2% 0.0% 1.3% 5.5% 0.0% 0.1% 0.4% 0.1% 0.1%
D 0.3% 0.4% 0.0% 0.5% 0.0% 5.3% 0.2% 0.0% 0.1% 0.6% 0.2% 4.5% 1.8% 0.4%
sx 0.0% 0.3% 0.2% 0.4% 0.2% 0.2% 0.0% 0.2% 0.4% 2.9% 0.3% 0.3% 0.1% 0.4%
Rx 0.0% 1.6% 0.0% 0.5% 0.0% 0.1% 0.0% 0.0% 0.1% 0.1% 0.9% 0.0% 0.1% 0.3%
D+ 0.0% 0.1% 0.0% 0.0% 0.0% 6.3% 0.9% 0.0% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0%
Dx 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 2.5% 0.0% 0.2% 0.3% 1.1% 0.1%
example b7. Transitions between hidden states of Bach Chorale thirteen-function model. Table sums to 100%, background colors are
graded to reflect increasing probabilities.
functions
chords T+ R Px p d T Tx P D sx Rx D+ Dx
I 0.0% 0.0% 0.0% 0.0% 0.0% 87.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.6%
V 1.0% 0.0% 0.0% 0.0% 0.0% 0.0% 2.5% 0.0% 84.8% 0.0% 0.0% 0.0% 0.0%
V7 0.0% 0.0% 0.0% 0.0% 1.0% 0.1% 0.0% 0.6% 0.2% 0.0% 0.0% 86.0% 0.0%
vi 0.5% 97.3% 4.3% 1.2% 0.0% 2.8% 2.5% 0.0% 0.0% 0.0% 0.0% 0.0% 1.2%
ii 0.0% 0.0% 5.9% 12.7% 1.0% 0.0% 0.0% 0.3% 0.0% 44.0% 0.0% 0.0% 0.0%
ii7 0.0% 0.0% 0.0% 1.6% 7.9% 0.0% 0.0% 29.3% 0.0% 3.3% 0.0% 0.0% 0.0%
iii 7.2% 2.2% 4.8% 0.0% 0.5% 4.9% 3.0% 0.6% 5.0% 0.4% 4.5% 0.0% 1.2%
vii 3.6% 0.0% 0.0% 0.0% 39.9% 0.0% 6.0% 0.0% 0.2% 0.0% 0.0% 1.9% 0.0%
IV7 0.0% 0.0% 23.7% 2.6% 0.0% 0.0% 0.0% 8.1% 0.0% 0.0% 0.0% 0.0% 0.0%
{^1, ^2, ^5} 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 4.0% 17.1% 1.5% 0.0% 0.0% 0.0% 2.3%
vi 7 0.0% 0.0% 27.4% 3.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 4.1%
vii 7 0.0% 0.0% 0.0% 0.0% 25.6% 0.0% 0.0% 0.0% 0.7% 0.0% 7.1% 0.0% 0.6%
I7 28.4% 0.0% 0.0% 0.0% 0.0% 0.0% 1.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.6%
{^1, ^2, ^4, ^5} 0.0% 0.0% 0.5% 0.0% 0.0% 0.0% 3.0% 15.0% 0.0% 0.0% 0.0% 0.3% 0.0%
example b8. Chords in Bach Chorales associated with the thirteen hidden states (lexical probabilities). Columns sum to 100%. Only
chords that occur nineteen times or more within the corpus are shown (>.05% of the corpus’s chords).
I9 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 23.5% 0.0% 1.5% 0.0% 0.0% 0.0% 0.0%
I add 4 2.6% 0.0% 0.0% 0.0% 0.0% 0.0% 20.0% 0.0% 1.9% 0.0% 0.0% 0.0% 0.0%
iii 7 19.1% 0.0% 0.5% 0.0% 0.0% 1.3% 1.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.6%
{^1, ^4, ^5} 0.0% 0.0% 0.0% 0.0% 2.5% 0.0% 5.5% 7.8% 0.0% 0.0% 0.0% 1.3% 0.0%
IV9 6.2% 0.0% 9.1% 0.9% 0.5% 0.0% 0.0% 0.9% 0.0% 0.0% 0.0% 0.0% 0.0%
V/vi 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 22.7% 0.0% 0.0%
{^1, ^2, ^4} 0.0% 0.0% 0.0% 2.8% 0.0% 0.0% 6.0% 2.2% 0.0% 0.0% 0.0% 0.0% 0.0%
vii/vi 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1.6% 0.0% 0.0% 0.0% 0.0% 14.0%
V7 no 5th 0.0% 0.0% 0.0% 0.0% 2.5% 0.0% 3.0% 0.0% 0.2% 0.0% 0.0% 4.8% 0.0%
V7/IV 13.4% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
V7/vi 2.1% 0.0% 3.8% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 7.0%
{^1, ^5, ^6} 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 15.6% 0.0% 0.0%
V add 4 5.7% 0.0% 0.0% 0.0% 0.0% 0.0% 2.5% 0.0% 0.0% 0.0% 0.0% 0.0% 3.5%
{^1, ^2, ^6} 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 6.2% 0.0% 0.0% 0.6% 0.0% 0.6%
{^2, ^4, ^5} 0.0% 0.0% 0.5% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 11.7%
{^1, ^4, ^6, ^7} 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 4.5% 0.3% 0.0% 0.8% 0.0% 2.9% 0.0%
V/iii 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 8.7% 0.0% 0.0% 0.0%
{^2, ^5, ^6} 1.0% 0.0% 0.0% 0.0% 9.4% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
v 1.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 1.5% 3.7% 0.0% 0.0% 0.0%
V7/ii 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.4% 0.0% 0.0% 10.5%
II 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 7.9% 0.0% 0.0% 0.0%
{^2, ^3, ^4, ^6} 0.0% 0.0% 2.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 5.4% 0.0% 0.0% 0.0%
vii/ii 0.0% 0.0% 0.0% 0.0% 0.0% 0.6% 0.0% 0.0% 0.0% 4.1% 0.0% 0.0% 0.0%
———. 1990. The Cognitive Foundations of Musical Pitch. Riemann, Hugo. 1893. Vereinfachte Harmonielehre, oder die
Oxford: Oxford University Press. Lehre von den tonalen Funktionen der Akkorde. London:
Lerdahl, Fred. 2001. Tonal Pitch Space. Oxford: Oxford Augener.
University Press. ———. 1900–1901. Präludien und Studien III. Leipzig:
Lester, Joel. 1992. Compositional Theory in the Eighteenth Hermann Seemann.
Century. Cambridge, MA: Harvard University Press. Riepel, Joseph. 1755. Grundregeln zur Tonordnung insgemein:
Loui, Psyche, David Wessel, and Carla L. Hudson Kam. Abermal Durchgehends mit musicalischen Exempeln abgefaßt
2010. “Humans Rapidly Learn Grammatical Structure in a und Gespräch-weise vorgetragen. Frankfurt: Krippner.
New Musical Scale.” Music Perception 27: 377–88. Saffran, Jenny R., Elizabeth K. Johnson, Richard N. Aslin,