Word-Patterns and Story-Shapes: The Statistical Analysis of Narrative Style


J. F. BURROWS
University of Newcastle, New South Wales, Australia

Literary and Linguistic Computing, Vol. 2, No. 2, 1987 © Oxford University Press 1987
Correspondence: Professor J. F. Burrows, Department of English, University of Newcastle, New South Wales 2308, Australia.

Abstract

The thirty most common word-types in many English texts make up about two-fifths of all the word-tokens used in them. The respective frequencies of the word-types can be arrayed hierarchically to form a frequency profile for each text. When a number of profiles are correlated with each other, the resulting matrix shows appropriate resemblances and differences across the range of texts. The whole pattern of interrelationships can then be 'mapped', by applying eigen analysis to the correlation-matrix.

This form of analysis can be applied to comparisons of dialogue with narrative; of the idiolects of different characters; or, in the territory to which the present paper is addressed, of narrative with narrative. When several narratives are compared, authorial and chronological determinants predominate. When the segments of a single narrative are compared, even subtler stylistic determinants declare themselves and point towards the possibility of a computer-assisted literary criticism.

Introduction

Prose fiction, arguably, is the most mixed of literary forms. Even a single novel is usually a stylistic medley, harmonious or otherwise. The language of its dialogue and that of its narrative usually differ from each other in some obvious and many less obvious ways. There are likely to be further differences, on the one side, among the idiolects of the several characters and, on the other, between that part of the narrative which renders the thoughts of the characters and that which bears the action forward. The language of a single idiolect may alter greatly when the character forsakes the ordinary interplay of dialogue to chronicle his (or her) life-history or to justify an earlier line of conduct. It may also alter when he speaks with one rather than another of his associates. It may even exhibit a powerful and sustained development as his attitude and circumstances change. On the other side, once more, that 'pure narrative' which bears the action forward can be teased out into its more discursive, more descriptive, and more simply mechanical components: and each component has particular stylistic attributes.

Such observations point in the direction of M. M. Bakhtin's claim that the novel is distinguished from other literary genres precisely by its capacity for orchestrating diverse stylistic forms, some of which have been its own for many centuries and some of which it has absorbed from elsewhere.1 That potent doctrine will not be tested as closely as it deserves until we develop more adequate methods of stylistic comparison: methods capable of comparing the languages of different genres, different historical periods, or different authors and capable also of assessing the relative importance of such factors in a given case. To take a speculative but not implausible example, a proper comparison of a number of eighteenth and early nineteenth century English tragedies with each other and with their Elizabethan and Jacobean forerunners might well show that the linguistic conventions of the genre were strict enough to shelter it from two hundred years of change in the English language. Such evidence might justify the conclusion that the later tragedies were linguistic fossils. If it also emerged that the earlier tragedies exhibited more marked authorial differences, it would be reasonable to claim that, after a period of innovation and achievement, English tragedy slowly faded into a mere imitation of itself.

Until such comparisons can be set on a proper footing, opinion will outrun knowledge. 'Authorship', for instance, is an elusive concept and the evidence of the following pages cannot resolve the contest of faith between those who believe in a self-generated authorial individuality and those who regard an author as a tabula rasa upon which larger cultural forces inscribe themselves. But the evidence to be presented does show that, whatever our opinion of its origins, a rather stable and extremely exact 'authorial' force is at work among the cases studied. Even literary theorists are entitled to their preferences, entitled to maintain, say, that questions of authorship do not interest them as much as questions of genre. But such a preference is no sound basis for asserting outright that the concept of authorship is unimportant; that, as Bakhtin has it, 'the individual artistic personality of the author' (like the particular temper of a literary school or that of a literary era) is reflected only in 'relatively minor stylistic variations'; and that, in giving their attention to such lesser matters, scholars have been distracted from 'the major stylistic lines determined by the development of the novel as a unique genre'.2 Until we are better equipped to compare the stylistic properties of authors, periods, and genres, their relative importance cannot be assessed and remarks like Bakhtin's must be regarded as polemical flourishes, enthusiastic overstatements of a plausible idea. In proposing that statistical analysis is a valuable instrument for the large but exact comparisons required for such purposes, I do not suggest that it should supplant those other, albeit less exact, instruments with which most literary scholars are more familiar.

It is evident, moreover, that the distinctions drawn in my own opening paragraph need to be supported. So far as they bear on the language of characterization, whether in dialogue or in the narrative rendering of the characters' ideas, they are discussed at some length in my recent book, Computation into Criticism: A Study of Jane
Austen's Novels and an Experiment in Method (Oxford, Clarendon Press, 1987). The present paper carries the argument somewhat further into the territory of 'pure narrative' and tries to show that statistical analysis can cast new light on some important determinants of narrative style.

The idea that 'pure narrative' and 'character narrative' (as I call them) should be distinguished is widely recognized. The subtlety and exactitude with which Jane Austen manages such instruments as 'free indirect discourse' make a firm, if complex, ground for the exercise of critical judgement. Yet the actual distinctions on which the following evidence rests do rely on my judgement, contestable in every instance, of those moments in each text at which a transition from one mode of narrative to the other should be marked. The shape of Graph 1, a little later, shows that such an exercise of judgement can yield a tenable result.

The analytical method employed in this paper differs from those of 'stylometrics'. It is more holistic in emphasis and it makes no firm distinction between 'function-words' and 'content-words', between the grammatical and the lexical elements of a vocabulary. Although there are undoubtedly words of a more grammatical or a more lexical orientation, it is not easy and may not ultimately be profitable to strive for a categorical distinction. There may be some virtue in the 'stylometric' premise that authorial habits are most clearly reflected in those words over which an author's control is least likely to be conscious. For the broader purposes of a computer-assisted literary criticism, however, the more lexically oriented of the very common words ought not to be excluded.

Except, then, for Graph 1 where special considerations intrude, the evidence offered in this paper will rest upon differences of frequency-profile in the whole hierarchy of the thirty most common word-types of Jane Austen's pure narrative. On the basis of those profiles, the pure narratives of her six published novels will be compared with each other and with those of novels by Henry James, E. M. Forster, Georgette Heyer, and the anonymous modern writer who completed Jane Austen's manuscript-fragment, 'Sanditon'. In every case, the thirty most common word-types amount, all told, to about two-fifths of all the word-tokens employed. (The proportions range from 38.4% in the modern Sanditon to 41.9% in Pride and Prejudice. In Jane Austen's own narratives, the proportions range only from 40.6% to 41.9%.) What goes on in two-fifths of a vocabulary, I take it, must shed some light on the whole: the more so when that two-fifths arrays the commonest of English connectives, personal pronouns, auxiliary verbs, and articles. If the patterning of the connectives testifies to the general ordonnance of a style, that of the pronouns, verbs, and articles begins to flesh it out.

Comparisons of the major components of Jane Austen's novels

Table 1 lists the thirty most common word-types of Jane Austen's six published novels in the descending order of their overall frequency. The list is entirely literal in the sense that such contracted forms, rare in her novels, as 'you'll' and 'couldn't' are not incorporated with 'you', 'could', and 'not' and also in the sense that no distinction between homographic forms like 'to' (inf.) and 'to' (prep.) is offered.

Even a casual inspection of the list will show that it conforms to certain natural expectations. Within each of the three formal divisions (pure narrative, character narrative, and dialogue) the six novels show roughly similar frequency patterns. But, as between the different categories, the hierarchy is often disrupted. 'I', 'you', and 'is' stand much higher in the list for dialogue than in either of the narrative-lists. Simple mechanical differences, like these, between the direct speech of dialogue and the indirect forms of expression that predominate in both kinds of narrative are so marked as to submerge any subtler evidence of stylistic difference and to transform statistical analysis into a spectacular demonstration of the obvious.

But even if all the asterisked words in the list (the words whose forms are altered by the difference between direct and indirect speech) are excluded, eighteen words remain: and those eighteen word-types still make up more than a quarter of all the word-tokens used in Jane Austen's novels. Taking them as our basis, let us correlate each column of Table 1 with every other, using Pearson's 'product moment' correlation-formula. (It is worth repeating, at this point, that the correlation-coefficients represent the degrees of likeness between whole columns, treated as frequency-profiles: since these columns are not mutually dependent variables, the problem of assessing appropriate orders of freedom can be set aside.)

The resulting correlation-matrix is set out in Table 2. If it is studied in the manner of an inter-city mileage grid, the overall pattern of three clusters is easily discerned. Across the six novels, pure narrative correlates most closely with pure narrative, character narrative with character narrative, dialogue with dialogue. Whatever their 'meaning', whatever the lexical or grammatical properties of the words of which they are composed, the frequency-profiles are mutually consistent and fall into an appropriate array.

The patterns of interrelationship in such matrices emerge more clearly and fully when they are graphed. Eigen analysis serves this purpose well. In this process, the interrelationships among all the correlation-coefficients are resolved into a series of 'eigen vectors', whose respective 'eigen values' indicate their relative contributions to the shape of the whole matrix. Vector A, then, is a 'line of best fit' for all the coefficients and its eigenvalue, usually represented as a percentage, shows how well it fits. Vector B is a line of best residual fit. And so on for each lesser vector. (If the coefficients are envisaged as occupying an egg-shaped field, Vector A represents its longest axis, Vector B the next longest, and so on.) When, as in the specimens examined in this paper, the first two vectors bear well over ninety per cent of the total eigenvalue, an ordinary bi-axial graph gives an adequate impression of the whole system of interrelationships.

Graph 1 makes it clear that the three groups of entries differ sharply from each other. The cluster for dialogue is the loosest of the three and diverges furthest from its
Table 1: DISTRIBUTION (AS RAW FREQUENCIES) OF THE THIRTY MOST COMMON WORD-TYPES IN JANE AUSTEN'S LITERARY VOCABULARY

        PURE NARRATIVE                            | CHARACTER NARRATIVE                       | DIALOGUE
Word      All    NA    SS    PP    MP     E     P |   All    NA    SS    PP    MP     E     P |   All    NA    SS    PP    MP     E     P
           PN   pn1   pn2   pn3   pn4   pn5   pn6 |    CN   cn1   cn2   cn3   cn4   cn5   cn6 |     D    d1    d2    d3    d4    d5    d6
the     15257  2150  2754  2671  3180  2359  2143 |  2923   297   111   210  1070   904   331 |  8130   712  1243  1452  1949  1929   845
to      11320  1231  2267  2039  2416  1828  1539 |  2849   243   106   219  1044   914   323 |  9692   762  1720  1881  1970  2439   920
and     12292  1465  2100  2066  2732  2177  1752 |  2567   196    70   194  1047   777   283 |  7572   624  1309  1316  1654  1922   747
of      11316  1535  2244  1968  2290  1702  1577 |  2563   213   117   195   938   806   294 |  7281   599  1210  1459  1543  1780   690
a        6043   815  1079   965  1269  1083   832 |  1549   140    55   122   533   499   200 |  5698   574   905   845  1287  1531   556
*her     8644  1224  1990  1645  1788  1155   842 |  2012   208    71   143   815   591   184 |  2491   130   483   435   524   740   179
*I         31    19     1     2     7     0     2 |   111     2     5    14     1    83     6 | 11910  1264  1993  2054  2380  3104  1115
*was     7224   796  1312  1341  1615  1276   884 |  1781   151    65   131   653   577   204 |  2206   160   484   376   392   549   245
in       5693   747  1151   957  1180   868   790 |  1327   132    61   118   452   390   174 |  4115   381   736   792   873   918   415
it       3320   389   650   539   666   640   436 |  1249   120    44    88   407   441   149 |  5659   590  1062   907  1199  1448   453
*she     6243   824  1188  1255  1085  1037   854 |  1807   152    55   124   708   623   145 |  2152   121   369   330   483   702   147
not      2651   323   509   452   512   479   376 |   957    96    31    69   317   329   115 |  4960   553   709   908  1004  1343   443
be       2392   285   464   357   480   434   372 |  1187   108    45    86   395   415   138 |  4586   402   789   793  1035  1127   440
that     3187   379   674   676   514   536   408 |  1043    80    32    93   411   311   116 |  3901   339   678   820   755   953   356
*you        2     2     0     0     0     0     0 |    50     1     1     5     1    39     3 |  7675   915  1190  1354  1633  1959   624
*had     4314   448   681   781   875   785   744 |  1453   122    45   141   483   435   227 |  1560   132   274   253   283   407   211
*he      3277   242   572   649   721   657   436 |  1319    81    48   134   461   418   177 |  2727   220   494   553   375   736   349
as       3407   384   702   577   773   523   448 |   752    65    23    74   263   229    98 |  2886   234   499   536   667   685   265
for      2926   368   619   485   626   454   374 |   706    50    25    62   249   248    72 |  2905   305   581   500   644   627   248
*his     3456   375   655   743   757   524   402 |   972    65    61   129   333   280   104 |  1558    85   309   399   265   347   153
with     3425   431   644   643   739   586   382 |   578    45    23    49   220   169    72 |  1913   186   325   360   378   464   200
but      2163   227   365   402   425   412   332 |   639    50    20    49   209   233    78 |  3069   310   503   553   652   797   254
*have     975   112   154   137   243   155   174 |   543    48    17    48   179   196    55 |  3749   329   648   661   783   970   358
at       2470   290   507   467   506   396   304 |   468    40    23    41   164   146    54 |  1807   183   308   274   377   489   176
*is       128    42    14    11    28    16    17 |    53     5     1    12     0    34     1 |  4563   479   743   833   941  1190   377
all      1710   178   313   290   361   284   284 |   488    28    16    37   184   156    67 |  1691   141   329   301   329   407   184
very     1237   139   187   169   252   306   184 |   363    23    12    20   106   163    39 |  2170   217   301   298   401   743   210
*him     1519   125   287   335   303   266   203 |   536    29    32    58   182   154    81 |  1587   127   323   372   232   351   182
*could   1933   196   366   307   414   362   288 |   685    85    28    61   233   206    72 |   991    83   182   155   213   268    90
by       2159   315   527   384   389   273   271 |   354    48    13    27   108   102    56 |   988   101   205   218   192   184    88

Table 2: PRODUCT-MOMENT CORRELATION MATRIX DERIVED FROM TABLE 1

           PN  pn1  pn2  pn3  pn4  pn5  pn6   CN  cn1  cn2  cn3  cn4  cn5  cn6    D   d1   d2   d3   d4   d5
NA pn1    990
SS pn2    994  983
PP pn3    998  986  995
MP pn4    998  983  989  993
E  pn5    993  975  979  988  995
P  pn6    998  989  989  994  997  993

   CN     960  933  961  956  954  963  962

NA cn1    950  941  953  945  939  943  951  981
SS cn2    926  918  946  925  911  906  925  964  963
PP cn3    954  924  961  952  949  954  953  993  968  960
MP cn4    967  936  964  964  963  973  968  995  964  944  991
E  cn5    940  912  941  934  934  946  943  996  977  961  983  986
P  cn6    954  931  958  949  948  955  956  996  985  968  991  987  990

   D      860  818  861  854  855  878  862  960  941  908  947  944  975  955
NA d1     784  755  780  773  778  810  788  898  899  846  877  873  921  897  975
SS d2     834  781  842  829  832  853  835  946  916  892  940  932  960  940  988  952
PP d3     863  818  877  865  853  869  863  960  937  925  956  947  969  953  985  938  978
MP d4     883  853  879  874  879  899  887  967  960  914  952  950  980  964  993  971  974  968
E  d5     822  776  819  814  819  851  827  932  908  871  913  916  952  925  992  976  973  967  979
P  d6     902  866  900  895  898  917  905  979  964  928  967  967  987  975  991  954  971  977  991  980

NB. For more compact representation, the coefficients are expressed as whole numbers and not in the usual form of decimal fractions like +0.963.
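
The step from Table 1 to Table 2 can be illustrated with a minimal sketch, assuming that each column is correlated on the raw frequencies of the eighteen non-asterisked word-types, as the text describes. The names `PROFILES`, `pearson`, and `correlation_matrix` are hypothetical, and only three columns (pn1, pn6, d1) are transcribed here to keep the example short.

```python
from math import sqrt

# Raw frequencies of the eighteen non-asterisked word-types of Table 1,
# in table order: the, to, and, of, a, in, it, not, be, that, as, for,
# with, but, at, all, very, by. Only three columns are transcribed.
PROFILES = {
    "pn1": [2150, 1231, 1465, 1535, 815, 747, 389, 323, 285,
            379, 384, 368, 431, 227, 290, 178, 139, 315],
    "pn6": [2143, 1539, 1752, 1577, 832, 790, 436, 376, 372,
            408, 448, 374, 382, 332, 304, 284, 184, 271],
    "d1": [712, 762, 624, 599, 574, 381, 590, 553, 402,
           339, 234, 305, 186, 310, 183, 141, 217, 101],
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Correlate every profile with every other (hypothetical helper)."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b])
            for a in names for b in names}


if __name__ == "__main__":
    matrix = correlation_matrix(PROFILES)
    for pair in [("pn1", "pn6"), ("pn1", "d1"), ("pn6", "d1")]:
        print(pair, round(matrix[pair], 3))
```

With the values as transcribed, the pn1/d1 coefficient comes out close to the 0.755 that Table 2 reports, and the two pure narratives correlate far more closely with each other than either does with the dialogue of Northanger Abbey.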

overall norm (indicated by the diamond-shaped marker) because each entry represents a compound of the idiolects of a particular set of characters. Notwithstanding the somewhat aberrant location of C2 (the entry for the brief and simply rendered character narrative of Sense and Sensibility), the cluster for character narrative is tight enough to justify my claim that Jane Austen usually makes it possible to distinguish between her two main modes of narrative. Not unexpectedly, the entries for pure narrative make the most compact of the three clusters. For pure narrative does not display such marked internal differences as arise when the voices or the ideas of different characters are to be heard. Graph 1 does not indicate whether Jane Austen is adhering to an existing narrative style, modifying it, or abiding by one of her own making: it does indicate, however, that this facet of her novels remains comparatively uniform throughout her literary career.

When, as here, the first two vectors are quite predominant and there are few aberrant entries, these eigen-maps tend to assume the shape of outstretched wings. Like the overall entry for character narrative, which arrives there as a hybrid, the most typical entries lie at the lower centre. Any genuinely aberrant entries tend towards a no-man's-land in the upper centre, beyond the location of C2. And those entries, like D1 and P1, which lie at the wing-tips represent the extreme expressions of the leading differentiae. In Northanger Abbey, that is to say, the differences between dialogue and pure narrative are more strongly marked than in any of Jane Austen's later novels. (As the location of D6 and P6 indicates, those differences are least marked in Persuasion.) These results are in keeping with much other evidence, presented in my recent book, that, by Jane Austen's later standards, Northanger Abbey deals in extremely sharp but simple contrasts.3

[Graph 1: Major components of Jane Austen's novels (non-deictic word-types 1-18 of her literary vocabulary). Eigen-map with entries for pure narrative, character narrative, and dialogue, and a marker for the totality.]

Patterns in pure narrative: authorship and chronology

If the section of Table 1 that treats of pure narrative is separated from the rest and the frequency-hierarchies for Jane Austen's pure narratives are compared only with each other, their respective locations on an eigen-map closely correspond to their received order of composition. (While acknowledging some complicated questions of revision, almost all authorities agree that the order of composition ran from Northanger Abbey to Persuasion in the sequence offered in Table 1.) If the word-list employed in Table 1 is replaced by the list of the thirty
most common word-types in Jane Austen's pure narrative or by a version in which the major homographic forms are distinguished and then given their separate places in the hierarchy, very similar results emerge. If the fragment of 'Sanditon' is added as a seventh column, it, too, assumes its proper station as the last member in a chronological sequence. But if this brief account is to take the place of several graphs, two anomalies in the pattern should be noted. In some of these graphs, Northanger Abbey and Sense and Sensibility exchange places at the beginning of an otherwise perfect chronological sequence. And, if the unimpressive middle-period fragment called 'The Watsons' is included, it falls awkwardly on the edge of no-man's-land, not far from the later narratives. The former anomaly is too slight to be troubling. The latter is spectacularly resolved when 'The Watsons' is treated as only the beginning of a narrative and compared not with whole narratives but with their beginnings. When that is done it ceases to be an anomaly, takes its proper place as a 'middle-period beginning', and helps justify Aristotle's belief that beginnings are among the distinguishable constituents of a dramatic action.

Table 3 supplies the reader who cares to undertake them with the necessary information for a number of overall comparisons. The first column shows the total frequency-hierarchy of the thirty most common word-types (with the main homographic distinctions incorporated) when all Jane Austen's pure narrative is treated as a single body. The next six columns take each of her six pure narratives in turn. The last six columns (which do not contribute to the grand total in Column 1) treat, successively, of: Jane Austen's 'Sanditon'; its modern continuation, referred to here as Sanditon 2; Georgette Heyer's Frederica; James's The Awkward Age; Forster's Howards End; and Jane Austen's 'The Watsons'. (The missing columns, J and M, treat of Virginia Woolf's The Waves, which has no real equivalent for pure narrative, and of a modern novel not yet ready for a full analysis.) At the head of each column is the total word-count for that particular pure narrative. At its foot is the sum of the entries in it and the percentage they make of the total given at its head.

Table 3: DISTRIBUTION (AS RAW FREQUENCIES) OF THE THIRTY MOST COMMON WORD-TYPES IN JANE AUSTEN'S PURE NARRATIVE

FREQUENCIES IN PURE NARRATIVE OF EACH NOVEL

                A      B      C      D      E      F      G      H      I      K      L      N
              ALL     NA     SS     PP     MP      E      P     S1     S2      F     AA     HE    Wns
Words      323824  40606  62306  57560  66727  53123  43502  10892  42164  53171  51997  55887   8199
the         15257   2150   2754   2671   3180   2359   2143    608   1760   2366   2533   3519    475
and         12292   1465   2100   2066   2732   2177   1752    479   1367   1404    889   2026    288
of          11316   1535   2244   1968   2290   1702   1577    431   1181   1361   1502   1380    292
her          8644   1224   1990   1645   1788   1155    842    187   1025   1066    974   1056    187
was          7224    796   1312   1341   1615   1276    884    202    712    976    813   1183    194
to(inf)      6940    770   1363   1152   1484   1172    999    225   1028   1164    831    716    177
she          6243    824   1188   1255   1085   1037    854    122    997   1090   1153   1205    101
a            6043    815   1079    965   1269   1083    832    255    740   1249   1379   1180    193
in           5693    747   1151    957   1180    868    790    178    717    797    845    764    159
to(pr)       4380    461    904    887    932    656    540    123    527    720    624    629    113
had          4314    448    681    781    875    785    744    154    657    755    879    882     94
his          3456    375    655    743    757    524    402    117    499   1414   1003    539     72
with         3425    431    644    643    739    586    382    127    439    574    781    462    102
as           3407    384    702    577    773    523    448    115    324    435    693    439     93
it           3320    389    650    539    666    640    436     87    332    451    718    734     56
he           3277    242    572    649    721    657    436    112    453   1522   1252    966    104
not          2651    323    509    452    512    479    376     83    213    350    194    475     55
at           2470    290    507    467    506    396    304     88    336    409    563    378     71
for(pc)      2444    301    480    411    527    401    324     97    359    322    453    365     52
be           2392    285    464    357    480    434    372     83    212    225     89    242     30
that(cj)     2298    264    444    526    341    404    319     66    310    612    246    371     34
but          2163    227    365    402    425    412    332     73    324    535    383    461     40
by(pr)       2140    313    525    382    385    267    268     68    228    325    192    205     57
could        1933    196    366    307    414    362    288     56    228    111    114    187     36
on(pr)       1887    248    429    364    362    257    227     67    298    282    381    231     55
which(rp)    1878    247    429    315    347    275    265     59    167    229    305    151     47
were         1834    210    321    370    345    307    281     78    171    121    126    279     57
all          1710    178    313    290    361    284    284     62    177     93    137    188     26
said         1702    158    343    357    343    354    147     48    202    809    357    538     60
from         1614    196    354    283    348    228    205     52    194    241    189    224     35

Sum        134347  16492  25838  24122  27782  22060  18053   4502  16177  22008  20598  21975   3355
Share      41.49%  40.6%  41.5%  41.9%  41.6%  41.5%  41.5%  41.3%  38.4%  41.4%  39.6%  39.3%  40.9%

To go beyond these overall forms of comparison and look more closely into questions of narrative-shape, it is possible to break each pure narrative into a number of successive segments and compare them with each other or with the corresponding segments of another narrative. Choosing the most appropriate size for the segments is a matter of experiment and judgement: but segments of two thousand words are large enough to yield sound results and yet few enough (in narratives of the length dealt with here) to yield clear graphs.

In Table 4, Part (a) shows the beginnings of a two-thousand word segmentation of the pure narrative of Persuasion (whose 43,502 words fall into twenty-two such columns). The 2143 instances of 'the' and the 1752 instances of 'and' are distributed, like this, among the successive columns and, each in its turn, the other word-types are distributed in the same way.4 When a correlation-matrix and an eigen-map are derived from a full table of this kind, the results are not unilluminating. For more telling results, however, in which any general tendencies are cleared of the effects of minor aberrations, it is desirable to group the columns in 'rolling' segments like those illustrated in Part (b) of Table 4. The principle is well known from its use in weather-records and employment figures treating, say, of the periods from January to March, February to April, March to May, and so on. Its chief value is to override the arbitrariness of the boundaries between months (or segments) and so to allow any broad tendencies to reveal themselves.

All the graphs that follow are derived from tables constructed in precisely this fashion and are therefore strictly comparable with each other. In all except Graph 5, which answers to another purpose, the pure narrative of Persuasion is compared with that of another novel. The effect is to show how much Jane Austen's last narrative resembles (or differs from) those of other novelists, those of her own earlier novels, and those of the immediate predecessors of Persuasion. (The tables themselves are too voluminous to be printed.)

Graphs 2, 3, and 4 compare the pure narrative of Persuasion with those of The Awkward Age, Howards End, and Frederica respectively. The two clusters of entries are most sharply separated in Graph 2, least so in Graph 4, a result which supports the impression that James differs more than either Georgette Heyer or Forster from Jane Austen and which gives that impression a basis in the common stuff of the language. Graph 5, where The Awkward Age and Howards End are compared in the same fashion, adds weight to the argument that the separation of the clusters in graphs like these is an expression of genuine authorial differences in narrative style and not of some quality peculiar to Persuasion or Jane Austen. The fact that Vector B,

Table 4: EXTENDED VARIATION IN PURE NARRATIVE

(a) Segments of two thousand words

Persuasion   Total     1-   2001-   4001-   6001-   8001-  10001-
                     2000    4000    6000    8000   10000   12000   >>>
the           2143    101      82      75     115      89     108   >>>
and           1752     85      67      82      92      91      83   >>>

(b) 'Rolling' segments of six thousand words

Persuasion      1-   2001-   4001-   6001-
              6000    8000   10000   12000   >>>
the            258     272     279     312   >>>
and            234     241     265     266   >>>
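
The relation between Parts (a) and (b) of Table 4 can be checked with a short sketch, assuming that each six-thousand-word 'rolling' segment is simply the sum of three consecutive two-thousand-word segments. The function names `segment_counts` and `rolling_sums` are hypothetical; the counts are the six values printed in Part (a).

```python
def segment_counts(tokens, word, size=2000):
    """Count occurrences of `word` in successive segments of `size` tokens.

    The final segment may be shorter than `size`.
    """
    return [tokens[start:start + size].count(word)
            for start in range(0, len(tokens), size)]


def rolling_sums(counts, window=3):
    """Sum each run of `window` consecutive segments, stepping by one."""
    return [sum(counts[i:i + window])
            for i in range(len(counts) - window + 1)]


# First six two-thousand-word segments of Persuasion, from Table 4(a).
THE_COUNTS = [101, 82, 75, 115, 89, 108]
AND_COUNTS = [85, 67, 82, 92, 91, 83]

if __name__ == "__main__":
    print(rolling_sums(THE_COUNTS))  # [258, 272, 279, 312]
    print(rolling_sums(AND_COUNTS))  # [234, 241, 265, 266]
```

The printed sums agree with the first four columns of Part (b), which suggests that the rolling segments advance by one two-thousand-word step at a time.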
[Graphs 2-5: Analysis of pure narrative (word-types 1-30). Each eigen-map plots Vector A against Vector B. Graph 2: Persuasion and The Awkward Age. Graph 3: Persuasion and Howards End. Graph 4: Persuasion and Frederica. Graph 5: The Awkward Age and Howards End.]
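
The eigen-maps rest on resolving a correlation matrix into its leading vectors. The following is a minimal sketch under two assumptions that the paper does not confirm: that the 'eigen analysis' amounts to an eigendecomposition of the symmetric correlation matrix, and that each vector's percentage is its eigenvalue divided by the total (the trace). Power iteration with deflation is swapped in here only to keep the example free of dependencies; it is not presented as the procedure used in the paper. The 4 x 4 matrix is taken from the pn1, pn6, d1, and d6 entries of Table 2, and all function names are hypothetical.

```python
from math import sqrt

# Coefficients from Table 2 for pn1, pn6, d1, d6 (as decimal fractions).
MATRIX = [
    [1.000, 0.989, 0.755, 0.866],
    [0.989, 1.000, 0.788, 0.905],
    [0.755, 0.788, 1.000, 0.954],
    [0.866, 0.905, 0.954, 1.000],
]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Approximate the dominant eigenvalue and unit eigenvector."""
    vector = [1.0 + 0.1 * i for i in range(len(matrix))]  # non-uniform start
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector.
    value = sum(a * b for a, b in zip(vector, mat_vec(matrix, vector)))
    return value, vector


def deflate(matrix, value, vector):
    """Remove the component already explained by one eigenpair."""
    n = len(matrix)
    return [[matrix[i][j] - value * vector[i] * vector[j] for j in range(n)]
            for i in range(n)]


def first_two_vectors(matrix):
    """Return (share, vector) for Vector A and Vector B."""
    total = sum(matrix[i][i] for i in range(len(matrix)))
    value_a, vector_a = leading_eigenpair(matrix)
    value_b, vector_b = leading_eigenpair(deflate(matrix, value_a, vector_a))
    return (value_a / total, vector_a), (value_b / total, vector_b)


if __name__ == "__main__":
    (share_a, vec_a), (share_b, vec_b) = first_two_vectors(MATRIX)
    print("Vector A share:", round(100 * share_a, 1), "%")
    print("Vector B share:", round(100 * share_b, 1), "%")
    print("Vector B loadings:", [round(x, 2) for x in vec_b])
```

On this small matrix the first vector carries most of the total, and the second vector gives the two pure-narrative entries loadings of one sign and the two dialogue entries loadings of the other, which is the kind of separation the graphs display.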

the horizontal axis of each graph, is the more influential in separating the clusters testifies to an underlying similarity in the columns of the original tables: in no likely form of English will 'the', 'and', or 'of' slide far enough down the frequency-order to be displaced by words like 'from', 'all', and 'which'. When the broad hierarchical resemblances that give Vector A its predominance have been taken into the reckoning, the residual differences, most of which are absorbed by Vector B, are enough to separate the clusters of entries as completely as these graphs show. By the same token, the predominance of Vector A is lessened when comparatively dissimilar texts are analyzed: compare Graph 2 with Graphs 9 and 10.

Graphs 6 and 7 compare the pure narrative of Persuasion with those of Northanger Abbey and Sense and Sensibility respectively. In each case, the two clusters converge more closely than in Graphs 2-5 and one or two entries lie in the borderland of the other cluster. The convergence is closer yet and the interpenetration is deeper in Graph 8, where Pride and Prejudice is called into account. But only in Graphs 9 and 10, where Persuasion is compared with Mansfield Park and Emma respectively, is the interpenetration so complete that it is unreasonable to speak of discrete clusters. Statistical comparisons are derived from numerical patterns and lead not to assertions of fact but to inferential judgements. Yet, on examining this whole series of graphs, I see no other inference to rival the proposition that the method of analysis I have described and illustrated is able to distinguish, in appropriate degrees, between the narrative styles of different novelists and, likewise, between those of the same novelist at different stages of her literary career. Only when the three Chawton novels and the fragment of 'Sanditon', all written in the last four years or so of Jane Austen's life, are compared with each other does the chronological dimension cease to afford a clear overall separation between the two clusters in each graph.

[Graphs 6-9: Analysis of pure narrative (word-types 1-30). Each eigen-map plots Vector A against Vector B. Graph 6: Persuasion and Northanger Abbey. Graph 7: Persuasion and Sense and Sensibility. Graph 8: Persuasion and Pride and Prejudice. Graph 9: Persuasion and Mansfield Park.]

Graph 11 carries the 'authorial' side of our argument a step beyond anything in Graphs 2-5. In so far as it compares the pure narrative of Persuasion with that of Jane Austen's 'Sanditon', it shows that the latter so closely resembles the former as to make a small cluster surrounded by the entries for its larger neighbour. Jane Austen, as we know, undertook 'Sanditon' in the last months of her life, after bringing Persuasion to the state in which we have it but before satisfying herself that it was ready to be published. (Although Graph 11 does not show it, it is also worth noting that, if 'Sanditon' is regarded as no more than the beginning of a narrative, the three entries are surrounded by those for the earlier part of Persuasion.)

The leading interest of Graph 11, however, is as a display of the effects of literary imitation. The numbered entries for Sanditon 2, the Other Lady's 1975 completion of the 1817 fragment, overlap the lettered entries for Persuasion and 'Sanditon' to a slightly greater extent than the entries for Persuasion and Northanger Abbey overlap each other in Graph 6. The sensitive reader of these novels will recognize, I believe, that, though Sanditon 2 offers a passable imitation of Jane Austen's style, it is a trifle too ponderous and altogether too immobile to be mistaken for the original. That verdict does not
discredit the graphs or the method of analysis on which they rest. For such a reader would also be likely to acknowledge that the fluidity and inwardness that characterizes the 'pure' narrative of Persuasion does not closely resemble the late Augustan balancing and 'pointing' that gives such vivacity to its counterpart in Northanger Abbey. My graphs do not deal in the exactly (or inexactly) chosen epithets of traditional literary criticism. But they rightly show that there are much closer resemblances in narrative style between Sanditon 2 and Persuasion than between Persuasion and our specimens of Forster, James, and Georgette Heyer. They also show, no less rightly, far closer resemblances yet between Persuasion and Jane Austen's other Chawton novels. It should also be noted that even the handful of entries for Sanditon 2 which penetrate the cluster for Persuasion lie well away from the entries for 'Sanditon', their proper target.

[Graph 10. Persuasion and Emma. Analysis of pure narrative; horizontal axis: Vector B.]

[Graph 11. Persuasion, 'Sanditon', and Sanditon 2 (numbered entries). Analysis of pure narrative (word-types 1-30).]

The sequence of the numbered entries in Graph 11 adds greatly to its value. Only the entries for the first five segments of Sanditon 2 penetrate the cluster for Persuasion. While the next two entries lie fairly close at hand, they mark the beginning of a fluctuating but continuing divergence that leads eventually to the remote location of the last two entries. Is it reasonable to conclude that, as she goes on with her task, the Other Lady gradually loses sight of her model and writes more and more like herself? That interpretation finds support in other quarters. Essentially similar lines of divergence can be seen when Sanditon 2 is compared with other novels of Jane Austen. And, more strikingly, our early results for the narrative of another novel, written under her own name by the putative author of Sanditon 2, indicate that it will form a tightly clustered set of entries in the 'north-western corner' towards which the sequence of entries for Sanditon 2 is leading. When our work on this other novel is complete and it takes its place as Column M in a new version of Table 3, the comparisons to which it gives rise will make the subject of a further paper. For the moment, I must ask the reader to accept that the indications are unambiguous. The Other Lady, in short, is a skilful enough imitator to depart considerably from her own style in the direction of her model but not so skilful as to sustain that altered style throughout a long narrative.

Patterns in pure narrative: the shapes of stories

The method of analysis employed in this paper can, finally, be brought to bear not only on questions of authorship and chronological change but on questions nearer to the leading interests of literary critics and narrative theorists. These further possibilities have already been glanced at in the comment that the opening sections of 'The Watsons' and 'Sanditon', the only existing sections of either work, fall most naturally into place when they are compared with the openings of the other narratives. (That effect is less pronounced in 'Sanditon' than in 'The Watsons' presumably because the pervasive new strain of social irony in the later, subtler work is a more powerful stylistic determinant than its Aristotelian status as distinguishably a beginning.)

When the narratives of Emma and Persuasion are brought together in Graph 10, the two sets of entries interpenetrate each other to an exceptional extent. When each set is transformed into a numbered sequence, it becomes clear that the interpenetration is no arbitrary affair but reflects subtler stylistic determinants than any touched on so far. Those entries for Persuasion that lie in the territory where Emma predominates come from the socially oriented part of the novel beginning with Anne's arrival in Bath. Those entries for Emma that lie in the territory where Persuasion predominates come from the middle of the novel where Emma is imaginatively (though not socially) isolated. Those lying in the north-western corner come from the last phase of Emma, the hundred pages that remain after Mr Knightley rebukes Emma at Box Hill, a phase in which hero and heroine resolve their differences in a fuller and more closely reasoned debate than that which precedes the ending of
Persuasion. And those lying in the north-eastern corner begin with the walk to Winthrop, continue with the visit to Lyme Regis, and end just as Anne is about to leave for Bath. The three 'north-eastern' entries, in other words, are for those passages of Persuasion in which a delicate and reflective style of natural description has led many of Jane Austen's critics to discern a mood of romanticism unlike anything in her earlier novels. Their discrimination is borne out by the fact that these three entries lie at the north-eastern extremity of every graph where two of Jane Austen's narratives are compared. It is furthered by the contrasting fact that these same entries lie closer to the narratives of James and Forster than do any others in all Jane Austen's novels.

Such observations as these are not modified but set in high relief by Graph 12, where the narrative of Persuasion is examined in isolation from the rest. Not modified, I say, because the pattern of the entries for Persuasion holds good even when the influence of Emma is removed. They are set in high relief by being given the freedom to spread out across the map. The fact that they spread so far while sustaining the same grouping of subsets is clear evidence that we are dealing with a genuine set of resemblances and contrasts and using a method of analysis delicate enough to detect them.

[Graph 12. Persuasion examined in isolation. Analysis of pure narrative (word-types 1-30); horizontal axis: Vector B.]

The appropriateness of this method of analysis for the purposes of literary criticism can be tested in several ways. Within the limits of the present paper, it rests on the extent to which there is a worthwhile correspondence between the evidence offered and the inferences drawn from it and on the extent to which those inferences seem persuasive. Those criteria will remain applicable to the more keenly edged evidence that will emerge from an analysis that treats not of numerically equal segments but of actual episodes, the dramatic components of a narrative. Ultimately, however, the value of a statistically-based approach to literary criticism will be tested by its application to a much wider range of novels.

So far as the evidence extends at present, I am content to remark that the distinct, dramatically appropriate subsets discernible in the narratives of Persuasion, Emma, and also Mansfield Park are scarcely to be found in any of the other novels except The Awkward Age. The demonstrable presence, in so select a group of narratives, of clear evidence of stylistic modulation gives weight to Bakhtin's concept of 'orchestration' in fiction. To the extent that such 'modulation' or 'orchestration' can be regarded as a mark of literary merit, statistical analysis may even furnish evidence on questions of value. That is not to propose a substitute for critical judgement but only to suggest a new way of informing it. In a period of literary and social history when the very idea that judgement is among the literary critic's fundamental tasks has survived some facile attacks but remains open to some strenuous questioning, the prospect of a better informed exercise of judgement can be no bad thing.

Acknowledgements

The research-project on which the paper is based could not have been conducted without the generous funding of the Australian Research Grants Scheme and the University of Newcastle, N.S.W., or without access to the Oxford Concordance Package (OCP) and MINITAB (University of Pennsylvania). My more personal debts are acknowledged in a recent book (to which reference is made).

References

1. See The Dialogic Imagination: Four Essays by M. M. Bakhtin, ed. Michael Holquist (Austin, Texas, 1981), passim.

2. ibid., pp. 42-3; and cf. pp. 259-63.

3. Eigen analysis is better fitted for illustrating relativities than for stating absolutes. Since Vectors A and B incorporate more than 98% of the information in Table 2, the pattern of resemblances and differences in Graph 1 is obviously well-founded. Yet (as in the familiar example of Mercator's projection of the globe) any two-dimensional picture of a multi-dimensional phenomenon is inevitably a distortion. Even in cases like Graph 1 where its effect is very slight, this distortion precludes any simple reckoning of the 'real' distance between two points on the map. At their present stage of development, moreover, my maps are not precisely scaled. The relative importance of Vectors A and B is indicated by the rendering of one unit on the vertical axis as equivalent to twenty or more units on the horizontal. That sort of scaling is accurate enough to give an adequate impression of the relationships among the clusters but not accurate enough for measuring 'real' distances or assessing the probability that a given outlying entry belongs to a given cluster. To go further in those directions, it will be necessary, I believe, to turn from eigen analysis to multi-dimensional scaling.

4. The 'Word Index' program that numbers each word-token according to its sequential location in a text and sorts the output as an alphabetical array of word-types was designed for me by Sandra Britz. The 'Serial Tables' program that distributes the indexed entries among segments of any chosen size and tabulates the results was designed for me by David Hoole. He also designed our current graphics program.
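The procedure summarised in notes 3 and 4 (tabulating the commonest word-types by segment, correlating the resulting profiles, and applying eigen analysis to the correlation-matrix) can be sketched in a few lines of modern code. What follows is an illustration only and not the 'Word Index', 'Serial Tables', OCP, or MINITAB code used for the paper. The function names, the crude tokenizer, the choice of the commonest word-types from the whole token stream, and the use of power iteration in place of a statistical package's eigen routine are all assumptions made for the sake of a self-contained example.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Return lower-cased word-tokens in sequential order (a crude, assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def segment_profiles(tokens, segment_size, top_n=30):
    """Count the top_n commonest word-types in each full segment of equal size."""
    common = [word for word, _ in Counter(tokens).most_common(top_n)]
    profiles = []
    # A trailing, incomplete segment is dropped.
    for start in range(0, len(tokens) - segment_size + 1, segment_size):
        counts = Counter(tokens[start:start + segment_size])
        profiles.append([counts[word] for word in common])
    return common, profiles


def pearson(x, y):
    """Product-moment correlation; assumes neither profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Correlate every frequency profile with every other."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


def top_eigenpairs(matrix, k=2, iterations=500):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _pair in range(k):
        vec = [1.0 + 0.01 * i for i in range(n)]  # arbitrary starting vector
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(v * v for v in nxt))
            if norm < 1e-12:  # nothing left to extract
                break
            vec = [v / norm for v in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n))
                    for i in range(n))
        pairs.append((value, vec))
        # Deflate: remove the component just found before seeking the next.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def information_shares(matrix, k=2):
    """Share of the matrix's trace carried by each of the k leading vectors."""
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    return [value / trace for value, _ in top_eigenpairs(matrix, k)]
```

Under these assumptions, `information_shares(correlation_matrix(profiles))` returns two proportions that play the role of the shares attributed to Vectors A and B. Reading 'information' as the share of the trace is itself an assumption, and the components of the two returned vectors would supply the coordinates for a map of the entries.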