Rationalism
These are not postulates the existence of whose counterparts in reality admits of ex-
tensive dispute once their nature is fully realized. We do not need controlled experi-
ments to establish their validity: they are so much the stuff of our everyday experience
that they have only to be stated to be recognized as obvious. Indeed, the danger is that
they may be thought to be so obvious that nothing significant can be derived from their
further examination. Yet, in fact, it is on postulates of this sort that the complicated
theorems of advanced analysis ultimately depend. [24, p. 80]
Thus the problem of verification has been reduced to the problem of searching
for a set of basic assumptions underlying the behavior of the system of interest.
Unfortunately, any attempt to spell out literally and in detail all of the basic
assumptions underlying a particular system soon reveals limitations to their ob-
viousness [16, p. 136]. Reichenbach goes so far as to deny the very existence of a
synthetic a priori.
Scientific philosophy . . . refuses to accept any knowledge of the physical world as ab-
solutely certain. Neither the individual occurrences, nor the laws controlling them,
can be stated with certainty. The principles of logic and mathematics represent the
only domain in which certainty is attainable; but these principles are analytic and
empty. Certainty is inseparable from emptiness; there is no synthetic a priori. [23, p. 304]
Empiricism
Positive Economics
Milton Friedman argues that critics of economic theory have missed the point
by their preoccupation with the validity of the assumptions of models. Accord-
ing to Friedman the validity of a model depends not on the validity of
the assumptions on which the model rests (as Hutchison would have one believe),
but rather on the ability of the model to predict the behavior of the dependent
variables which are treated by the model.
The difficulty in the social sciences of getting new evidence for this class of phenomena
and of judging its conformity with the implications of the hypothesis makes it tempting
to suppose that other, more readily available, evidence is equally relevant to the va-
lidity of the hypothesis-to suppose that hypotheses have not only "implications" but
also "assumptions" and that the conformity of these "assumptions" to "reality" is a
test of the validity of the hypothesis different from or additional to the test by impli-
cations. This widely held view is fundamentally wrong and productive of much mis-
chief. Far from providing an easier means for sifting valid from invalid hypotheses, it
only confuses the issue, promotes misunderstanding about the significance of empirical
evidence for economic theory, produces a misdirection of much intellectual effort de-
voted to the development of consensus on tentative hypotheses in positive economics.
[13, p. 14]
Multi-Stage Verification
Like the scientist, the scientific philosopher can do nothing but look for his best posits.
But that is what he can do; and he is willing to do it with the perseverance, the self-
criticism, and the readiness for new attempts which are indispensable for scientific
work. If error is corrected whenever it is recognized as such, the path of error is the
path of truth. [23, p. 326]
We would not object to the argument that this set of postulates is formed from
the researcher's already acquired "general knowledge" of the system to be sim-
ulated or from his knowledge of other "similar" systems which have already been
successfully simulated. The point we are striving to make is that the researcher
cannot subject all possible postulates to formal empirical testing and must there-
fore select, on essentially a priori grounds, a limited number of postulates
for further detailed study. He is, of course, at the same time rejecting an infinity
of postulates on the same grounds. The selection of postulates is taken here to
include the specification of components and the selection of variables as well as
the formulation of functional relationships. But having arrived at a set of basic
postulates on which to build our simulation model, we are not willing to assume
that these postulates are of such a nature as to require no further validation.
Instead we merely submit these postulates as tentative hypotheses about the be-
havior of a system.
The second stage of our multi-stage verification procedure calls for an attempt
on the part of the analyst to "verify" the postulates on which the model is based
subject to the limitations of existing statistical tests. Although we cannot solve
the philosophical problem of "what does it mean to verify a postulate?", we can
apply the "best" available statistical tests to these postulates.
But in management science we often find that many of our postulates
are either impossible to falsify by empirical evidence or extremely difficult to sub-
ject to empirical testing. In these cases we have two choices. We may either aban-
don the postulates entirely, arguing that they are scientifically meaningless since
they cannot conceivably be falsified, or we may retain the postulates merely as
"tentative" postulates. If we choose the first alternative we must continue search-
ing for other postulates which can be subjected to empirical testing. However,
we may elect to retain these "tentative" postulates which cannot be fal-
sified empirically on the basis that there is no reason to assume that they are in-
valid just because they cannot be tested.
The third stage of this verification procedure consists of testing the model's
ability to predict the behavior of the system under study. C. West Churchman
states flatly that the purpose of simulation is to predict, and considers the point
so obvious that he offers no defense of it before he incorporates it into his dis-
cussion of the concept of simulation [4]. This point does indeed seem obvious.
Unless the construction of simulation models is viewed as a game with no pur-
pose other than the formulation of a model, it is hard to escape the conclusion
that the purpose of a simulation experiment is to predict some aspect of reality.
In order to test the degree to which data generated by computer simulation
models conform to observed data, two alternatives are available: historical
verification and verification by forecasting. The essence of these procedures is
prediction, for historical verification is concerned with retrospective predictions
while forecasting is concerned with prospective predictions.
If one uses a simulation model for descriptive analysis, he is interested in the
behavior of the system being simulated and so would attempt to produce a model
which would predict that behavior. The use of simulation models for prescriptive
purposes involves predicting the behavior of the system being studied under dif-
ferent combinations of policy conditions. The experimenter would then decide
on the most desirable set of policy conditions to put into effect by picking the
set which produces the most desirable set of outcomes. When a simulation model
is used for descriptive analysis the actual historical record produced by the system
being simulated can be used as a check on the accuracy of the predictions, and
hence on the extent to which the model fulfilled its purpose. But pre-
scriptive analysis involves choosing one historical path along which the system
will be directed. Hence only the historical record of the path actually traveled
will be generated and the historical records of alternative paths corresponding
to alternative policies will not be available for comparison. Though, in this case,
the historical record cannot be used as a direct check on whether or not the model
did actually point out the best policy to follow, the actual outcome of the policy
chosen can be compared with the outcome predicted by the simulation model as
an indirect test of the model. In either case, the predictions of the model
are directly related to the purpose for which the model was formulated, while the
assumptions which make up the model are only indirectly related to its purpose
through their influence on the predictions. Hence the final decision concerning
the validity of the model must be based on its predictions.
Goodness of Fit
Thus far, we have concerned ourselves only with the philosophical aspects of
the problem of verifying computer simulation models. What are some of the prac-
tical considerations which the management scientist faces in verifying computer
models? Some criteria must be devised to indicate when the time paths generated
by a computer simulation model agree sufficiently with the observed or historical
time paths so that agreement cannot be attributed merely to chance. Specific
measures and techniques must be considered for testing the "goodness of fit" of a
simulation model, i.e., the degree of conformity of simulated time series to ob-
served data. Richard M. Cyert has suggested that the following measures might
be appropriate [10]:
1. number of turning points,
2. timing of turning points,
3. direction of turning points,
4. amplitude of the fluctuations for corresponding time segments,
5. average amplitude over the whole series,
6. simultaneity of turning points for different variables,
7. average values of variables,
8. exact matching of values of variables.
To this list of measures we would add the probability distribution and variation
about the mean (variance, skewness, kurtosis) of variables.
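Most of these measures are easy to tabulate once the simulated and observed time
paths are in hand. A minimal sketch (in Python, with randomly generated stand-in
series; the helper functions are ours, for illustration only) counts and times the
turning points and computes the moment-based measures added above:

    import numpy as np
    from scipy import stats

    def turning_points(x):
        # Indices at which the series changes direction (local peaks and troughs);
        # flat segments are ignored in this simple version.
        d = np.sign(np.diff(x))
        return np.where(d[:-1] * d[1:] < 0)[0] + 1

    def moment_measures(x):
        # Average value and variation about the mean: variance, skewness, kurtosis.
        return (np.mean(x), np.var(x, ddof=1), stats.skew(x), stats.kurtosis(x))

    rng = np.random.default_rng(0)
    simulated = rng.normal(size=100).cumsum()   # stand-in for a generated time path
    observed  = rng.normal(size=100).cumsum()   # stand-in for the historical record

    tp_sim, tp_obs = turning_points(simulated), turning_points(observed)
    print("number of turning points:", len(tp_sim), "vs.", len(tp_obs))
    print("timing of first turning points:", tp_sim[:3], "vs.", tp_obs[:3])
    print("moments (simulated):", moment_measures(simulated))
    print("moments (observed): ", moment_measures(observed))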
Although a number of statistical techniques exist for testing the "goodness of
fit" of simulation models, for some unknown reason management scientists and
economists have, more often than not, restricted themselves to purely graphical
(as opposed to statistical) techniques of "goodness of fit" for validating computer
models [5], [19]. The following statement by Cyert and March concerning the
validity of their duopoly model is indicative of the lack of emphasis placed on
"goodness of fit" by many practitioners in this field.
In general, we feel that the fit of the behavioral model to data is surprisingly good, al-
though we do not regard this fit as validating the approach. [11, p. 97]
This statement was made on the basis of a graphical comparison of the simulated
time series and actual data. Not unlike most other simulation studies described
in the literature, Cyert and March did not pursue the question of verification
beyond the point described in the aforementioned statement.
Within the confines of this paper it is impossible to enumerate all of the statis-
tical techniques which are available for testing the "goodness of fit" of simulation
models. However, we shall list some of the more important ones and suggest a
number of references which describe these tests in detail.
1. Analysis of Variance. The analysis of variance is a collection of techniques
for data analysis which can be used to test the hypothesis that the mean (or var-
iance) of a series generated by a computer simulation experiment is equal to the
mean (or variance) of the corresponding observed series. Three important
assumptions underlie the use of this technique: normality, statistical independence,
and a common variance. The paper by Naylor, Wertz, and Wonnacott
[20] describes the use of the F-test, multiple comparisons, and multiple ranking
procedures to analyze data generated by simulation experiments.
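As an elementary illustration of this technique, the following sketch (Python, with
normally distributed stand-in series) tests the equality of the two means with a
one-way analysis of variance and the equality of the two variances with an F-test;
it presumes the normality and independence assumptions just noted and does not
reproduce the multiple-comparison and multiple-ranking procedures of [20]:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    simulated = rng.normal(loc=10.0, scale=2.0, size=80)   # stand-in generated series
    observed  = rng.normal(loc=10.4, scale=2.1, size=80)   # stand-in observed series

    # One-way analysis of variance: test that the two series have a common mean.
    f_mean, p_mean = stats.f_oneway(simulated, observed)

    # Two-sided F test that the two series have a common variance.
    f_var = np.var(simulated, ddof=1) / np.var(observed, ddof=1)
    df1, df2 = simulated.size - 1, observed.size - 1
    p_var = 2 * min(stats.f.cdf(f_var, df1, df2), stats.f.sf(f_var, df1, df2))

    print(f"means:     F = {f_mean:.3f}, p = {p_mean:.3f}")
    print(f"variances: F = {f_var:.3f}, p = {p_var:.3f}")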
2. Chi-Square Test. The Chi-square test is a classical statistical test which can
be used for testing the hypothesis that the set of data generated by a simulation
model has the same frequency distribution as a set of observed historical data.
Although this test is relatively easy to apply, it has the problem of all tests using
categorical data, namely, the problem of selecting categories in a suitable
and unbiased fashion. It has the further disadvantage that it is relatively sensi-
tive to non-normality.
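A minimal two-sample sketch of the test (Python, stand-in data) makes the
category-selection problem explicit: the categories are taken here as the octiles
of the observed series, an arbitrary but at least reproducible choice.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    simulated = rng.normal(size=200)   # stand-in generated data
    observed  = rng.normal(size=200)   # stand-in historical data

    # Form categories from the octiles of the observed series, with open-ended
    # outer categories so that every observation is counted.
    edges = np.quantile(observed, np.linspace(0.0, 1.0, 9))
    edges[0], edges[-1] = -np.inf, np.inf
    sim_counts, _ = np.histogram(simulated, edges)
    obs_counts, _ = np.histogram(observed, edges)

    # Chi-square comparison of the simulated frequencies with the observed ones
    # (treating the observed frequencies as the expected values is itself an
    # approximation, since they are estimated rather than known).
    chi2, p = stats.chisquare(f_obs=sim_counts, f_exp=obs_counts)
    print(f"chi-square = {chi2:.3f}, p = {p:.3f}")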
3. Factor Analysis. Cohen and Cyert have suggested the performance of a fac-
tor analysis on the set of time paths generated by a computer model, a second
factor analysis on the set of observed time paths, and a test of whether the two
groups of factor loadings are significantly different from each other [6].
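The following sketch (Python, using scikit-learn's FactorAnalysis on hypothetical
multivariate time paths) carries out the two factor analyses; the final step, a
formal test that the two sets of loadings differ significantly, requires more
machinery than is shown, so the sketch stops at a direct comparison of the loadings.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(7)
    # Stand-in data: 120 periods of five endogenous variables each.
    simulated_paths = rng.normal(size=(120, 5)).cumsum(axis=0)
    observed_paths  = rng.normal(size=(120, 5)).cumsum(axis=0)

    fa_sim = FactorAnalysis(n_components=2, random_state=0).fit(simulated_paths)
    fa_obs = FactorAnalysis(n_components=2, random_state=0).fit(observed_paths)

    # Factor loadings: one row per factor, one column per variable. Loadings are
    # identified only up to rotation and sign, so a raw difference is no more
    # than a first look.
    print("simulated loadings:\n", fa_sim.components_.round(2))
    print("observed loadings:\n",  fa_obs.components_.round(2))
    print("raw difference:\n", (fa_sim.components_ - fa_obs.components_).round(2))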
4. Kolmogorov-Smirnov Test. The Kolmogorov-Smirnov test is a distribution-
free (nonparametric) test concerned with the degree of agreement between the
distribution of a set of sample values (simulated series) and some specified theo-
retical distribution (distribution of actual data). The test involves specifying the
cumulative frequency distribution of the simulated and actual data. It treats in-
dividual observations separately and unlike the Chi-square test does not
lose information through the combining of categories [25].
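In its two-sample form the test is a one-line computation. The sketch below
(Python, stand-in data) reports the maximum distance between the two empirical
cumulative distribution functions and the associated significance level.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    simulated = rng.normal(loc=0.0, scale=1.0, size=150)   # stand-in generated data
    observed  = rng.normal(loc=0.1, scale=1.1, size=120)   # stand-in historical data

    # Kolmogorov-Smirnov statistic: the largest vertical distance between the
    # two empirical cumulative frequency distributions. No categories are formed,
    # so no information is lost to binning.
    statistic, p_value = stats.ks_2samp(simulated, observed)
    print(f"D = {statistic:.3f}, p = {p_value:.3f}")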
5. Nonparametric Tests. The books by Siegel [25] and Walsh [29] describe a host
of other nonparametric tests which can be used for testing the "goodness of fit"
of simulated data to real world data.
6. Regression Analysis. Cohen and Cyert have also suggested the possibility
of regressing actual series on the generated series and testing whether the resulting
regression equations have intercepts which are not significantly different from
zero and slopes which are not significantly different from unity [6].
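A sketch of this procedure (Python, stand-in series; it assumes a SciPy release
recent enough for linregress to report a standard error for the intercept) tests
the two hypotheses separately with t statistics:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    generated = np.linspace(0.0, 10.0, 60) + rng.normal(scale=0.5, size=60)  # simulated series
    actual    = generated + rng.normal(scale=0.7, size=60)                   # observed series

    # Regress the actual series on the generated series.
    res = stats.linregress(generated, actual)
    dof = actual.size - 2

    # Intercept not significantly different from zero; slope not significantly
    # different from unity.
    t_intercept = res.intercept / res.intercept_stderr
    t_slope = (res.slope - 1.0) / res.stderr
    p_intercept = 2 * stats.t.sf(abs(t_intercept), dof)
    p_slope = 2 * stats.t.sf(abs(t_slope), dof)

    print(f"intercept = {res.intercept:.3f}  (p = {p_intercept:.3f} against zero)")
    print(f"slope     = {res.slope:.3f}  (p = {p_slope:.3f} against unity)")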
7. Spectral Analysis. Data generated by computer simulation experiments are
usually highly autocorrelated. When autocorrelation is present in sample data,
the use of classical statistical estimating techniques (which assume the absence
of autocorrelation) will lead to underestimates of the sampling variances (which are
in fact unduly large) and to inefficient predictions. Spectral analysis considers the
data in the frequency domain and thus takes this autocorrelation explicitly into account.
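A minimal illustration of the idea (Python, with first-order autoregressive
stand-ins for the two autocorrelated series) estimates the spectral density of each
series and compares the two spectra frequency by frequency; formal procedures for
comparing the spectra of simulated and actual series are discussed in [12] and [21].

    import numpy as np
    from scipy import signal

    def ar1(phi, n, rng):
        # First-order autoregressive series: a stand-in for the highly
        # autocorrelated output of a simulation run.
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.normal()
        return x

    rng = np.random.default_rng(5)
    simulated = ar1(0.8, 512, rng)
    observed  = ar1(0.7, 512, rng)

    # Estimated spectral densities: how each series' variance is distributed
    # over frequency, which summarizes its autocorrelation structure.
    freqs, psd_sim = signal.welch(simulated, nperseg=128)
    _,     psd_obs = signal.welch(observed,  nperseg=128)

    # A crude comparison: the ratio of the two estimated spectra at the lowest
    # frequencies, where autocorrelated series concentrate their variance.
    for f, r in zip(freqs[:5], (psd_sim / psd_obs)[:5]):
        print(f"frequency {f:.3f}: spectral ratio {r:.2f}")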
Summary
While we have argued that the success or failure of a simulation experiment
must be measured by how well the model developed predicts the particular phe-
nomena in question, we have not argued that care exercised in selecting assump-
tions and statistical testing of these assumptions are purposeless or wasteful
activities. Our defense of the first two stages of the three-stage process of veri-
fication we have proposed rests solidly on the law of scarcity. Any hypotheses
which can be rejected on a priori grounds should be so rejected because testing
by this procedure is cheaper than formal statistical testing. Only if the experi-
menter had an unlimited budget could he afford to subject all possible hypotheses
to statistical testing. Likewise, testing assumptions is cheaper than deriving and
testing predictions, so any increase of validity we can obtain at an early stage
is cheaper than additional validity gained at a later stage.
Having described multi-stage verification it is appropriate that we point out
that this approach to verification is by no means limited to simulation models.
For example, suppose that we were interested in verifying a simple econometric
model of consumer demand for a particular commodity, the model consisting of
one or two equations. First, we might take a look at the rationale or the a priori
assumptions underlying the model. These assumptions might take the form of
postulates about the shape of individual marginal utility functions, the sign and
magnitude of income and substitution effects, the shape of indifference curves,
etc. Are these assumptions in accordance with the body of knowledge known as
economic theory? Second, if we are satisfied with the model on purely a priori
grounds we may then attempt to verify one or more of the assumptions under-
lying our model empirically, if data are available. Third, we might then subject
the model to further testing by comparing theoretical values of consumer demand
(as indicated by the model) with actual or historical values.
However, if our demand model were relatively simple, then we might be willing
to bypass the first two steps of the multi-stage verification procedure and con-
centrate on the accuracy of the model's predictions. Whether we would be willing
to skip steps one and two in verifying a particular model will, in part, depend on
the cost of obtaining predictions with our model. If the model is characterized
by (1) a small number of variables, (2) a small number of linear equations, (3)
no stochastic variables, and (4) predictions for only one or two time periods, then
one may be willing to concentrate on the third step of the procedure with a mini-
mum of risk. But if one is dealing with a complex model consisting of a large num-
ber of nonlinear difference or differential equations and a large number of vari-
ables (some variables being stochastic), and the model is to be used to generate
time paths of endogenous variables over extended periods of time, then the cost
of omitting steps one and two of our procedure might be quite high. That is, it
may be prudent to use steps one and two of the multi-stage procedure to detect
errors in the model which otherwise might not become obvious until expensive
computer runs have been made.
Thus, while most of our argument is relevant to the general problem of verify-
ing hypotheses or theories, the nature of computer simulation experiments makes
the three stage procedure particularly relevant to computer simulation models.
This form of analysis is particularly useful when (1) it is extremely costly or im-
possible to observe the real world processes which one is attempting to study, or
(2) the observed system is so complex that it cannot be described by a set of equa-
tions for which it is possible to obtain analytical solutions which could be used
for predictive purposes. Thus computer simulation is a more appropriate tool of
analysis than techniques such as mathematical programming or marginal analysis
when data against which predictions can be tested are not available and/or when pre-
dictions can be obtained only at great expense (in human time and/or computer
time). In other words, computer simulation is most likely to be utilized when the
savings derived from improving the model at earlier stages are most pronounced.
References
1. BLAUG, M., Economic Theory in Retrospect, Richard D. Irwin, Homewood, Ill., 1962.
2. BURDICK, DONALD S., AND NAYLOR, THOMAS H., "Design of Computer Simulation Ex-
periments for Industrial Systems," Communications of the ACM, IX (May, 1966),
pp. 329-339.
3. CARNAP, R., "Testability and Meaning," Philosophy of Science, III (1936).
4. CHURCHMAN, C. WEST, "An Analysis of the Concept of Simulation," Symposium on
Simulation Models, Austin C. Hoggatt and Frederick E. Balderston (editors), South-
Western Publishing Co., Cincinnati, 1963.
5. COHEN, K. J., Computer Model of the Shoe, Leather, Hide Sequence, Prentice-Hall, Inc.,
Englewood Cliffs, N. J., 1960.
6. COHEN, KALMAN J., AND CYERT, RICHARD M., "Computer Models in Dynamic Eco-
nomics," The Quarterly Journal of Economics, LXXV (February, 1961), pp. 112-127.
7. CONWAY, R. W., "Some Tactical Problems in Digital Simulation," Management Science,
Vol. 10, No. 1 (Oct., 1963), pp. 47-61.
8. --, An Experimental Investigation of Priority Assignment in a Job Shop, The RAND
Corporation, RM-3789-PR (February, 1964).
9. --, JOHNSON, B. M., AND MAXWELL, W. L., "Some Problems of Digital Machine
Simulation," Management Science, Vol. 6, No. 1 (October, 1959), pp. 92-110.
10. CYERT, RICHARD M., "A Description and Evaluation of Some Firm Simulations," Pro-
ceedings of the IBM Scientific Computing Symposium on Simulation Models and Gaming,
IBM, White Plains, N.Y., 1966.
11. -, AND MARCH, JAMES G., A Behavioral Theory of the Firm, Prentice-Hall, Inc.,
Englewood Cliffs, N.J., 1963.
12. FISHMAN, GEORGE S., AND KIVIAT, PHILIP J., "The Analysis of Simulation-Generated
Time Series,"' Management Science, Vol. 13, No. 7 (March, 1967), pp. 525-557.
13. FRIEDMAN, MILTON, Essays in Positive Economics, Univ. of Chicago Press, 1953.
14. HUTCHISON, T. W., The Significance and Basic Postulates of Economic Theory, Mac-
millan & Co., London, 1938.
15. KING, E. P., AND SMITH, R. N., "Simulation of an Industrial Environment," Proceedings
of the IBM Scientific Computing Symposium on Simulation Models and Gaming, IBM,
White Plains, N.Y., 1966.
16. KOOPMANS, TJALLING C., Three Essays on the State of Economic Science, McGraw-Hill
Book Co., New York, 1957.
17. MCMILLAN, CLAUDE, AND GONZALEZ, RICHARD F., Systems Analysis, Richard D. Irwin,
Inc., Homewood, Ill., 1965.
18. NAYLOR, THOMAS H., BALINTFY, JOSEPH L., BURDICK, DONALD S., CHU, KONG, Com-
puter Simulation Techniques, John Wiley & Sons, New York, 1966.
19. --, WALLACE, WILLIAM H., AND SASSER, W. EARL, "A Computer Simulation Model
of the Textile Industry," Working Paper No. 8, Econometric System Simulation
Program, Duke University, October 18, 1966.
20. --, WERTZ, KENNETH, AND WONNACOTT, THOMAS, "Some Methods for Analyzing Data
Generated by Computer Simulation Experiments," Communications of the ACM
(1967).
21. --, WERTZ, KENNETH, AND WONNACOTT, THOMAS, "Spectral Analysis of Data Gen-
erated by Simulation Experiments with Econometric Models," Working Paper No. 4,
Econometric System Simulation Program, Duke University, September 1, 1966.
22. POPPER, KARL R., The Logic of Scientific Discovery, Basic Books, New York, 1959.
23. REICHENBACH, HANS, The Rise of Scientific Philosophy, University of California Press,
Berkeley, 1951.
24. ROBBINS, LIONEL, An Essay on the Nature and Significance of Economic Science, Mac-
millan, London, 1935.
25. SIEGEL, SIDNEY, Nonparametric Statistics, McGraw-Hill, New York, 1956.
26. SPROWLS, CLAY, "Simulation and Management Control," Management Controls: New
Directions in Basic Research, C. P. Bonini, et al. (editors). McGraw-Hill, New York,
1964.
27. TEICHROEW, DANIEL, AND LUBIN, JOHN F., "Computer Simulation: Discussion of Tech-
niques and Comparison of Languages," Communications of the ACM, IX (October,
1966), pp. 723-741.
28. THEIL, H., Economic Forecasts and Policy, North-Holland Publishing Co., Amsterdam,
1961.
29. WALSH, JOHN E., Handbook of Nonparametric Statistics, I & II, D. Van Nostrand Co.,
Princeton, N.J., 1962, 1965.
CRITIQUE OF:
"VERIFICATION OF COMPUTER SIMULATION MODELS"
JAMES L. McKENNEY
CRITIQUE OF:
"VERIFICATION OF COMPUTER SIMULATION MODELS"
WILLIAM E. SCHRANK AND CHARLES C. HOLT
CRITIQUE OF: "VERIFICATION OF COMPUTER SIMULATION MODELS"t B-105
This content downloaded from 92.242.59.41 on Tue, 12 Feb 2019 17:24:25 UTC
All use subject to https://about.jstor.org/terms
B-106 WILLIAM E. SCHRANK AND CHARLES C. HOLT
C = \sum_j w_j u_j^2,
where wj is a weight indicating the importance in the intended application that
we attach to errors in forecasting the jth variable. This function could then pro-
vide a basis of comparison between several models.
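A worked illustration of this comparison (Python; the forecast errors and weights
are hypothetical) computes C for two competing models:

    import numpy as np

    # Hypothetical forecast errors u_j for three variables of interest,
    # one array per competing model.
    errors_model_a = np.array([0.4, -1.2, 0.3])
    errors_model_b = np.array([0.9, -0.5, 0.1])

    # Weights w_j expressing how much an error in forecasting each variable
    # matters in the intended application.
    weights = np.array([1.0, 0.5, 2.0])

    cost_a = np.sum(weights * errors_model_a ** 2)
    cost_b = np.sum(weights * errors_model_b ** 2)
    print(f"C(model A) = {cost_a:.3f}")    # 1.060
    print(f"C(model B) = {cost_b:.3f}")    # 0.955 -> preferred on this criterion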
This is the simplest type of example. Introducing policy decisions and the
sensitivity of conditional forecasts to model errors presents formidable com-
plications. It would seem, however, that an analysis oriented toward the in-
tended use of the model could provide a framework for a validation theory.
The authors propose to meld all the classical approaches into a methodology
that proceeds critically at every stage making use of logical, empirical and pre-
dictive tests of the model. This is certainly sound advice although a bit vague.
In application, it seems their three steps involve first, the establishment of the
model by bringing to bear prior theory; second, parameter estimation and the
application of statistical tests of significance to the estimates; and third, the eval-
uation of the model performance through the application of goodness of fit tests.
The sudden break in the paper to consider "practical" goodness of fit tests
with little reference to the methodological discussion unfortunately raises more
questions than it answers. Their analysis of the methodological aspects of the
problem does not provide a framework within which to apply the tests, nor does
it provide a criterion for choosing between them.
By focussing on the verification problem within a broad philosophical frame-
work the authors have faced a crucial but long neglected problem. It is hoped
that the operational characteristics of their system will be developed further.