
Validation of Tests in Medical Biology

This document describes the procedures for validating laboratory analysis methods. It explains the importance of validating new methods to ensure that they meet clinical needs with a desired degree of reliability.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/266339580

Laboratory test method validation

Article  in  Revue de Médecine Vétérinaire · July 2000

PLENARY LECTURE

Laboratory test method validation

J.H. LUMSDEN

Department of Pathobiology, University of Guelph, Guelph, ON N1G 2W1, Canada

SUMMARY

Intelligent use and interpretation of any test procedure requires knowledge of the test reliability in specific clinical situations. For laboratory tests relating to specific diseases, clinical interpretations are based optimally upon positive and negative predictive values, or odds ratios, predetermined at useful medical decision limits. Due to the many species, management and disease differences encountered in veterinary medicine, the interpretation of routine laboratory test values is usually made in relation to reference intervals determined for a defined species subset, or to other decision limits dependent upon the experience of the clinician.

Introduction of any new procedure, instrument or reagent is based upon several features including anticipated clinical value and efficiency in a diagnostic laboratory environment. The procedures for validation of a new test in the laboratory are well described. The reasons for each validation procedure, and the interpretation as to whether the resulting observations indicate likely ability of the test to meet clinical needs, are less well described and understood. If the clinical requirements of a test are the determining criteria, validation procedures are much easier to understand and interpret. At present, most veterinary laboratory tests must refer to the recommended clinical requirements for human diagnostic testing.

The experimental plan for within-laboratory method validation is presented in four phases including initial familiarization with the method, preliminary and more extensive validation experiments and implementation. Experimental designs are reviewed briefly for linearity studies, recovery studies, interference studies, within-run, between-run and replication studies, comparison of method studies and reference intervals. Data analyses including the requirements for various statistical tests are described.

If within-laboratory validation experiments indicate likely acceptable clinical performance, the test procedure can be implemented for initial clinical use. For tests which relate to a specific disease, prospective studies should be designed, in consultation with clinicians, to evaluate medical decision limits leading to determination of diagnostic sensitivity, specificity and predictive values of positive and negative test values. If clinical interpretation is dependent upon reference intervals, these should be determined according to recommended procedures and the clinicians should be informed as to the source and reliability.

KEY-WORDS: laboratory test - method validation - data analyses.

Revue Méd. Vét., 2000, 151, 7, 623-630


Introduction

All laboratory tests must be validated before being introduced for patient testing to ensure that the values reported will meet clinical expectations with a desired degree of reliability. Re-validation is required, to a lesser or greater degree, following any change in reagents, instrumentation or protocol.

Method validation begins with the considerations for, and selection of, a new test method for introduction into the laboratory or for patient-side use. Evaluation and validation of the method's analytical performance is required to assess the degree of error expected due to inaccuracy and imprecision and to confirm that the degree of error meets the anticipated clinical requirements. The clinical requirements should be determined before method validation is initiated. The procedures recommended for method validation differ with the type of test and the anticipated use. Recommendations frequently differ between authors and sources of information. Experiments must be designed so that the correct data are obtained. The appropriate statistical tools must be used to correctly estimate errors. Statistical tests, although necessary, do not prove acceptability. The final objective decision for acceptance of a method must consider error assessment, practical technical and financial aspects as well as the anticipated ability to meet clinical requirements.

Validation procedures are more demanding for a method developed within the laboratory than for one developed by a manufacturer. Most manufacturers provide some performance data which can be used for comparison when assessing initial performance within the laboratory of interest. Tests using complex methodology usually require more validation steps than moderately complex or simple methods. Point-of-care methods require some validation in each laboratory or clinic setting, especially when used for species other than those for which the methods were originally designed.

The objectives, procedures and the study designs for method validation are reviewed. The graphical and statistical evaluations of method comparison studies are compared. Validation of diagnostic performance is the final step in method validation and requires clinical cooperation and application of epidemiological principles.

Selection of a new or revised method

New test procedures are considered for a variety of reasons. Research reports often initiate interest by clinicians or instrument and reagent manufacturers. New instruments and reagents may appear to provide improved accuracy or precision, to reduce technical effort, to improve efficiency, to reduce reporting time and/or cost per test.

Method selection should begin with a clinical perspective. The primary consideration should be the ability to produce results which have sufficient analytical reproducibility and accuracy to meet anticipated clinical requirements. This decision should be based upon examination of all available data estimating error of the method and ensuring that the observed error is acceptable for diagnostic purposes.

Candidate methods must meet the practical requirements of the laboratory. Are there adequate space, equipment, and personnel? Laboratory personnel with the aid of technical literature can assess most practical requirements. Discussions with clinicians may be required to estimate throughput and whether turnaround will meet desired reporting time. The estimated cost per test must consider the level of anticipated use. Is there adequate supporting literature to expect the test to be used for a reasonable time or is this a current 'hot' research topic with minimal clinical validation or experience? Economic aspects should be weighed only after practical considerations appear attainable.

Evaluating and validating methods

The reasons for using various procedures when validating laboratory methods are not always made clear in scientific publications. Similarly, the correct use and interpretation of statistical tests and the objective interpretation of acceptability for laboratory tests being validated are not always apparent. I was encouraged to read the description by Dr. WESTGARD as to his early experiences (http://www.westgard.com/essay15.htm), since it mirrors my observations. Most of the uncertainty regarding the decision as to method acceptability is alleviated if one understands that the 'inner, hidden, deeper, secret meaning of method validation' is 'error assessment' (www.westgard.com/essay15.htm). How much error might be present in the test result within your laboratory? Could this degree of error affect the interpretation and possibly patient care? If the potential error is large enough to lead to misinterpretation, then the method is not acceptable.

The method validation process becomes much easier to understand if the focus is directed towards the sources of potential analytical errors and how these errors can be investigated. What experiments, and which experimental designs, will best demonstrate the errors? How many observations are required to obtain good estimates of the error? For the experimental design, which statistical method should be used to describe the extent of the error observed? How much error can be accepted within a method without affecting interpretation? This latter question requires consultation with clinicians, if possible, and should be predetermined.

ERROR

Analytical (total) error is the summation of random error (imprecision) and systematic error (inaccuracy or bias) [17]. Random error is the amount of variation inherent in the method. Systematic error is the difference from the true value. Although total error is the important question for the clinician, random and systematic errors are usually examined independently in the laboratory.

Random error is both positive and negative relative to the observed mean value of replicate determinations [17]. These replicate observations from a single sample can be plotted as a histogram to illustrate the spread around the mean (m). The calculated standard deviation (SD) is used to quantitate the degree of imprecision. This imprecision can be expressed in percentage as the coefficient of variation (CV) using the formula CV = SD × 100 / m. For methods where the observed
SD increases as the analyte concentration or activity increases, the CV remains relatively constant. If the SD remains constant as the analyte concentration or activity increases, the calculated CV will be lower at higher analyte concentrations. The opposite might also occur. Thus, the analyte concentration or activity must be known for the CV to have relevance. The CV provides ready comparison of method imprecision at different analyte concentrations or for comparison between analyte methods since it is expressed as a percentage and is independent of the concentration or activity units involved.

Systematic error (bias) may be constant or proportional and is either positive or negative, as compared to the true (correct) value [17]. Comparison of methods is used to assess systematic error. The bias is calculated as the average difference, or the difference between averages, for observations obtained using both the 'new' and the 'old' or a 'reference' method.

ACCURACY

Accuracy of a method is defined by the International Federation of Clinical Chemistry (IFCC) as the closeness of the agreement between the measured value and the "true" value [17]. The accuracy of a new method can be described in terms of either the systematic error or the total error relative to the best available estimate of the "true" value. Definitive methods, such as mass spectrometry, are used to develop primary reference materials which can then be used for development of reference methods by manufacturers [17]. Comparative method means have been shown to closely approximate true values. Comparative method means, to a great extent, are obtained from the observations generated by multiple laboratories using a variety of instruments and techniques. Comparative method means have replaced use of reference laboratory means [17]. Peer group means are obtained from the proficiency testing results of several laboratories using similar instruments and techniques. The manufacturer is relied upon to have made comparison to some independent measure of accuracy.

Many national and international committees continue to develop and provide information relating to the accuracy base for analytes of clinical interest. Increasingly this information can be obtained at various sites on the World Wide Web (e.g. http://www.aacc.org for the Standards Committee of the American Association for Clinical Chemistry). For practical reasons, various sources of sera are used as estimates of the "true" value in veterinary diagnostic laboratories.

Clinical requirements

In order to be able to assess whether a new method is able to meet expectations, the clinical requirements must be defined. Criteria for test method clinical requirements are usually related to the biological distribution of values observed within a healthy population [3, 7, 8, 9, 36, 39] or to consensus opinions based upon perceived requirements for clinical diagnosis [35, 40]. Tonks' 'Allowable Limits of Error' (ALE), originally proposed in 1960, were based upon the relationship of imprecision to the reference interval and expressed in percentage [39] using the formula ALE (%) = ± 1/4 normal range × 100 / mean of normal range. The original maximum ALE of 10 % was increased to 20 % due to the inability of many laboratory methods to reach the original objective.

The current allowable error recommendations are listed for routine tests including clinical chemistry, hematology, endocrinology and related markers, for immunology and for toxicology and therapeutic drug monitoring [17]. The Clinical Laboratory Improvement Amendments (CLIA'88) guidelines [40] for acceptable analytical performance can be viewed at www.westgard.com/clia.htm. The European Calculated Biological Allowable Total Errors can be viewed at www.westgard.com/europe.htm.

Examples of acceptable analytical performance [17, 40] appear in Table I. Until species-specific data are available, veterinary diagnostic laboratories must rely upon criteria defined for medical laboratories.

Test or Analyte                Acceptable Performance
Alanine aminotransferase       Target value ± 20%
Albumin                        Target value ± 10%
Alkaline phosphatase           Target value ± 10%
Amylase                        Target value ± 10%
Bilirubin, total               Target value ± 6.8 µmol/l or ± 20% (greater)
Blood gas pO2                  Target value ± 3 SD
Blood gas pCO2                 Target value ± 5 mm Hg or ± 8% (greater)
Calcium                        Target value ± 0.2495 mmol/l
Chloride                       Target value ± 5%
Creatinine                     Target value ± 26 µmol/l or ± 15% (greater)
Glucose                        Target value ± 0.33 mmol/l or ± 10% (greater)
Magnesium                      Target value ± 25%
Potassium                      Target value ± 0.5 mmol/l
Sodium                         Target value ± 4 mmol/l
Total protein                  Target value ± 10%
Leukocyte count                Target value ± 15%
Hematocrit                     Target value ± 6%
Platelet count                 Target value ± 25%
Cortisol                       Target value ± 25%
Thyroid stimulating hormone    Target value ± 3 SD
Thyroxine                      Target value ± 13 nmol/l or ± 20% (greater)

TABLE I. Acceptable performances for the most frequently used analyses.

Experimental plan for validation of a new method

Four phases have been described including the initial familiarization with the method, preliminary validation experiments, more extensive evaluation of precision and accuracy and implementation for routine use [17]. The preliminary validation includes those steps that can be done more easily within a few days. If unsatisfactory, a decision may be made
that the method is unacceptable. More extensive precision and validation evaluations must be studied over a minimum of 20 working days.

FAMILIARIZATION

The familiarization phase includes establishment of the working procedure, checking the working range and the calibration. The detection limit may be determined initially or in later experiments. Establishment of the working procedure includes preparing reagents, setting up the instrument, calibrating the methods and obtaining results from test samples. The standards must be carefully checked to ensure that the calibration is correct.

Linearity studies

The desired working range is defined during the preliminary evaluation, or is assessed from the manufacturer's specifications for the method. The working range can be validated as part of the familiarization studies. Two pools of sera are selected, one with analyte concentration or activity close to zero, or the detection limit, and the other with a high concentration, or close to the expected upper limit of the working range. Varying proportions of sera are mixed to create an additional three, or more, concentrations, thus at minimum five specimen levels. Two [17], three [43] or four [24] measurements are made for each specimen. The observed values are plotted on the x axis against absorbance on the y axis. Visual inspection for linearity is used as an estimation of upper working range for the method. At the same time the absorbance of the zero blank can be assessed as to potential significance for the method.

PRELIMINARY VALIDATION

The preliminary validation experiments include within-run replication, interference, recovery and judgement of analytical acceptability [17].

Within-run replication

The initial replication study assesses random error within-run or within-day. Samples are chosen which approximate medical decision levels of greater interest for the test. The matrix of the sample, whether serum, urine, cavity fluid, etc., should approximate the matrix of clinical specimens for the analyte of interest. Control solutions made with a similar matrix as patient samples, e.g. lyophilized serum, have predetermined analyte concentrations as well as long term stability. Aqueous solutions may be used for preliminary studies of some analytes. Pools of fresh patient samples are used frequently for initial within-run / within-day replication studies. Two or three samples are selected with analyte concentrations which approximate important medical decision limits [16, 17].

A minimum of 20 replicates is recommended for each sample. The mean, SD and CV are calculated and compared to allowable SD, or CV, for the method. Within-run SDs should be about 1/2 the allowable SD since additional sources of error are usually observed for between-run precision studies [43]. The SD of a method can be calculated from the values observed for patient samples done in duplicate, where the SD equals the square root of (the sum of the differences squared divided by two times the number of samples).

Interference studies

Interference studies are done initially to assess the effects of hemolysis, hyperbilirubinemia and hyperlipemia [10, 14, 15, 25]. Other interferences may be studied at a later time [45]. The interference studies provide information about potential systematic error that may be caused by lack of specificity of the method. Different experimental designs are described. If a method is available which is known to be free of such interferences, a series of samples containing increased concentrations of the interferent are analyzed using both methods and the results compared [17]. Test samples may be prepared by adding interfering materials to one aliquot and an accurately pipetted equal volume of solvent or water to the other aliquot. The analyte concentration is tested in each aliquot, preferably in duplicate or triplicate, and the observed values are compared for interference effects [10]. Increasing amounts of the interferent can be added to pooled species sera. Dilution effects must be considered when assessing results. The average difference between aliquots with, and without, added interferent can be calculated and plotted. The difference, or error, when the interference is present is compared to the error that is allowable for the test. The method is assessed as acceptable or unacceptable for the analyte for the species.

Due to the species differences reported, for each method interference experiments must be done for each species of interest for veterinary diagnostic laboratories [14]. Visual assessment can be made from a graphical presentation. When automatic correction factors were developed for hemolysis, bilirubinemia and lipemia in an automated instrument designed for use with human sera but used with domestic animal species sera, the interspecies differences precluded reporting patient values when these interferences were noted (personal observations). Significant effort is required to develop allowable limits for the three common interferences in the sera of each animal species. In veterinary diagnostic laboratories, at minimum, the effects of these common interferences should be studied to determine whether there is a positive or negative bias. Ideally, the maximum concentration of the interferent that creates error greater than allowable for the method should also be determined.

Recovery studies

Recovery studies assess proportional systematic error due to competitive reactions from substances within the specimen, including matrix effects [17]. Different amounts of the substance of interest are added to baseline sera pools, using concentrated solutions, so that dilution effects are less than 10 % [17]. Experimental samples are analyzed in quadruplicate in order to detect small additions, or in duplicate if the additions are large [17, 44]. The original concentration is subtracted from the final observed concentration to determine the amount recovered. This amount is divided by the amount added, multiplied by 100 and expressed as the percent recovery. The error observed should be less than the allowable total error pre-assigned for the method.
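
The percent recovery arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample values and the 10 % allowable error are hypothetical, the baseline aliquot is assumed to have received an equal volume of solvent so that dilution cancels, and the "error observed" is taken here to be the deviation of the recovery from 100 %.

```python
from statistics import mean


def percent_recovery(baseline_replicates, spiked_replicates, amount_added):
    """Percent recovery = (final - original) / amount added x 100.

    Replicate results are averaged first. All three arguments must be in
    the same concentration units (assumption made for this sketch).
    """
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    recovered = mean(spiked_replicates) - mean(baseline_replicates)
    return 100.0 * recovered / amount_added


def recovery_within_allowable(recovery_pct, allowable_error_pct):
    """True if the deviation from 100 % recovery is below the allowable error.

    Treating |recovery - 100| as the observed error is an assumption of
    this sketch, not a rule taken from the text.
    """
    return abs(recovery_pct - 100.0) < allowable_error_pct


if __name__ == "__main__":
    # Hypothetical glucose pool (mmol/l), analyzed in duplicate.
    baseline = [5.0, 5.2]   # aliquot with solvent only
    spiked = [9.8, 10.0]    # aliquot with 5.0 mmol/l added
    rec = percent_recovery(baseline, spiked, amount_added=5.0)
    print(f"recovery: {rec:.1f} %")
    print("acceptable:", recovery_within_allowable(rec, allowable_error_pct=10.0))
```

In practice the allowable error passed to the check would be the value pre-assigned for the analyte, for example one of the limits listed in Table I.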

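Returning to the within-run replication study, the mean, SD and CV of the replicates, and the SD estimated from duplicate patient samples, can be computed as in the following sketch. The function names and example values are hypothetical, and the use of the sample SD (n - 1 denominator) for the replicates is an assumption, since the text does not say which form is intended.

```python
from math import sqrt
from statistics import mean, stdev


def imprecision(replicates):
    """Return (mean, SD, CV %) for replicate results from a single sample.

    CV = SD x 100 / mean. The sample SD (n - 1 denominator) is assumed.
    """
    m = mean(replicates)
    sd = stdev(replicates)
    return m, sd, sd * 100.0 / m


def sd_from_duplicates(pairs):
    """SD = sqrt(sum of squared differences / (2 x number of samples)).

    `pairs` holds one (first, second) result per patient sample.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one duplicate pair is required")
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    return sqrt(sum_sq / (2 * n))


if __name__ == "__main__":
    # Hypothetical results; a real study would use at least 20 replicates.
    m, sd, cv = imprecision([9.0, 10.0, 11.0])
    print(f"mean={m:.2f} SD={sd:.2f} CV={cv:.1f} %")
    print(f"SD from duplicates: {sd_from_duplicates([(10, 12), (20, 18)]):.3f}")
```

The resulting SD or CV would then be compared with the allowable SD or CV for the method, as described in the replication sections.
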
Revue Méd. Vét., 2000, 151, 7, 623-630


LABORATORY TEST METHOD VALIDATION 627

Recovery studies are subject to problems associated with performed and the time over which the comparison will be
design and performance, calculation of the data and interpre- made. It is important that the samples are analyzed within a
tation of the results. If a very reliable comparison method is maximum of a few hours so that differences may not be due
available, e.g. reference method, a method comparison study to sample handling effects [17, 44]. Experimental design and
should be given higher priority than a recovery study [44]. the correct approach to data analyses continue to create dis-
When bias is observed for the method comparison study, the cussion [11, 16, 20, 27, 31, 37, 41, 43]. The following recom-
recovery study may help to explain the bias observed [17, 44]. mendations apply to continuous data. If data is not conti-
nuous, kappa statistics are designed to allow comparison of
DETAILED VALIDATION STUDIES categorical data [4, 5].
Providing that the preliminary validation studies are favo- Most authors, or working groups, recommend comparing a
rable, more extensive experiments are initiated. These minimum of 40 patient specimens. Selection of specimens
include replication studies conducted over at least 20 wor- with concentrations representing the working range of the
king days, comparison of methods studies, judgement of ana- method is critical to the success of the comparison. A larger
lytical acceptability, verification of reference interval and number of specimens, e.g. 100, will increase the probability
documentation studies. of detecting interferences due to sample matrix influences
Replication studies and thus whether the specificity of the new method is similar.
Ideally, each specimen is tested in duplicate by each method.
Replication studies are done to estimate method imprecision
The samples should be placed in a different order within a
due to random error. In addition to the initial studies confir-
run or ideally within different runs if completed within 2
ming acceptable within-run precision, between-run experi-
hours. Duplicate testing increases the probability of identi-
ments are designed to provide a more realistic estimation of
fying numerous potential sources of error including valid
overall errors which may occur due to instrument instability,
outliers [17, 44]. Without duplicate testing it is even more
reagent preparation, changes in ambient temperature, analyst,
important that large method differences are identified imme-
etc. Such estimations are termed "between-day", "day-to-day"
diately so that these specimens can be reanalyzed while still
or "total" imprecision [26]. Between-day precision should be
available.
examined over 20 days. Thus sample stability is a requirement.
Lyophilized control sera is often used because if greater stabi- Method comparison should be done over a minimum of 5
lity and because high and low analyte concentrations allow days, e.g. 8 specimens a day, by each method within 2 hours
evaluation of performance using concentrations which [44]. If discrepancies are noted the comparison should be
approximate important medical decision limits. The observed continued for an additional 5 days. Results should be graphed
mean, SD and CV are calculated and compared to allowable immediately and visually inspected for discrepancies [17,
total error. For within-run or within-day the acceptable SD is 44]. Where large differences are noted the patient samples
1/4 or less the total error while for between-day studies the SD should be reanalyzed immediately to eliminate errors due to
should be 1/3 or less than the defined total error [17, 40]. recording or sample identification.
Experimental studies of precision can be designed to make Data analysis
use of analyses of variance [26].
Analysis of data from comparison of method studies conti-
Comparison of methods nues to remain an actively discussed subject [2, 11, 12, 16,
A comparison of methods study is done to estimate systemic 17, 18, 27, 31, 33, 43] The agreed first step is to plot the
error (inaccuracy or bias) within the new method. If bias is pre- observed values and visually examine the data using diffe-
sent, and if the appropriate statistical calculations are done, sys- rence plots and / or comparison line plots. Comparison (line
temic error will be identified as constant, proportional, or a or x/y) plots are made with the comparative method values on
combination [17, 44]. The experimental design and the statis- the x-axis and the new method values on the y-axis. Visual
tical tests used are critical for identification of systemic errors. inspection and drawing a line of best fit provide indication of
The primary effort within the laboratory should be devoted to the analytical range of the data, the linearity and the relation-
collecting the appropriate data for method validation [43]. ship of the methods as shown by the closeness of the values
to the line of identity and the intercept with the y-axis [16].

The 'new' method is compared to an existing 'comparative' method. Ideally this will be a high-quality 'reference' method with documented accuracy, so that observed differences can be assigned to the new method. In diagnostic laboratories the new method is most frequently compared to a routine, or 'field', method using patient samples. When observed differences between the methods are small, the two methods are considered to have similar relative accuracy. If observed differences are large and unacceptable, further studies are required to identify which method is inaccurate.

Experimental design

Experimental design must consider the number of samples to be compared, whether single or duplicate analyses will be

Bias plots

Difference, or bias, plots can be made using various values for the x-axis and the y-axis [16]. The values to be used are dictated by the degree of reliability of the comparative method and whether duplicate analyses have been performed. If the comparative method is of high quality, such as a reference method or a well-studied routine method, the comparative method results are plotted on the x-axis and the difference between the new and the comparative method on the y-axis [27]. If duplicate measurements are made, the difference between all measurements and the 'true' value can be plotted, allowing estimation of the precision of the replicate measurements [16]. If the comparative method is a routine method of unconfirmed reliability, the average of the comparative and new method observations for a patient sample is plotted on the x-axis [2].

When visually assessing difference plots, the observations should be distributed evenly above and below the zero line over the range of observations [16]. If they are not evenly distributed above and below the zero line, there is likely a bias. If the distribution changes with concentration, constant or proportional errors may be apparent. As the concentration increases, changes in precision may be observed, especially if the observations include duplicate values [17].

Statistical analysis

Statistical analyses provide numbers that can be incorporated into the overall judgement of whether a method is acceptable. The use of statistical tests is easier to understand if their objective is taken to be the detection of errors in the 'new' method from the method comparison observations [43]. Statistical tests are not used to prove test acceptability [16, 43]. It is also useful to know that the statistical tests recommended for assessing agreement may not be the correct tests for predicting medical decision limits, including reference intervals [16].

The traditional statistical approach for analyzing comparison of methods studies is to use linear regression to estimate the slope and intercept, and a 't-test' of the mean observations to detect bias. Simple linear regression assumes that the comparative method is free from error, that error in the 'new' method is normally distributed and that the error is constant over the sampling range [16]. If the patient samples chosen for study are inadequate, e.g. because they cover too narrow a range of observations, or if these initial statistical tests are not used correctly, there may be significant misinterpretation as to acceptability of the method [37]. The 40 to 100 samples traditionally recommended for method comparison studies may be inadequate if the range of observations is narrow [20], as for electrolytes, where the range observed in healthy individuals may be wide (potassium) or narrow (sodium) relative to the average concentration, i.e. the range ratio. For a range ratio of 2, Linnet states that 544 samples are required to detect one standardized slope deviation, whereas if the range ratio is 10, then 64 samples can attain the desired differences of medical importance [20].

Simple linear regression provides a good estimate of slope and intercept if the correlation coefficient (r) is 0.99 or larger [27, 37]. If r is less than 0.975, the range of data may be inadequate and simple linear regression may not provide a good estimate of slope and intercept [27, 37]. If this applies, improvement of the data range or use of alternative statistical methods is required [27]. Visual examination of plotted observations should allow estimation as to whether the imprecision is similar for both methods and whether there is a change with sample concentration. Examination of residual plots from regression analysis can be useful for making this assessment [27]. It is important to note that although the line for simple linear regression may appear ideal, i.e. r = 1 using PEARSON'S product-moment correlation, there may be significant constant bias or marked imprecision [33]. Similarly, when using t-test statistics there may be no difference in mean observations even when there is a large proportional error. The t-test statistic is mainly useful for indicating whether sufficient data have been collected to estimate bias reliably [43].

For the values observed in many comparative studies, alternative statistical tests should be used. These more complex statistical tests are now readily available commercially and from the World Wide Web [[email protected] ; www.westgard.com]. Many of these programs provide 'help' sites that include background information detailing use and interpretation of the tests. The current debate within the scientific literature and the evolution and refinement of statistical programs should lead to increased agreement in the opinions encountered by readers. One of the following statistical tests should be used if the assumptions required for simple linear regression have not been met.

The DEMING regression method allows for imprecision in both methods when assessing the degree of agreement [16]. Single or duplicate observations can be analyzed and plotted to provide statistical and visual demonstration of error, including imprecision, bias and whether the error is constant or proportional [[email protected]]. According to a recent report [23], iteratively reweighted general Deming regression can be used to produce statistically unbiased estimates of systematic bias and reliable confidence intervals of bias for all cases.

The PASSING & BABLOK regression method allows for imprecision in both methods; the imprecision does not have to be normally distributed nor constant over the range examined, and extreme values can be included [16, www@analyse-it.com].

The concordance correlation coefficient [18] has been proposed as an alternative method to avoid the deficiencies of the paired t-test and PEARSON'S product-moment correlation [33]. The concordance correlation coefficient is used to calculate a number which categorizes test performance from good to poor. This approach should be considered where allowable total error has not been predetermined.

Acceptable performance

Judgement as to the acceptability of a 'new' method requires consideration of the study design, the reliability of the comparative method and the visual and statistical examination of method comparison data. The method total error is calculated as the measured error (bias) + 3 × the measured SD, i.e. systematic plus random error [41], or systematic error + 4 × SD [17], or total error = 1.65 × (imprecision) + inaccuracy [13]. The calculated total error must not exceed the predetermined allowable total error.

A method evaluation decision chart has been designed to incorporate measured bias, measured imprecision and allowable total error [42, [email protected]]. The location of the 'operating point' is used to judge method acceptability [42, 44].

The use of scatter and difference plots and the graphical and statistical interpretation of acceptability of a new (field) method versus a reference method have been addressed [12]. Difference plots allow examination and interpretation of the

new method observations according to specific criteria. The standard deviation of the differences in comparative method studies is considered to be an indispensable tool for evaluation of aberrant-sample bias (matrix effects) [12].

Reference intervals

Reference intervals, and other medical decision limits, must be validated for each method used in a laboratory [40]. If the reference intervals were reliable for the comparative method and there is acceptable agreement between methods, the method comparison data are used to predict reference intervals for the new method [17].

If there is doubt about the reliability of the previous limits, or if the comparison data do not allow confident estimation of limits for the new method, for example because of apparent matrix effects, de novo reference limits should be determined [17, 21, 22, 34].

The method comparison data and the types of errors observed dictate the recommended statistical test for estimation of medical decision limits for a new method. Prediction of limits is done using statistical programs which are similar to those used for agreement, but which incorporate different algorithms [16, [email protected]].

Linear regression, using least-squares estimation, requires that the relationship is linear, that the comparative method is free from imprecision and that measurement error for the new method is normally distributed and constant over the sampling range [16]. If these assumptions are not met, alternative methods should be used for predicting reference or medical decision limits [1].

DEMING regression has been widely adopted in clinical medicine because imprecision can be present in both methods [6]. The imprecision should be normally distributed [38]. If the imprecision is not constant over the sampling range, weighted DEMING regression [19] is recommended.

PASSING & BABLOK regression is recommended when imprecision occurs in both the comparative and the new method data [30]. The imprecision need not be normally distributed nor have constant variance over the sampling range. The primary restriction is that the ratio of the imprecision x/y must be equal to the slope squared [29]. Extreme values do not unduly affect the regression line [30].

IMPLEMENTATION PHASE

During the implementation phase the method protocol is written in a format that can be used to train new analysts and to meet standard operating procedure requirements [17]. The quality control procedure is chosen and implemented. The method is introduced for service. The clinicians should be informed as to the availability of the method, anticipated scheduling, precision at important medical decision limits, and the reliability of reference limits, i.e. interim or final. It is important to monitor performance closely for the first month, with regular reviews for several months. Sources of problems are identified, preventive maintenance procedures are improved and quality control procedures updated [17].

Point of care instrumentation

It is anticipated that 'point of care' testing will increase significantly, in wards, in intensive care units and for field use [28]. There are too few reports documenting validation of these 'point of care' methods when used for samples from various animal species. Similarly, method comparisons with central laboratory methodology are seldom determined or provided to users. Critical care clinicians, technologists and nurses require further knowledge about method validation. Manufacturers' specifications are often limited to study of human specimens for accuracy, precision and reference limits. The comparison of methods used for point of care tests presents challenges similar to comparison of method studies for tests used in the central laboratory and in critical care or for 'out-of-hours' testing. Do the tests provide similar or different values? Are there differences in medical decision limits, reference limits, imprecision? Should separate reference intervals be used? How should these data be electronically filed for future use?

What are the minimum steps required for validation of point of care methods for use with animal species? Without published and accepted guidelines one can only suggest adherence to the basic principles discussed previously. The intent is to detect those 'errors' which might significantly affect clinical decisions. Method selection requires careful review of the manufacturer's information and method specifications. How can studies be designed that will detect likely sources of error and remain practical and economical? Has the manufacturer, or other users, done method comparison studies? Are these data available? How have the data been presented and analyzed? Is this information likely to be transferable? Have reference values been developed using adequate numbers of clinically healthy individuals for the species of interest? Can the purchaser expect assistance from the manufacturer? Each of these questions should be addressed.

Clinical performance

Clinical performance is the ultimate assessment of the value of a laboratory test. Clinical performance, as compared to analytical performance, further assesses and describes the ability of a test to assist in making clinical decisions (usually diagnoses) under defined circumstances. It has been stated that medical decision limits should be used by clinicians instead of reference intervals [32]. When laboratory tests are being used to rule in and rule out disease, or to guide prognosis, decision limits are of great benefit, if determined. When laboratory tests are being used to screen for disease and to assess pathophysiology, reference limits are used extensively. Alternatively, knowledge of analytical precision is required when monitoring patients. Only if the new and the previous methods have been compared and validated to have similar analytical performance can medical decision limits, or 'cut points', be transferred to a new test method.

It is not the intent of this manuscript to address validation of clinical performance except to emphasize its importance as a necessary step in method validation. At defined decision


limits diagnostic sensitivity and diagnostic specificity indicate the ability of a test to detect disease in sick individuals and to indicate no disease in healthy individuals, respectively [32]. The more likely clinical scenario is to use a test to indicate possibility of disease as well as to differentiate two diseases. Thus, depending upon the clinical question, i.e. how the test is used, the same test will have very different diagnostic sensitivity and specificity. If disease prevalence is known, the predictive value of a positive and a negative test, or likelihood ratios, can be calculated [32].

Some validation of diagnostic performance can be calculated retrospectively, provided a method has been used long enough to accumulate the necessary data. Prospective study designs have many advantages. The primary requisite is a cooperative desire and effort by clinicians and laboratory specialists with adequate knowledge or with consultation from epidemiologists.

References

1. — ARMITAGE P. and BERRY G. : Statistical methods in medical research, 3rd ed., Blackwell Scientific Publications, Boston, 1994, 156-163, 312-334.
2. — BLAND J.M. and ALTMAN D.G. : Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1986, 307, 10.
3. — COTLOVE E., HARRIS E.K. and WILLIAMS G.Z. : Biological and analytical components of variation in longterm studies of serum constituents in normal subjects. III Physiological and medical implications. Clin. Chem., 1970, 16, 1028-1032.
4. — COHEN J. : A coefficient of agreement for nominal scale. Ed. Psychol. M., 1960, 20, 37-46.
5. — COHEN J. : Weighted kappa : nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull., 1968, 70, 213-220.
6. — CORNBLEET P.J. and GOCHMAN N. : Incorrect least-squares regression coefficients in method comparison analysis. Clin. Chem., 1979, 25, 432-438.
7. — ELVITCH R. (Ed.) : College of American Pathologists Conference Report. Conference on analytical goals in clinical chemistry, at Aspen, Colorado. Skokie, IL, College of American Pathologists, 1976.
8. — FRASER C.G., HYLTOFT PETERSEN P., RICOS C. and HAECKEL R. : Proposed quality specifications for the imprecision and inaccuracy of analytical systems for clinical chemistry. Eur. J. Clin. Chem. Clin. Biochem., 1992, 30, 311-317.
9. — FRASER C.G. and HYLTOFT PETERSEN P. : Quality goals in external quality assessment are best based upon biology. Scand. J. Clin. Lab. Invest., 1993, 53 (Suppl 212), 8-9.
10. — GLICK M.R., RYDER K.W. and JACKSON S.A. : Graphical comparisons of interferences in clinical chemistry instrumentation. Clin. Chem., 1986, 32, 470-475.
11. — HOLLIS S. : Analysis of method comparison studies (Editorial). Ann. Clin. Biochem., 1996, 33, 1-4.
12. — HYLTOFT PETERSEN P., STÖCKL D., BLAABJERG O., PEDERSEN B., BIRKEMOSE E., THIENPONT L., LASSEN J.F. and KJELDSEN J. : Graphical interpretation of analytical data from comparison of a field method with a reference method by use of difference plots. Clin. Chem., 1997, 43, 2039-2046.
13. — HYLTOFT PETERSEN P., RICOS C., STÖCKL D., LIBEER J.C., BAADENHUIJSEN H., FRASER C. and THIENPONT L. : Proposed guidelines for the internal quality control of analytical results in the medical laboratory. Eur. J. Clin. Chem. Biochem., 1996, 34, 983-999.
14. — JACOBS R.M., LUMSDEN J.H. and GRIFT E. : Effects of bilirubinemia, hemolysis, and lipemia on clinical chemistry analytes in bovine, canine, equine, and feline sera. Can. Vet. J., 1992, 33, 605-608.
15. — JACOBS R.M., LUMSDEN J.H. and TAYLOR J.A. : Canine and feline reference values. In : Kirk's Current Veterinary Therapy XIII Small Animal Practice, BONAGURA J.D. (Ed.), W.B. Saunders Company, Philadelphia, 2000, 1207-1227.
16. — JONES R. and PAYNE B. : Clinical investigation and statistics in laboratory medicine. ACB Publications, London, 1997.
17. — KOCH D.D. and PETERS T. : Selection and evaluation of methods. In : Tietz Textbook of Clinical Chemistry, 3rd ed., C.A. BURTIS and E.R. ASHWOOD (Eds.), W.B. Saunders Company, Philadelphia, 1999, 320-335.
18. — LIN L.I. : A concordance correlation coefficient to evaluate reproducibility. Biometrics, 1989, 45, 255-268.
19. — LINNET K. : Evaluation of the linear relationship between the measurements of two methods with proportional errors. Stat. Med., 1990, 9, 1463-1473.
20. — LINNET K. : Necessary sample size for method comparison studies based on regression analysis. Clin. Chem., 1999, 45, 882-894.
21. — LUMSDEN J.H. and MULLEN K. : On establishing reference values. Can. J. Comp. Med., 1978, 42, 293-301.
22. — LUMSDEN J.H. : "Normal" or reference values : questions and comments (Editorial). Vet. Clin. Path., 1998, 27, 102-106.
23. — MARTIN R. : General Deming regression for estimating systematic bias and its confidence interval in method-comparison studies. Clin. Chem., 2000, 46, 100-104.
24. — NCCLS : Document EP6-P. Evaluation of the linearity of quantitative analytical methods. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, 1986.
25. — NCCLS : Document EP7-P. Interference testing in clinical chemistry. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, 1986.
26. — NCCLS : Document EP5-T2. Precision performance of clinical chemistry devices - Second edition - Tentative Guideline. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, 1992.
27. — NCCLS : Document EP9-A. Method comparison and bias estimation using patient samples; Approved guideline. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, 1995.
28. — NCCLS : Document POL1/2-T3, Pol3-R. Physician's office laboratory guidelines, procedure manual: Tentative Guidelines and CLIA/NCCLS POL Index. 3rd ed. NCCLS, 940 West Valley Road, Suite 1400, Wayne, PA, 1995.
29. — PASSING H. and BABLOK W. : A new biometrical procedure for testing the equality of measurements from two different analytical methods. J. Clin. Chem. Clin. Biochem., 1983, 21, 709-720.
30. — PASSING H. and BABLOK W. : A general regression procedure for method transformation. J. Clin. Chem. Clin. Biochem., 1988, 26, 783-790.
31. — POLLOCK M.A. : Method comparison - a different approach. Ann. Clin. Biochem., 1992, 29, 556-560.
32. — SACKETT D.L., HAYNES R.B., GUYATT G.H. and TUGWELL P. : Clinical epidemiology: a basic science for clinical medicine, 2nd ed., Little, Brown and Company, London, 1991.
33. — SHOUKRI M.M. and PAUSE C.A. : Statistical methods for health sciences, 2nd ed., CRC Press, London, 1999, 35-41.
34. — SOLBERG H.E. : Establishment and use of reference values. In : Tietz Textbook of Clinical Chemistry, 3rd ed., C.A. BURTIS and E.R. ASHWOOD (Eds.), W.B. Saunders Company, Philadelphia, 1999, 336-356.
35. — STÖCKL D. : European specifications for imprecision and inaccuracy compared with US CLIA proficiency-testing criteria. Clin. Chem., 1995, 41, 120-121.
36. — STÖCKL D. and REINAUER H. : Development of criteria for the evaluation of reference method values. Scand. J. Clin. Lab. Invest., 1993, 53 (Suppl 212), 16-18.
37. — STÖCKL D., DEWITTE K. and THIENPONT L.M. : Validity of linear regression in method comparison studies : is it limited by the statistical model or the quality of the analytical input ? Clin. Chem., 1998, 44, 2340-2346.
38. — STRIKE P.W. : Statistical methods in laboratory medicine. ISBN 0-7506-1345-9, 1991, 307-330.
39. — TONKS D.B. : A quality control program for quantitative clinical chemistry estimations. Can. J. Med. Tech., 1968, 30, 38.
40. — US Department of Health and Human Services : Clinical laboratory improvements amendments of 1988. Final rule. Laboratory requirements. Federal Register, 1992, 57, 7002-7288.
41. — WESTGARD J.O. and HUNT M.R. : Use and interpretation of common statistical tests in method-comparison studies. Clin. Chem., 1973, 19, 49-57.
42. — WESTGARD J.O. : A method evaluation decision chart (MEDx chart) for judging method performance. Clin. Lab. Science, 1995, 9, 277-283.
43. — WESTGARD J.O. : Point of care in using statistics in method comparison studies (Editorial). Clin. Chem., 1998, 44, 2240-2242.
44. — WESTGARD J.O. : Basic method validation, Westgard Quality Corporation, Ogunquit ME, USA, 1999.
45. — YOUNG D.S. : Effects of drugs on clinical laboratory tests, 4th ed., AACC Press, Washington, D.C., 1995.
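
As a rough illustration of the difference (bias) plot conventions described under 'Bias plots', the following Python sketch builds the plotted points and summarizes the differences. The function names, the `reference_quality` flag and the sample values are hypothetical and chosen only for illustration; the choice of x-axis follows the text (comparative result when that method is of high quality, otherwise the average of both methods).

```python
from statistics import mean, stdev


def difference_plot_points(comparative, new, reference_quality=True):
    """Return (x, y) pairs for a difference (bias) plot.

    x is the comparative result when that method is considered reliable,
    otherwise the average of the comparative and new results.
    y is always the new result minus the comparative result.
    """
    if len(comparative) != len(new):
        raise ValueError("comparative and new must be paired")
    points = []
    for c, n in zip(comparative, new):
        x = c if reference_quality else (c + n) / 2
        points.append((x, n - c))
    return points


def bias_summary(points):
    """Mean difference (estimated bias) and SD of the differences."""
    diffs = [d for _, d in points]
    return mean(diffs), stdev(diffs)


# Hypothetical paired results for three patient samples
comparative = [4.0, 5.0, 6.0]
new = [4.5, 5.5, 6.5]
points = difference_plot_points(comparative, new, reference_quality=False)
bias, sd_of_differences = bias_summary(points)
```

Plotting `points` with any charting tool and adding a horizontal line at zero gives the visual check described in the text: points scattered evenly about zero suggest no bias, while a consistent offset suggests constant error.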

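
The DEMING regression discussed under 'Statistical analysis' allows for imprecision in both methods. A minimal Python sketch of the simple, unweighted form is given below. The function name `deming_fit` and the parameter `error_variance_ratio` (assumed here to be the ratio of the new-method error variance to the comparative-method error variance, defaulting to 1) are illustrative assumptions. The sketch does not implement the weighted [19] or iteratively reweighted general [23] variants, and it gives no confidence intervals.

```python
import math


def deming_fit(x, y, error_variance_ratio=1.0):
    """Simple (unweighted) Deming regression of y (new) on x (comparative).

    error_variance_ratio is the assumed ratio of the error variance of y
    to the error variance of x; 1.0 treats both methods as equally imprecise.
    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("covariance is zero; slope is undefined")
    d = error_variance_ratio
    root = math.sqrt((syy - d * sxx) ** 2 + 4 * d * sxy ** 2)
    slope = (syy - d * sxx + root) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1
slope, intercept = deming_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

A slope different from 1 points to proportional error and an intercept different from 0 to constant error, which is how the text uses the regression estimates.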

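
The concordance correlation coefficient [18] mentioned under 'Statistical analysis' can be sketched as follows. This is a minimal illustration using the usual definition, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²) with 1/n variances; the function name and data are hypothetical, and no thresholds for 'good' or 'poor' performance are implied.

```python
def concordance_correlation(x, y):
    """Concordance correlation coefficient for paired results x and y.

    Uses population (1/n) variances and covariance. Unlike Pearson's r,
    the value is reduced by any shift between the two sets of results.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("all observations are identical")
    return 2 * cov_xy / denominator


# Hypothetical results with a constant bias of +1 in the new method
ccc = concordance_correlation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this example Pearson's r would be 1, while the concordance coefficient is 4/7 (about 0.57), which illustrates the point in the text that a perfect correlation can coexist with a significant constant bias.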

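
The total-error criteria listed under 'Acceptable performance' can be compared side by side with a short Python sketch. The function names, the use of the absolute value of the bias, and the numbers are assumptions for illustration only; the multipliers 3, 4 and 1.65 are those quoted in the text from [41], [17] and [13].

```python
def total_error(bias, sd, multiplier=3.0):
    """Total error as |bias| + multiplier x SD.

    Taking the absolute value of the bias is an assumption of this sketch.
    """
    return abs(bias) + multiplier * sd


def is_acceptable(bias, sd, allowable_total_error, multiplier=3.0):
    """True if the calculated total error does not exceed the allowable one."""
    return total_error(bias, sd, multiplier) <= allowable_total_error


# Hypothetical method: bias 2.0 and SD 1.5, in the units of the analyte
bias, sd, allowable = 2.0, 1.5, 7.0
for k in (3.0, 4.0, 1.65):
    print(k, total_error(bias, sd, k), is_acceptable(bias, sd, allowable, k))
```

With these hypothetical values the method passes under the 3 × SD and 1.65 × SD conventions but fails under the 4 × SD convention, so the convention and the allowable total error both need to be fixed before the study.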