
ACAD EMERG MED • January 2004, Vol. 11, No. 1 • www.aemj.org

Advanced Statistics: Linear Regression, Part I: Simple Linear Regression

Keith A. Marill, MD

Abstract

Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small data sets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables. Key words: regression analysis; linear models; least-squares analysis; statistics; models, statistical; epidemiologic methods. ACADEMIC EMERGENCY MEDICINE 2004; 11:87-93.

From the Division of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA.
Received July 24, 2001; revisions received July 9, 2002, and April 21, 2003; accepted September 10, 2003.
Series editor: Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, Torrance, CA.
Based on a didactic lecture, "Concepts in Multiple Linear Regression Analysis," given at the SAEM annual meeting, St. Louis, MO, May 2002.
Address for correspondence and reprints: Keith A. Marill, MD, Massachusetts General Hospital, 55 Fruit Street, Clinics 115, Boston, MA 02114. Fax: 617-724-0917; e-mail: [email protected].
Part II follows on page 94.
doi:10.1197/S1069-6563(03)00600-6

Linear regression is a mathematical technique that attempts to describe the relationship between two or more variables with a linear or straight-line function. Based on an analysis of the available data or sample, the technique also can be used to draw inferences about a larger population or data set, or to make predictions about future data. Simple linear regression is a subtype of linear regression in which there is a single outcome or dependent variable and a single predictor or independent variable.

Linear regression is a popular technique because many phenomena of interest have a linear relationship, and the technique is able to demonstrate mathematically and visually the relationships between clinically important variables. The linear equation is inherently simple and elegant, and a unique solution usually exists. Furthermore, nonlinear terms can be introduced into the linear framework as needed, primarily to improve the fit to the data and to satisfy the basic assumptions of the model. When a data set cannot be properly described using a linear regression approach, a different and mathematically more complex technique termed "nonlinear regression" may be used.

In this article, four clinical questions with associated small theoretical data sets are introduced to illustrate the major points. The clinical questions focus on the treatment of diabetic ketoacidosis (DKA) and aspirin overdose. The four data sets are illustrated graphically in Figures 2 through 5. We will return to these data sets later in the corresponding figures in the second article, when a multiple regression approach is taken. The raw data for these examples are listed in Table 1.

FUNDAMENTAL ASSUMPTIONS

Simple linear regression uses the equation for a line to model the relationship between two variables. If z is the outcome variable and x is the predictor variable, then:

z = kx + c   (Equation 1)

where k is a coefficient that represents the slope of the linear relationship between the variables x and z, and c is a constant. The constant c is termed the "z intercept" because this is the value of z where x = 0 and the regression line crosses the z axis. Consider Figure 1, which contains a data set of 12 points and the associated linear regression line, z = 2x + 1. This example can be used to review the four necessary assumptions in performing simple linear regression and inference testing:

1. There is some linear relationship between the predictor and outcome variable. As the values of the points increase along the x-axis, their values along the y-axis also increase. The cloud of points seems to center around a straight line rather than a curve or other shape.

TABLE 1. Data

            X     Y     Z     Log Z
Figure 2    4     4     4
            8     8     8
Figure 3   20     4     2
           30     4     1
           30     8     6
Figure 4    4    31     2     0.3
            4    39     8     0.9
            8    28     3     0.48
            8    42    32     1.51
Figure 5    0     0     1
            0     0     5
            0     1    13
            0     1    19
            1     0     1
            1     0     3
            1     1    24
            1     1    32

            X     Z     Residual
Figure 7    2     2.5    0.06
            3     3     -0.11
            4     4      0.22
            6     2     -3.13
            7     8      2.2
            8     8      1.53
            9     9      1.86
           10     3     -4.81
           10    10      2.19

Figure 1. The four assumptions of linear regression.

2. The variation around the regression line is constant (homoscedasticity). Some points may be farther from the regression line than others. Homoscedasticity means that as the eye moves laterally along the x-axis, the average variation of the points from the regression line stays about the same.

3. The variation of the data around the regression line follows a normal distribution at all values of the predictor variable. If one examines the data points at any particular value of x, they will form a bell-shaped or normal curve around the value of the regression line at that point. The majority of points will be close to the regression line, and fewer points will be farther away.

4. The deviation of each data point from the regression line is independent of the deviation of the other data points. The value of one point and its relationship to the regression line has no relationship or bearing on the value of another point in the data set.

If these four assumptions are met, then the model is considered valid and optimal. Now the methodology of simple linear regression is examined using some specific simplified clinical research scenarios.

THE METHOD OF LEAST SQUARES

Consider a small, retrospective study in which the investigator reviewed the resolution rate of acidosis in DKA as a function of the intensity of intravenous (IV) insulin therapy. There were two patients: one received 4 units per hour of IV insulin, and the other received 8 units per hour. Examining Figure 2, we can see that, after four hours of therapy, the serum bicarbonate increased by 4 mEq/L in the first patient and 8 mEq/L in the second. The regression equation z = 1x perfectly describes the relationship in this small data set. In this case, z = 0 at the z intercept where x = 0, and thus the constant c in Equation 1 equals zero. Each unit per hour of insulin therapy seems to be associated with a 1-mEq/L increase in the serum bicarbonate level after four hours of therapy.

Figure 2. z = 1x, R² = 1.0. IV = intravenous.
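To make the arithmetic concrete, this perfect two-point fit can be reproduced with a short Python sketch. The helper below is not from the article; it simply implements the standard least-squares formulas that this section goes on to describe:

```python
# Least-squares fit of the Figure 2 data: x = IV insulin (units/hr),
# z = rise in serum bicarbonate after four hours (mEq/L).
def least_squares(xs, zs):
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)               # spread of x
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    k = sxz / sxx                                          # slope
    c = z_mean - k * x_mean                                # z intercept
    return k, c

k, c = least_squares([4, 8], [4, 8])
print(k, c)  # 1.0 0.0, i.e., z = 1x with the constant c equal to zero
```

The same helper applied to the Figure 3 data below returns the slope 0.15 and intercept -1 shown in that figure's caption.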

The relationship in the data is usually not perfectly linear. In a separate observational study, the investigator studied the improvement in acidosis after four hours in three patients with DKA as a function of their initial respiratory rate. The investigator hypothesized that patients who were able to sustain an elevated respiratory rate might be able to "hyperventilate their way out" of DKA. Figure 3 depicts the data and the associated regression line. The slope is positive, which suggests that patients who present with higher respiratory rates do seem to improve more rapidly. In this case, the regression line does not touch all of the data points, but rather is the "best fit" for the data. This suggests that whereas the initial respiratory rate may "explain" part of the improvement in the observed acidosis, there may be other clinical and experimental factors involved. Perhaps the intensity of insulin therapy also is associated with the improvement in acidosis as we saw in Figure 2, or perhaps our measurements of the respiratory rate or serum bicarbonate are imprecise and there is a large amount of error. How is the best-fit regression line determined, and how can we quantitate the relative strength of the association between the predictor variable and the outcome?

Figure 3. z = 0.15x - 1, R² = 0.11.

The method of least squares is most commonly used to find the best fit in linear regression. In Figure 3, there are three data points with outcome values 1, 2, and 6. The mean of the outcome values is 3, and a broken straight line representing the mean outcome value for the three points has been drawn horizontally for the sample. Each data point has a certain variation from the mean outcome value. A regression line also has been drawn across the figure. The variation of each data point from the mean outcome can be represented by the sum of the vertical distance from the mean outcome line to the regression line (reg), and the distance from the regression line to the data point. This latter distance is referred to as the "residual" (res). The total distance (tot) of each point from the mean outcome value thus can be apportioned between the regression and the residual. The goal in formulating the regression line is to maximize the portion attributed to the regression and to minimize the residual for all of the data points. To be mathematically precise, the sum of all of the residuals squared is minimized; thus the method of least squares.

In linear regression, it often is necessary to calculate the sum of a value for all of the points in the data set. To do this in our example, it may be desirable to label each patient enrolled with a unique number, starting with 1 for the first patient and ending with the highest number for the last patient. If we use the letter p to denote the patient number, then in this example there are three patients with values p = 1, 2, 3. The Greek symbol sigma, Σ, is used to denote the summation of a mathematical value. Thus, if we wish to sum the residual distance for all of the patients, we would write Σ(p = 1 to 3) res, where the term p = 1 below the sigma denotes the first patient in the series, and the number 3 above the sigma denotes the last patient whose value is included in the summation. In this example, Σ(p = 1 to 3) res = 0 + 2.5 + 2.5 = 5, where each of the three numbers 0, 2.5, and 2.5 represents the residual distance from the regression line to the data point for each of the three patients, respectively.

In the method of least squares, our interest is minimizing the sum of the residuals squared, Σ(p = 1 to 3) (res)². In this example, Σ(p = 1 to 3) (res)² = 0² + (2.5)² + (2.5)² = 12.5. This summation of squared terms often is abbreviated as Σ(p = 1 to 3) (res)² = SSres. Similarly, the summation of the distances from the mean line to the regression line, and from the mean line to the data points, can be squared. For this example, these values would be depicted and abbreviated as follows: Σ(p = 1 to 3) (reg)² = SSreg = (1)² + (0.5)² + (0.5)² = 1.5 and Σ(p = 1 to 3) (tot)² = SStot = 1² + (2)² + (3)² = 14.

The next step is to find the correct coefficient k and constant c in the regression equation, which produce a regression line that has the smallest possible SSres. The problem is solved using two differential equations,1,2 and two interesting findings occur as a direct result of this process. First, except for the unusual situation in which all of the data have the same value of x, there is always a unique solution or single line that is best. Second, the line that is found to be best always satisfies the following equation1,2:

SStot = SSreg + SSres   (Equation 2)

Thus, the sum of all the squared distances from the mean line, SStot, can be apportioned between regression and residual components. If Equation 2 is divided by SStot, then we have:

1 = SSreg/SStot + SSres/SStot   (Equation 3)

The ratio SSreg/SStot can be viewed as the portion of the variation in the outcome variable that can be attributed to, or explained by, the regression model, and the other portion, SSres/SStot, can be considered the error or unexplained portion. Together, the two portions add up to one.
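These sums of squares can be recomputed directly from the Table 1 values for Figure 3. The following sketch (plain Python, standard least-squares formulas) confirms the decomposition in Equation 2:

```python
# Sum-of-squares decomposition for the Figure 3 data:
# x = initial respiratory rate, z = rise in bicarbonate (mEq/L).
xs, zs = [20, 30, 30], [2, 1, 6]
n = len(xs)
x_mean, z_mean = sum(xs) / n, sum(zs) / n                  # z_mean = 3, the mean line
sxx = sum((x - x_mean) ** 2 for x in xs)
k = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs)) / sxx
c = z_mean - k * x_mean                                    # k = 0.15, c = -1
fitted = [k * x + c for x in xs]                           # points on the regression line
ss_res = sum((z - f) ** 2 for z, f in zip(zs, fitted))     # residual portion: 12.5
ss_reg = sum((f - z_mean) ** 2 for f in fitted)            # regression portion: 1.5
ss_tot = sum((z - z_mean) ** 2 for z in zs)                # total: 14
# Equation 2 holds: ss_tot equals ss_reg + ss_res (14 = 1.5 + 12.5).
```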

It is also true that:

SSreg/SStot = R²   (Equation 4)

where R equals the Pearson correlation coefficient. So R² can be used as one measure of the strength of the linear relationship between the two variables and the validity of the linear model. For example, in Figure 2, R² = 1 because there is a perfect linear relationship, and if there is no linear relationship, then R² = 0. In Figure 3, R² = 0.11, which means the residual portion, SSres/SStot, is 1 - 0.11 = 0.89. The variation in the initial respiratory rate only seems to "explain" 11/100, or 11%, of the improvement in the serum bicarbonate. This suggests that there are other important predictors of acidosis resolution besides the initial respiratory rate, or that there is a large amount of error in the measurements.
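Equation 4 can be checked numerically on the same Figure 3 data. This sketch computes R² both from the sum-of-squares decomposition and as the square of the Pearson correlation, and the two agree:

```python
import math

# R-squared two ways for the Figure 3 data: from the sum-of-squares
# decomposition (Equation 4) and as the square of Pearson's R.
xs, zs = [20, 30, 30], [2, 1, 6]
n = len(xs)
x_mean, z_mean = sum(xs) / n, sum(zs) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
szz = sum((z - z_mean) ** 2 for z in zs)                   # this is SStot
sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
k = sxz / sxx
c = z_mean - k * x_mean
fitted = [k * x + c for x in xs]
ss_reg = sum((f - z_mean) ** 2 for f in fitted)
r2_from_ss = ss_reg / szz                                  # SSreg / SStot
r = sxz / math.sqrt(sxx * szz)                             # Pearson correlation
# r ** 2 equals r2_from_ss, about 0.107, reported as 0.11 in the text.
```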

VARIABLE TRANSFORMATIONS
In addition to studying the resolution of acidosis in
patients with DKA, the investigator also is interested
in the length of the patient stay in the intensive care
unit (ICU). Perhaps the required intensity of insulin
therapy also can predict the duration of subsequent
ICU admission. Four patients who required ICU
admission were retrospectively enrolled, and the
duration of their ICU stay is plotted as a function of
the initial intensity of insulin therapy in Figure 4A. There seems to be an association of the intensity of insulin therapy with the duration of the ICU admission, but there also is a problem.

Notice the pattern of the data points around the regression lines in Figures 1 and 4A. As previously noted, in Figure 1, there is homoscedasticity because the spacing of the points around the regression line stays about the same as the eye moves along the line. In Figure 4A, the distance of the points from the regression line increases to the right, suggesting the shape of a funnel on its side. This violates the assumption of homoscedasticity. When the native data do not satisfy this condition, variable transforms may be used to produce a regression equation that is satisfactory.

Although linear regression is, by definition, a process of linear modeling, it is possible to introduce nonlinear terms to the linear mathematical framework by transforming variables. The primary motivation to perform variable transforms is to improve the regression fit and to satisfy the necessary regression assumptions such as homoscedasticity. Logarithmic transforms are particularly useful in this regard because they differentially compress the spread in the data at high and low values of the transformed variable. In our example, we define a new variable q, where q is the log of the length of ICU admission, or q = log z. Thus, if the length of ICU stay is 32 days, then z = 32 and q = log(32) = 1.5. Next, we try a new transformed regression equation:

log z = q = kx + c

which is graphed in Figure 4B. We can see this relationship satisfies the necessary linear regression assumptions because the variation of the data points around the regression line is fairly constant moving along the line. The large spread in the data due to the patient who stayed in the ICU for 32 days has been relatively compressed down. Thus, the regression equation that includes a logarithmic transform is a better fit than the original linear equation. Q can be modeled as a linear function of x, and the results eventually can be transformed back to the variable z.

Figure 4. (A) z = 3.125x - 7.5. (B) q = 0.097x + 0.213, where q = log10 z, R² = 0.18. ICU = intensive care unit.

In a similar fashion, a linear regression equation also can be developed after transforming the independent predictor variable. For example, let z = c + kx as in simple linear regression, and now let x = s², so that x is a transform of the variable s. Although the equation remains linear with respect to x, it is

now a quadratic equation with respect to the underlying predictor variable s: z = c + ks². In summary, transforms allow one to extend the use of the well-developed mathematical framework in linear regression to model some data sets which otherwise would not satisfy the necessary assumptions. Extreme care must be taken when performing and interpreting such transformations because the results often are not intuitive. In our example, we have now modeled the log of the length of the ICU stay as a function of the initial intensity of insulin therapy, but the log of the length of the ICU stay is a number with no units of measure and no clear clinical meaning.
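Assuming the Figure 4 values listed in Table 1, the logarithmic transform and refit can be sketched as follows; the fitted coefficients reproduce those quoted in the Figure 4B caption:

```python
import math

# Log-transform refit for the Figure 4 data: x = IV insulin (units/hr),
# z = length of ICU stay (days), q = log10(z).
xs = [4, 4, 8, 8]
zs = [2, 8, 3, 32]
qs = [math.log10(z) for z in zs]                           # transformed outcome
n = len(xs)
x_mean, q_mean = sum(xs) / n, sum(qs) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
k = sum((x - x_mean) * (q - q_mean) for x, q in zip(xs, qs)) / sxx
c = q_mean - k * x_mean
print(round(k, 3), round(c, 3))  # 0.097 0.213, matching Figure 4B
# A prediction on the original scale back-transforms as z = 10 ** (k * x + c).
```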

Figure 5. z = 5.5x + 9.5, R² = 0.06.


DUMMY VARIABLES AND INFERENCE TESTING

Prescott et al. demonstrated that salicylate clearance can be enhanced in the overdose setting with both bicarbonate infusion and forced diuresis, and alkalinization of the urine is an important factor in enhancing excretion.3 It also is known that alkalinization of the urine is difficult to achieve in the setting of hypokalemia, because the renal tubules will retain potassium in lieu of hydrogen ions. An investigator postulated that a potassium infusion may be helpful to clear salicylate in acute aspirin overdose regardless of the initial serum potassium level. A small study of salicylate overdose was designed with eight laboratory animals divided into four groups of two. Each group of animals was assigned to receive an identical acute salicylate overdose, subsequent infusion of a standardized IV fluid regimen, and one of four treatments: placebo, potassium, bicarbonate, or both agents.

The investigator first analyzed the results with respect to the infusion of potassium, and a scattergram of the eight data points with an associated linear regression line is depicted in Figure 5. Notice that on the x-axis, the two values are 0 or 1, and these are associated with the absence or presence of a potassium infusion, respectively. In this situation, the variable x was used as a "dummy variable." To use a dummy variable in simple linear regression, the sample is divided according to the presence or absence of a particular characteristic. A value of x = 1 is associated with the presence of the characteristic, and a value of x = 0 is associated with its absence. There are no data with values of x outside of 0 and 1. The use of dummy variables often is simpler and preferred for analysis and inference testing. In this example, the investigator could have used the total amount of potassium infused in milliequivalent units for the x-axis, but the dummy variable was preferred. For other data that have no associated units of measure, such as the answer to a "yes" or "no" question, a dummy variable must be used.

The linear regression line drawn in Figure 5 clearly has a slope different than zero. Taking a statistical approach, however, we may ask how certain we are of the derived value of this slope. Based on this small sample, can we conclude that potassium infusion increases the clearance of salicylate in this animal species? The slope or coefficient in the regression has a value, and an associated uncertainty or standard error (SE). The SE of the coefficient is the standard deviation of the possible values of the coefficient in the theoretical population from which this particular experimental sample of animals was drawn. The formula for the standard error of the coefficient is:

SE(coefficient) = [ SSres / ((n - 2) Σ(x - xmean)²) ]^(1/2)   (Equation 5)

where the summation runs over all n subjects, n is the total number of subjects in the sample, x is the value of x for each of the subjects, and xmean is the mean value of x for all of the subjects. In this example, n = 8, the values of x are 0, 0, 0, 0, 1, 1, 1, 1, and xmean = 0.5. The SE of the coefficient is high if the residual values and SSres are high and there is a large amount of variation in the data that is not accounted for in the regression. Conversely, the SE is relatively low if n is high and there are a lot of subjects, and if the values of x are widely spread out along the x-axis. A wider base of data along the x-axis increases the certainty of the regression slope.

If the value of the slope is large and its SE is small, then we can be confident that it is not only different from zero in this particular sample, but it is likely to be different from zero if we repeat the experiment with another sample of animals. Thus, regression analysis can be used for inference testing. This particular case represents the situation of a Student's t-test.
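Equation 5 can be applied directly to this example. The sketch below uses the Figure 5 values from Table 1 and the standard critical t value of 2.447 for 6 degrees of freedom (a textbook constant, not given in this article); the resulting slope, intercept, and confidence interval match the figures quoted in the text:

```python
# Dummy-variable regression for the Figure 5 data: x = 1 if potassium
# was infused and 0 if not; z = salicylate clearance.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
zs = [1, 5, 13, 19, 1, 3, 24, 32]
n = len(xs)
x_mean, z_mean = sum(xs) / n, sum(zs) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
k = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs)) / sxx
c = z_mean - k * x_mean                                    # z = 5.5x + 9.5, as in Figure 5
fitted = [k * x + c for x in xs]
ss_res = sum((z - f) ** 2 for z, f in zip(zs, fitted))
se = (ss_res / ((n - 2) * sxx)) ** 0.5                     # Equation 5: about 8.68
t_stat = k / se                                            # about 0.63, well below t_crit
t_crit = 2.447                                             # t for p = 0.05, 6 degrees of freedom
ci = (k - t_crit * se, k + t_crit * se)                    # spans zero: about -15.8 to 26.8
```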

The p-value represents the probability of obtaining a slope whose absolute value is equal to or greater than the one actually obtained under the null hypothesis that the true slope is zero. If the p-value is small, then the result obtained would be unlikely to occur under the null hypothesis, and the null hypothesis would be rejected. In this example, p = 0.56, so the null hypothesis is not rejected. Alternatively, the 95% confidence interval (95% CI) of the slope or predictor coefficient can be calculated using the formula:

95% CI for the coefficient k = [k - (t)(SE), k + (t)(SE)]   (Equation 6)

where k and SE are the value and SE of the coefficient of interest, and t is the t value with p = 0.05 and the appropriate degrees of freedom. The 95% CI for the potassium infusion coefficient is -15.8 to 26.8. Because the 95% CI spans zero, it is not concluded that the potassium coefficient is significantly different from zero.

It may be instructive to visualize a schematic representation of simple linear regression as in Figure 6. Let the central circle represent the total variation in the outcome variable, z, and the upper lateral circle represent the total variation in the predictor variable, x. The area of overlap of the two circles, area A, represents the portion of the outcome that is "explained" by the regression of the predictor variable. Then, area B, the remaining area in the outcome circle outside of the overlap, represents the variability in the outcome due to all other possible factors that are not explained by the predictor variable. Area B represents the residuals in the regression. In this schematic, R² = A/(A + B) represents the portion of the total outcome explained by the predictor variable.

Figure 6. Simple linear regression schematic.

It is interesting to note that the SE of the coefficient is a function of the residuals, or area B, but not the regression, area A. No matter how large area A is, and how much of the outcome is associated with the predictor variable, the uncertainty in the regression coefficient is primarily a function of the portion of the outcome that the regression does not explain, the residual area B. When a t-test is performed, a comparison is made between the relative sizes of areas A and B after adjusting for the total number of data points, n. If the regression portion is relatively large compared with the residual portion, then the null hypothesis that there is no relationship between the predictor variable and the outcome is rejected. It is concluded that there is a linear relationship between the predictor and the outcome.

LEVERAGE

When a regression model is constructed, it is critical to examine the data and the fit of the model visually. In our first example, the investigator collected data on two patients with DKA to assess the relationship between the intensity of insulin therapy and the resolution of metabolic acidosis. Suppose the investigator decided to confirm the initial result with a larger sample of patients. Figure 7A shows a graph of a larger, identical study of patients with DKA.

Figure 7. (A) Regression lines with (solid) and without (broken) the leverage point (10, 3). (B) Residuals versus the independent predictor variable, intravenous (IV) insulin therapy.
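The comparison in the Figure 7 caption, with and without the leverage point (10, 3), can be sketched from the nine Figure 7 points listed in Table 1. The residuals, leverage (hat) values, and Cook's distances below use the standard textbook formulas for a two-parameter straight-line fit, which are not spelled out in this article:

```python
# Leverage diagnostics for the Figure 7 data: x = IV insulin (units/hr),
# z = rise in serum bicarbonate (mEq/L).
def fit(xs, zs):
    n = len(xs)
    x_mean, z_mean = sum(xs) / n, sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    k = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs)) / sxx
    return k, z_mean - k * x_mean

xs = [2, 3, 4, 6, 7, 8, 9, 10, 10]
zs = [2.5, 3, 4, 2, 8, 8, 9, 3, 10]
n = len(xs)
k, c = fit(xs, zs)
residuals = [z - (k * x + c) for x, z in zip(xs, zs)]      # the Table 1 residual column

# Hat (leverage) values and Cook's distances for a straight-line fit.
x_mean = sum(xs) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
s2 = sum(e ** 2 for e in residuals) / (n - 2)              # residual variance
hats = [1 / n + (x - x_mean) ** 2 / sxx for x in xs]
cooks = [(e ** 2 / (2 * s2)) * (h / (1 - h) ** 2)
         for e, h in zip(residuals, hats)]

# The point (10, 3) has by far the largest Cook's distance ...
worst = max(range(n), key=lambda i: cooks[i])
# ... and removing it increases the slope, as seen in Figure 7A.
k_drop, _ = fit(xs[:worst] + xs[worst + 1:], zs[:worst] + zs[worst + 1:])
```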

The increase in serum bicarbonate after four hours of IV insulin therapy is plotted as a function of the intensity of insulin therapy, and the associated linear regression line has been added. To focus on and to further assess the relationship between the data points and the regression line, it often is instructive to plot the residual differences between the data points and the regression line as a function of the predictor variable, x. Figure 7B demonstrates the "plot of the residuals" for the data originally graphed in Figure 7A. In Figure 7B, the line y = 0 now represents the regression line, and each data point is represented by its residual distance from this line.

Note that in Figures 7A and 7B there are two outlier points, one in the middle of the data set, and one toward the edge. Both of these outliers can have a disproportionate effect on the regression model when compared with the other data, but outliers on the edge may be more troublesome. The middle outlier will tend to raise or lower the entire regression line by some amount that will change the value of the constant c. It will not, however, greatly affect the slope of the regression line. The lateral outlier, however, has tilted the line down toward its side. It is exerting leverage on the regression and has a large effect in decreasing the value of the slope, k. Note the increase in the slope of the regression line when this single point is removed from the analysis in Figure 7A. The investigator should recheck this outlying data point to be sure that it is not an error, and perhaps collect more data in its area to see if it represents a trend.

Regression tools used to assess the amount of leverage exerted by an individual data point include the studentized residual and Cook's distance.4,5 The studentized residual of a particular point is the original residual value that has been standardized and adjusted for the leverage exhibited by that point on the regression. A residual is standardized when it is divided by the standard deviation of all the residuals in the data set. Cook's distance quantifies the combined change in the slope and z intercept as a whole when the point in question is removed from the analysis. If the studentized residual and Cook's distance are relatively large for a given point in a data set, then that point may be exerting undue leverage on the regression model.

CONCLUSIONS

Simple linear regression is a mathematical technique used to create a linear model of the association between a single predictor variable and an outcome measurement. The method of least squares is commonly used to determine the slope and constant of the regression equation, and four assumptions must be satisfied to produce a valid model. The data should always be graphed and visually inspected to ensure that the assumptions are satisfied, and transformations of either the predictor or outcome variables can be performed as necessary. The researcher should also check for and investigate any outlying data points, particularly those that exert undue leverage. A valid model can be used to perform univariate inference testing such as Student's t-test, and predictions about future data can be made.

References

1. Glantz SA, Slinker BK. The first step: understanding simple linear regression. In: Primer of Applied Regression and Analysis of Variance, 2nd ed. New York, NY: McGraw-Hill, 2001, pp 10-49.
2. Draper NR, Smith H. Fitting a straight line by least squares. In: Applied Regression Analysis, 3rd ed. New York, NY: John Wiley & Sons, 1998, pp 15-46.
3. Prescott LF, Balali-Mood M, Critchley JA, Johnstone AF, Proudfoot AT. Diuresis or urinary alkalinisation for salicylate poisoning? Br Med J. 1982; 285:1383-6.
4. Glantz SA, Slinker BK. Primer of Applied Regression and Analysis of Variance, 2nd ed. New York, NY: McGraw-Hill, 2001, pp 133-44.
5. Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied Regression Analysis and Other Multivariable Methods, 3rd ed. Pacific Grove, CA: Duxbury Press, 1998, pp 212-37.
