1.
Think of three (3) different experiments where you could use three (3)
different types of data transformation to demonstrate the importance of applying
data transformation in statistical analysis of the experiments. Your conclusion
must show a distinction between before and after transformation.
If a measurement variable does not fit a normal distribution or has greatly different standard
deviations in different groups, you should try a data transformation. Using a statistical test
such as an ANOVA or linear regression on such data may give a misleading result. In some
cases, transforming the data will make it fit the assumptions better (Gomez and Gomez, 1984).
i.
Log Transformation
This consists of taking the log of each observation. You can use either base-10 logs (LOG10
in SAS) or base-e logs, also known as natural logs (LOG in SAS). It makes no difference to
a statistical test whether you use base-10 logs or natural logs, because they differ by a
constant factor: the natural log of a number is about 2.303 times its base-10 log.
You should specify which log you used when you write up the results, as it will affect
things like the slope and intercept in a regression. Base-10 logs are often preferred because it is
possible to look at them and see the magnitude of the original number: log(1) = 0, log(10) = 1,
log(100) = 2, etc.
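As a quick check of the constant-factor claim, the Python sketch below (Python is used here only for illustration; the analysis itself is done in SAS) computes ln(x)/log10(x) for the leaf + 1 values of T1 and T6 from Table 1, then compares a ratio of variances on both log scales. The choice of T1 and T6 is arbitrary.

```python
import math
import statistics

# leaf + 1 for treatments T1 and T6 (counts from Table 1), used only for illustration.
t1 = [17, 18, 17]
t6 = [13, 14, 11]

# 1) ln(x) / log10(x) is the same constant, ln(10), for every x.
ratios = [math.log(x) / math.log10(x) for x in t1 + t6]
print([round(r, 6) for r in ratios])  # every entry prints as 2.302585


# 2) A ratio of variances (the building block of an F test) does not depend on the base:
#    changing base multiplies every log by the same constant c, so both variances
#    are multiplied by c**2 and the factor cancels.
def variance_ratio(log_fn):
    return (statistics.variance([log_fn(x) for x in t6])
            / statistics.variance([log_fn(x) for x in t1]))


print(variance_ratio(math.log), variance_ratio(math.log10))  # equal up to rounding
```

The same cancellation happens for the F values in an ANOVA, which is why the choice of base only matters when reporting slopes, intercepts, or means on the log scale.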
For example, a field study was conducted to evaluate the effect of different rates of poultry
manure on the number of leaves per maize plant 70 days after planting. The experimental design
was a randomized complete block design (RCBD). Six treatments were evaluated, each with three
replicates, as listed below:
T1= 1200 g poultry manure + 50% fertilization
T2= 1000 g poultry manure + 50% fertilization
T3= 800 g poultry manure + 50% fertilization
T4= 600 g poultry manure + 50% fertilization
T5= 400 g poultry manure + 50% fertilization
T6= 200 g poultry manure + 50% fertilization
Table 1: Raw data from the field experiment showing the number of leaves per plant
Treatment   Block 1     Block 2     Block 3     Standard    Mean         Mean of X
            (plant⁻¹)   (plant⁻¹)   (plant⁻¹)   deviation                (X = leaf + 1)
T1          16          17          16          0.5773503   16.3333333   17.3333333
T2          15          16          16          0.5773503   15.6666667   16.6666667
T3          14          15          15          0.5773503   14.6666667   15.6666667
T4          14          13          14          0.5773503   13.6666667   14.6666667
T5          13          13          12          0.5773503   12.6666667   13.6666667
T6          12          13          10          1.5275252   11.6666667   12.6666667
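The summary columns of Table 1 can be recomputed from the raw counts. This Python sketch uses the standard-library `statistics` module (sample standard deviation with an n - 1 divisor). The largest-to-smallest SD ratio it prints is only an informal screen for unequal spread, not a formal test such as Bartlett's test.

```python
import statistics

# Raw leaf counts per plant from Table 1: treatment -> [Block 1, Block 2, Block 3]
leaves = {
    "T1": [16, 17, 16],
    "T2": [15, 16, 16],
    "T3": [14, 15, 15],
    "T4": [14, 13, 14],
    "T5": [13, 13, 12],
    "T6": [12, 13, 10],
}

summary = {}
for trt, counts in leaves.items():
    sd = statistics.stdev(counts)                        # sample SD, n - 1 divisor
    mean = statistics.mean(counts)
    mean_x = statistics.mean([c + 1 for c in counts])    # mean of X = leaf + 1
    summary[trt] = (sd, mean, mean_x)
    print(f"{trt}: SD = {sd:.7f}  mean = {mean:.7f}  mean of X = {mean_x:.7f}")

# Informal screen for unequal spread: largest SD divided by smallest SD.
sds = [s[0] for s in summary.values()]
print(f"max SD / min SD = {max(sds) / min(sds):.2f}")
```

Here T6 stands out with an SD about 2.6 times that of the other treatments, which is the kind of unequal spread that motivates trying a transformation.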
Table 2: SAS input before and after transformation.
Input before transformation
data log_before;
input trt $ blk leaf;
datalines;
T1 1 16
T1 2 17
T1 3 16
T2 1 15
T2 2 16
T2 3 16
T3 1 14
T3 2 15
T3 3 15
T4 1 14
T4 2 13
T4 3 14
T5 1 13
T5 2 13
T5 3 12
T6 1 12
T6 2 13
T6 3 10
;
proc anova;
class trt blk;
model leaf = trt blk;
means trt blk/lsd;
run;
Input for transformation
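data log_after;
input trt $ blk leaf;
x = leaf + 1;   /* add 1 to each count before taking logs */
y = log(x);     /* LOG() is the natural log; use LOG10() for base-10 logs */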
datalines;
T1 1 16
T1 2 17
T1 3 16
T2 1 15
T2 2 16
T2 3 16
T3 1 14
T3 2 15
T3 3 15
T4 1 14
T4 2 13
T4 3 14
T5 1 13
T5 2 13
T5 3 12
T6 1 12
T6 2 13
T6 3 10
;
proc print;
run;
proc anova;
class trt blk;
model y=trt blk;
means trt/duncan;
run;
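For readers checking the printed values without SAS, the following Python sketch mirrors the DATA step above: it adds 1 to each count and takes the natural log, because SAS LOG() is the natural logarithm (hence `math.log`, not `math.log10`). The list layout (treatments in order, blocks 1 to 3 within each) is copied from the datalines.

```python
import math

# Leaf counts in the same order as the datalines: T1 blocks 1-3, then T2, ..., T6.
leaf = [16, 17, 16, 15, 16, 16, 14, 15, 15, 14, 13, 14, 13, 13, 12, 12, 13, 10]
trt = [f"T{i}" for i in range(1, 7) for _ in range(3)]
blk = [1, 2, 3] * 6

x = [v + 1 for v in leaf]       # x = leaf + 1
y = [math.log(v) for v in x]    # y = log(x); SAS LOG() is also the natural log

print("Obs trt blk leaf  x        y")
for obs, (t, b, l, xv, yv) in enumerate(zip(trt, blk, leaf, x, y), start=1):
    print(f"{obs:3d} {t:>3} {b:3d} {l:4d} {xv:3d} {yv:8.5f}")
```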
The ANOVA Procedure
Table 3: ANOVA table before transformation
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
trt                5      47.77777778    9.55555556     14.58   0.0003
blk                2       1.44444444    0.72222222      1.10   0.3695
Error             10       6.55555556    0.65555556
Corrected Total   17      55.77777778

R-Square   Coeff Var   Root MSE   leaf Mean
0.882470    5.737775   0.809664    14.11111

t Grouping (LSD). Means with the same letter are not significantly different.
Grouping   Mean      trt
A          16.3333   T1
A B        15.6667   T2
B C        14.6667   T3
C D        13.6667   T4
D E        12.6667   T5
E          11.6667   T6
From the results of the ANOVA (Table 3), there is a significant difference between treatments at
α = 0.05 (P = 0.0003), and there is no significant block effect (P = 0.3695 > 0.05), so blocking
did not account for a significant share of the variation in this RCBD experiment. The R-Square
value is 88.25%.
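The t grouping letters come from Fisher's least significant difference, which for equal replication is LSD = t(0.025, error df) × sqrt(2 × MSE / r). The sketch below plugs in the error mean square (0.65555556) and the 10 error df from the ANOVA before transformation, with r = 3 blocks. The critical value 2.228 is the tabulated two-sided t at α = 0.05 with 10 df, hard-coded here rather than computed.

```python
import math

# Inputs from the ANOVA table before transformation.
ms_error = 0.65555556  # error mean square
r = 3                  # replications (blocks) per treatment
t_crit = 2.228         # tabulated two-sided t for alpha = 0.05 with 10 error df (hard-coded)

lsd = t_crit * math.sqrt(2 * ms_error / r)
print(f"LSD(0.05) = {lsd:.3f}")

means = {"T1": 16.3333, "T2": 15.6667, "T3": 14.6667,
         "T4": 13.6667, "T5": 12.6667, "T6": 11.6667}

# Two means differ significantly when their difference exceeds the LSD.
for a, b in [("T1", "T2"), ("T1", "T3"), ("T2", "T3"), ("T2", "T4")]:
    diff = means[a] - means[b]
    verdict = "significant" if diff > lsd else "not significant"
    print(f"{a} - {b} = {diff:.4f}: {verdict}")
```

With an LSD of about 1.47 leaves, adjacent treatments (1 leaf or less apart) share a letter while treatments two steps apart (about 1.67 to 2 leaves) do not, which reproduces the A/AB/BC/CD/DE/E pattern.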
Table 4: Transforming the data (x = leaf + 1, y = ln x)
Obs   trt   blk   leaf   x    y
 1    T1    1     16     17   2.83321
 2    T1    2     17     18   2.89037
 3    T1    3     16     17   2.83321
 4    T2    1     15     16   2.77259
 5    T2    2     16     17   2.83321
 6    T2    3     16     17   2.83321
 7    T3    1     14     15   2.70805
 8    T3    2     15     16   2.77259
 9    T3    3     15     16   2.77259
10    T4    1     14     15   2.70805
11    T4    2     13     14   2.63906
12    T4    3     14     15   2.70805
13    T5    1     13     14   2.63906
14    T5    2     13     14   2.63906
15    T5    3     12     13   2.56495
16    T6    1     12     13   2.56495
17    T6    2     13     14   2.63906
18    T6    3     10     11   2.39790

The ANOVA Procedure
Table 5: ANOVA table after transformation
Dependent Variable: y
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
trt                5       0.21983173    0.04396635     11.90   0.0006
blk                2       0.00781452    0.00390726      1.06   0.3831
Error             10       0.03694465    0.00369446
Corrected Total   17       0.26459090

R-Square   Coeff Var   Root MSE   y Mean
0.860371    2.244301   0.060782   2.708287

Duncan Grouping. Means with the same letter are not significantly different.
Grouping   Mean      N   trt
A          2.85227   3   T1
A          2.81301   3   T2
A B        2.75108   3   T3
B C        2.68505   3   T4
C D        2.61435   3   T5
D          2.53397   3   T6

This is the SAS output after transformation. The values under x are the leaf counts plus one
(x = leaf + 1), a common adjustment before taking logs when the data contain small values, and
the values under y are the natural logs of x (Table 4). There is a significant difference between
treatments at α = 0.05 (P = 0.0006), although this P value is higher than the one before
transformation (P = 0.0003). There is still no significant block effect (P = 0.3831 > 0.05), and
this P value is also higher than before (P = 0.3695). The R-Square value is 86.04%, which is
lower than before transformation (88.25%).
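To see where the before and after figures come from, this Python sketch computes the treatment and block F values and R-Square for the RCBD model (treatment + block, one observation per cell, no interaction) on the raw counts and on ln(leaf + 1). It uses only the standard library, so P values are not computed; those would need an F-distribution survival function (for example `scipy.stats.f.sf`).

```python
import math

# Leaf counts from Table 1: treatment -> [Block 1, Block 2, Block 3]
leaf = {"T1": [16, 17, 16], "T2": [15, 16, 16], "T3": [14, 15, 15],
        "T4": [14, 13, 14], "T5": [13, 13, 12], "T6": [12, 13, 10]}


def rcbd_summary(table):
    """F values and R-Square for a balanced RCBD (treatment + block, no
    interaction), given {treatment: [value in block 1, block 2, ...]}."""
    rows = list(table.values())
    t, b = len(rows), len(rows[0])
    grand = sum(sum(row) for row in rows) / (t * b)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_trt = b * sum((sum(row) / b - grand) ** 2 for row in rows)
    blk_means = [sum(row[j] for row in rows) / t for j in range(b)]
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_err = ss_total - ss_trt - ss_blk
    ms_err = ss_err / ((t - 1) * (b - 1))
    return {
        "F_trt": ss_trt / (t - 1) / ms_err,
        "F_blk": ss_blk / (b - 1) / ms_err,
        "r_square": (ss_trt + ss_blk) / ss_total,
    }


raw = rcbd_summary(leaf)
logged = rcbd_summary({k: [math.log(v + 1) for v in vals] for k, vals in leaf.items()})

print("before:", {k: round(v, 4) for k, v in raw.items()})
print("after: ", {k: round(v, 4) for k, v in logged.items()})
```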
Table 6: Comparing the means before and after transformation
Treatment   Mean before transformation   Mean after transformation*
T1          16.3333 a                    17.3271 a
T2          15.6667 ab                   16.6599 a
T3          14.6667 bc                   15.6595 ab
T4          13.6667 cd                   14.6589 bc
T5          12.6667 de                   13.6583 cd
T6          11.6667 e                    12.6034 d
Note: Means with the same letter are not significantly different (t/LSD grouping before
transformation, Duncan grouping after transformation).
* Back-transformed means, e raised to the mean of y, on the scale of X = leaf + 1.
Before transformation, the means show that T1 is significantly higher than T3; however,
after transformation, T1 is not significantly different from T3 (Table 6).
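The "after transformation" means in Table 6 appear to be back-transformed treatment means of y, that is exp(mean of y). That puts them on the scale of X = leaf + 1 (a geometric mean of X), which explains why they sit about one leaf above the raw means. Assuming that interpretation, the sketch below recomputes them from the Duncan output and subtracts 1 to return to leaves per plant. Recomputing exp(2.61435) for T5 gives about 13.658.

```python
import math

# Treatment means of y = ln(leaf + 1), copied from the Duncan output after transformation.
mean_y = {"T1": 2.85227, "T2": 2.81301, "T3": 2.75108,
          "T4": 2.68505, "T5": 2.61435, "T6": 2.53397}

back = {}
for trt, m in mean_y.items():
    x_scale = math.exp(m)        # back-transformed mean on the X = leaf + 1 scale (a geometric mean)
    leaf_scale = x_scale - 1     # subtract the 1 added before taking logs
    back[trt] = (x_scale, leaf_scale)
    print(f"{trt}: exp(mean y) = {x_scale:.4f}  leaves per plant = {leaf_scale:.4f}")
```

Comparing the leaves-per-plant column with the raw means puts both sets of numbers on the same scale; the significance letters themselves come from the log-scale test and are unaffected by back-transformation.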
Conclusion
The P values before transformation are lower than those after transformation. However, if the
data had not been transformed, we would have concluded that T1 is significantly different from
T3 and committed a Type I error. The transformed data have a lower R-Square value, but the
analysis is more realistic and precise.