Correlation: A Bit About Pearson's R

1) The maximum value of the Pearson correlation coefficient r equals 1 because this indicates a perfect positive linear relationship between the two variables. 2) A positive correlation means as one variable increases, so does the other, while a negative correlation means as one increases, the other decreases. 3) The Fisher r to z transformation converts the sampling distribution of r from skewed to normal, allowing hypothesis tests to be conducted.

Correlation

A bit about Pearson’s r


Questions
• Why does the maximum value of r equal 1.0?
• What does it mean when a correlation is positive? Negative?
• What is the purpose of the Fisher r to z transformation?
• What is range restriction? Range enhancement? What do they do to r?
• Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.
• Why do we care about the sampling distribution of the correlation coefficient?
• What is the effect of reliability on r?
Basic Ideas
• Nominal vs. continuous IV
• Degree (direction) & closeness (magnitude) of linear relations
  – Sign (+ or -) for direction
  – Absolute value for magnitude
• Pearson product-moment correlation coefficient:

  r = Σ(zX zY) / N
Illustrations
[Three scatterplots: Weight by Height (positive relation), Errors by Study Time (negative relation), and SAT-V by Toe Size (no relation), illustrating positive, negative, and zero correlations.]
Simple Formulas

  r = Σxy / (N SX SY), where x = X - X̄ and y = Y - Ȳ

  SX = sqrt( Σ(X - X̄)² / N )

  Cov(X, Y) = Σxy / N

  r = Σ(zx zy) / N, where zX = (X - X̄) / SX

Use either N throughout or else N - 1 throughout (SDs and denominator); the result is the same as long as you are consistent.

Pearson's r is the average cross product of z scores: the product of (standardized) moments from the means.
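The equivalence of the two formulas can be checked numerically. A minimal sketch (the data are illustrative, not from the slides):

```python
import math

# Illustrative data (hypothetical heights and weights)
X = [60, 63, 66, 69, 72, 75]
Y = [110, 125, 140, 155, 180, 200]

N = len(X)
mx, my = sum(X) / N, sum(Y) / N
sx = math.sqrt(sum((x - mx) ** 2 for x in X) / N)  # population SD (divide by N)
sy = math.sqrt(sum((y - my) ** 2 for y in Y) / N)

# r as the average cross product of z scores
r_z = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(X, Y)) / N

# r as covariance over the product of SDs
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / N
r_cov = cov / (sx * sy)

print(round(r_z, 4), round(r_cov, 4))  # the two formulas agree
```

Using N - 1 in both the SDs and the denominator would give the same r, as the slide notes.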
Graphic Representation

[Two scatterplots of Weight by Height: raw scores (mean weight = 150.7 lbs, mean height = 66.8 inches) and z scores, with the four quadrants around the means marked + and -.]

1. Conversion from raw to z.
2. Points & quadrants. Positive & negative products.
3. Correlation is the average of the cross products. Sign & magnitude of r depend on where the points fall.
4. The product is at its maximum (average = 1) when the points fall on the line where zX = zY.
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Ht 10 60.00 78.00 69.0000 6.05530
Wt 10 110.00 200.00 155.0000 30.27650
Valid N (listwise) 10

[Scatterplots: when zX = zY exactly, r = 1.0. Leave X alone and add error to Y: r = .99. Add more error: r = .91.]

With 2 variables, the correlation is the z-score slope.
Review
• Why does the maximum value of r
equal 1.0?
• What does it mean when a correlation is
positive? Negative?
Sampling Distribution of r
The statistic is r; the parameter is ρ (rho). In general, r is slightly biased.

[Figure: sampling distributions of r for rho = -.5, rho = 0, and rho = .5; the distributions for nonzero rho are skewed.]

The sampling variance is approximately:

  σ²r = (1 - ρ²)² / N

Sampling variance depends both on N and on ρ.
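The approximation σ²r ≈ (1 - ρ²)²/N can be checked by simulation. A sketch (the choices of ρ, N, and replication count are arbitrary):

```python
import math
import random

random.seed(1)

rho, N, reps = 0.5, 100, 2000

def sample_r(rho, N):
    """Draw one sample correlation from a bivariate normal with correlation rho."""
    xs = [random.gauss(0, 1) for _ in range(N)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / N, sum(ys) / N
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rs = [sample_r(rho, N) for _ in range(reps)]
mean_r = sum(rs) / reps
var_r = sum((r - mean_r) ** 2 for r in rs) / (reps - 1)

theory = (1 - rho ** 2) ** 2 / N
print(round(var_r, 5), round(theory, 5))  # empirical vs approximate variance
```

The empirical mean of r also comes out slightly below ρ, illustrating the small bias mentioned above.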
Empirical Sampling Distributions of the Correlation Coefficient

[Figure: boxplots of sample r for ρ = .5 and ρ = .7, each at N = 100 and N = 50. The boxes center near the parameter; the spread is larger for smaller N and for smaller ρ, and the distributions are skewed, with the long tail toward zero.]
Fisher’s r to z Transformation

  z = .5 ln( (1 + r) / (1 - r) )

  r     z
  .10   .10
  .20   .20
  .30   .31
  .40   .42
  .50   .55
  .60   .69
  .70   .87
  .80   1.10
  .90   1.47

[Figure: plot of z (output) against r (sample value input); nearly linear for small r, stretching the scale as r approaches 1.]

The sampling distribution of z is normal as N increases. The transformation pulls out the short tail to make a better (normal) distribution. The sampling variance of z, 1/(N - 3), does not depend on ρ.
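The transformation z = .5 ln((1 + r)/(1 - r)) is the inverse hyperbolic tangent, so it can be computed directly with the Python standard library. A sketch reproducing rows of the table above:

```python
import math

def fisher_z(r):
    """Fisher r-to-z: z = .5 * ln((1 + r) / (1 - r)), which equals atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

for r in (0.1, 0.3, 0.5, 0.8, 0.9):
    print(r, round(fisher_z(r), 2))

# The identity with the built-in inverse hyperbolic tangent:
assert all(abs(fisher_z(r) - math.atanh(r)) < 1e-12 for r in (0.1, 0.5, 0.9))
```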
Hypothesis test: H0: ρ = 0

  t = r sqrt(N - 2) / sqrt(1 - r²)

The result is compared to t with (N - 2) df for significance.

Say r = .25, N = 100:

  t = .25 sqrt(98) / sqrt(1 - .25²) = (.25)(9.899) / .968 = 2.56, p < .05

t(.05, 98) = 1.984.
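A sketch of this test in Python (the critical value is hardcoded from the t table, since only the standard library is assumed):

```python
import math

def t_for_r(r, N):
    """t statistic for H0: rho = 0, compared to t with N - 2 df."""
    return r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)

t = t_for_r(0.25, 100)
print(round(t, 2))   # 2.56
print(t > 1.984)     # exceeds t(.05, 98), so p < .05
```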


Hypothesis test 2: H0: ρ = value

  z = ( .5 ln((1 + r)/(1 - r)) - .5 ln((1 + ρ)/(1 - ρ)) ) / sqrt( 1/(N - 3) )

One-sample z test where r is the sample value and ρ is the hypothesized population value.

Say N = 200, r = .54, and ρ is .30:

  z = (.604 - .310) / sqrt(1/197) = .294 / .0712 = 4.13

Compare to the unit normal, e.g., 4.13 > 1.96, so it is significant. Our sample was not drawn from a population in which rho is .30.
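The same test as a sketch in Python (full precision gives z ≈ 4.14; the hand computation's 4.13 reflects rounded intermediate values):

```python
import math

def z_test_rho(r, rho0, N):
    """One-sample z test of H0: rho = rho0 via the Fisher transformation."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(N - 3)

z = z_test_rho(0.54, 0.30, 200)
print(round(z, 2))    # 4.14
print(abs(z) > 1.96)  # significant at alpha = .05, two-tailed
```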
Hypothesis test 3: H0: ρ1 = ρ2
Testing equality of correlations from 2 INDEPENDENT samples.

  z = (z1 - z2) / sqrt( 1/(N1 - 3) + 1/(N2 - 3) ), where zi = .5 ln((1 + ri)/(1 - ri))

Say N1 = 150, r1 = .63, N2 = 175, r2 = .70:

  z = (.741 - .867) / sqrt(1/147 + 1/172) = -.126 / .112 = -1.12, n.s.
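A sketch of this two-sample test in Python. Carrying full precision in the Fisher z values gives z ≈ -1.12 (two-decimal hand rounding gives -1.18); either way, not significant:

```python
import math

def z_two_independent(r1, n1, r2, n2):
    """z test of H0: rho1 = rho2 for correlations from independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se

z = z_two_independent(0.63, 150, 0.70, 175)
print(round(z, 2))    # -1.12
print(abs(z) < 1.96)  # not significant at alpha = .05
```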
Hypothesis test 4: H0: ρ1 = ρ2 = ... = ρk
Testing equality of any number of independent correlations.

  z̄ = Σ(ni - 3) zi / Σ(ni - 3)        Q = Σ(ni - 3)(zi - z̄)²

Compare Q to chi-square with k - 1 df.

  Study    r     n     z    (n-3)z   zbar   (z-zbar)²   (n-3)(z-zbar)²
  1       .2   200   .20    39.94    .41      .0441          8.69
  2       .5   150   .55    80.75    .41      .0196          2.88
  3       .6    75   .69    49.91    .41      .0784          5.64
  sum          425          170.6                           17.21 = Q

Chi-square at .05 with 2 df = 5.99. Not all rho are equal.
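The same computation as a sketch in Python (full precision in the Fisher z values gives Q ≈ 17.1 rather than the table's rounded 17.21; the conclusion is unchanged):

```python
import math

def q_test(rs, ns):
    """Q statistic for H0: all rho equal, across k independent samples.
    Compare to chi-square with k - 1 df."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    zbar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - zbar) ** 2 for w, z in zip(ws, zs))

Q = q_test([0.2, 0.5, 0.6], [200, 150, 75])
print(round(Q, 1))  # 17.1
print(Q > 5.99)     # exceeds chi-square(.05, 2 df): not all rho equal
```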
Hypothesis test 5: dependent r
H0: ρ12 = ρ13 (Hotelling-Williams test)

  t(N - 3) = (r12 - r13) sqrt[ (N - 1)(1 + r23) / ( 2|R|(N - 1)/(N - 3) + r̄²(1 - r23)³ ) ]

where r̄ = (r12 + r13)/2 and |R| = 1 - r12² - r13² - r23² + 2(r12)(r13)(r23).

Say N = 101, r12 = .4, r13 = .6, r23 = .3:

  r̄ = (.4 + .6)/2 = .5
  |R| = 1 - .4² - .6² - .3² + 2(.4)(.6)(.3) = .534
  t(98) = (.4 - .6) sqrt[ (100)(1.3) / ( 2(.534)(100/98) + (.5²)(.7)³ ) ] = -2.10

t(.05, 98) = 1.98, so the two dependent correlations differ.

H0: ρ12 = ρ34 — see my notes.
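The Hotelling-Williams computation as a sketch in Python, reproducing the worked example:

```python
import math

def williams_t(r12, r13, r23, N):
    """Hotelling-Williams t for H0: rho12 = rho13 (dependent correlations
    sharing variable 1). Compare to t with N - 3 df."""
    rbar = (r12 + r13) / 2
    detR = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    denom = 2 * detR * (N - 1) / (N - 3) + rbar**2 * (1 - r23) ** 3
    return (r12 - r13) * math.sqrt((N - 1) * (1 + r23) / denom)

t = williams_t(0.4, 0.6, 0.3, 101)
print(round(t, 2))    # -2.1
print(abs(t) > 1.98)  # exceeds t(.05, 98): the correlations differ
```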
Review
• What is the purpose of the Fisher r to z
transformation?
• Test the hypothesis that ρ1 = ρ2
– Given that r1 = .50, N1 = 103
– r2 = .60, N2 = 128 and the samples are
independent.
• Why do we care about the sampling
distribution of the correlation
coefficient?
Range Restriction/Enhancement
Reliability
Reliability sets the ceiling for validity. Measurement error attenuates correlations:

  ρXY = ρTxTy sqrt( ρXX' ρYY' )

If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7 sqrt(.8 × .8) = .7 × .8 = .56.
Disattenuated correlation

  ρTxTy = ρXY / sqrt( ρXX' ρYY' )

If our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56/.8 = .70.
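The attenuation and disattenuation formulas are inverses of each other, as a quick sketch shows:

```python
import math

def attenuate(rho_true, rel_x, rel_y):
    """Observed correlation given the true-score correlation and reliabilities."""
    return rho_true * math.sqrt(rel_x * rel_y)

def disattenuate(r_obs, rel_x, rel_y):
    """Estimated true-score correlation from an observed correlation."""
    return r_obs / math.sqrt(rel_x * rel_y)

r_obs = attenuate(0.7, 0.8, 0.8)
print(round(r_obs, 2))                          # 0.56
print(round(disattenuate(r_obs, 0.8, 0.8), 2))  # 0.7 (round trip recovers it)
```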
Review
• What is range restriction? Range
enhancement? What do they do to r?
• What is the effect of reliability on r?
SAS Power Estimation

  proc power;
    onecorr dist=fisherz
      corr = 0.35
      nullcorr = 0.2
      sides = 1
      ntotal = 100
      power = .;
  run;

  Computed Power
  Actual alpha = .05
  Power = .486

  proc power;
    onecorr
      corr = 0.35
      nullcorr = 0
      sides = 2
      ntotal = .
      power = .8;
  run;

  Computed N Total
  Alpha = .05
  Actual Power = .801
  Ntotal = 61
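For the second run, the required N can be approximated by hand from the Fisher transformation, n ≈ ((zα + zβ) / atanh(r))² + 3. A sketch (the approximation gives 62 here; SAS's exact method reports 61):

```python
import math

def n_for_power(r):
    """Approximate total N to detect correlation r against rho = 0,
    two-tailed alpha = .05, power = .80, via the Fisher z approximation:
    n = ((z_alpha + z_beta) / atanh(r))**2 + 3."""
    z_alpha = 1.959964  # normal quantile for two-tailed alpha = .05
    z_beta = 0.841621   # normal quantile for power = .80
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

print(n_for_power(0.35))  # 62 (SAS exact: 61)
print(n_for_power(0.10))  # 783 (vs the table's 782)
```

The approximation can differ by one or two subjects from exact methods, which is rarely material when planning a study.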
Power for Correlations

  Rho   N required against Null: rho = 0
  .10   782
  .15   346
  .20   193
  .25   123
  .30    84
  .35    61

Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology. Power = .8, two tails, alpha = .05.
