In a few words, the correlation measures how well the association between two variables can
be described by a straight line. Let us consider the variable x and the four variables y in Table
22. Figure 31 shows the corresponding scatter plots.

• The association between x and $y_1$ (top–left panel) is clearly linear and very strong; in fact,
we have a perfect linear association. This fact is adequately described by the correlation
coefficient: we get $r_{xy_1} = -1$. Note that the sign indicates a negative association.

• There is a perfect association between x and $y_2$ (top–right panel); in fact, we have that
$y_2 = \exp(x)$. However, the association is not linear and the correlation coefficient equals
$r_{xy_2} = 0.6914$, which, in loose words, can be interpreted as "the association between x
and $y_2$ can be described moderately well, but not perfectly, by a straight line".

• There is a perfect association between x and $y_3$ (bottom–left panel); in fact, we have
that $y_3 = (x - 6)^2$. However, the association is not linear and the correlation coefficient
equals $r_{xy_3} = 0$, which can be interpreted as "the association between x and $y_3$ cannot
be described by a straight line at all".

• The points in the bottom–right panel show no clear association of any type; in particular,
they show no linear association. This fact is reflected by the correlation coefficient, which
is $r_{xy_4} = -0.2545$, indicating that a straight line will be very poor at describing the
association between the two variables.

x y1 y2 y3 y4
1 11 3 25 9
2 10 7 16 7
3 9 20 9 6
4 8 55 4 1
5 7 148 1 10
6 6 403 0 5
7 5 1097 1 3
8 4 2981 4 11
9 3 8103 9 8
10 2 22026 16 4
11 1 59874 25 2
Table 22: Different types of association

In R, the function cor() can be used for computing the correlation coefficient.
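For instance, here is a minimal R sketch reproducing the four Pearson coefficients discussed above; the columns of Table 22 are reconstructed from their apparent generating formulas, and y4 is copied directly from the table:

```r
# Data from Table 22
x  <- 1:11
y1 <- 12 - x                                 # perfect linear (decreasing) association
y2 <- round(exp(x))                          # perfect but non-linear association
y3 <- (x - 6)^2                              # perfect but non-monotonic association
y4 <- c(9, 7, 6, 1, 10, 5, 3, 11, 8, 4, 2)   # no clear association

cor(x, y1)  # -1
cor(x, y2)  # 0.6914
cor(x, y3)  # 0
cor(x, y4)  # -0.2545
```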

Correlation matrix Let us say that we have J variables, $x_1, x_2, \cdots, x_J$. It is possible to
calculate $J(J-1)/2$ correlations between pairs of variables and present them in a matrix that
is called the correlation matrix:

$$\begin{pmatrix}
1 & r_{x_1 x_2,U} & \cdots & r_{x_1 x_J,U} \\
r_{x_2 x_1,U} & 1 & \cdots & r_{x_2 x_J,U} \\
\vdots & \vdots & \ddots & \vdots \\
r_{x_J x_1,U} & r_{x_J x_2,U} & \cdots & 1
\end{pmatrix}$$

The ones in the diagonal are obtained by taking into account that $r_{x_j x_j,U} = 1$. Alternatively,
taking into account that the correlation is symmetric, i.e. that $r_{x_j x_{j'},U} = r_{x_{j'} x_j,U}$, we can simply
write the correlation matrix as

$$\begin{pmatrix}
1 & & & \\
r_{x_2 x_1,U} & 1 & & \\
\vdots & \vdots & \ddots & \\
r_{x_J x_1,U} & r_{x_J x_2,U} & \cdots & 1
\end{pmatrix}$$

Figure 31: Scatter plots of the variables in Table 22.
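In R, the same cor() function returns the whole correlation matrix when applied to a data frame; a small sketch with the Table 22 variables:

```r
# cor() applied to a data frame gives the full J x J correlation
# matrix, symmetric with ones on the diagonal.
x <- 1:11
d <- data.frame(x  = x,
                y1 = 12 - x,
                y2 = round(exp(x)),
                y3 = (x - 6)^2,
                y4 = c(9, 7, 6, 1, 10, 5, 3, 11, 8, 4, 2))
round(cor(d), 4)  # e.g. the (x, y1) entry is -1
```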

5.2 Spearman’s correlation coefficient


In the previous subsection we discussed Pearson's correlation coefficient, a measure of the
linear association between two variables. In this subsection we will discuss another measure,
one that captures a more general class of associations, namely, monotonic associations.
In other words, this measure allows us to determine whether the association between two
variables can be adequately described by a monotonic function.
First, let us clarify what a monotonic function is. Let us take a look at the functions
plotted in Figure 32:

• The top–left panel shows an increasing function: as x increases, y also increases;

• The top–central panel shows a decreasing function: as x increases, y decreases;

• The top–right panel shows a constant function: as x increases, y does not change;

• The bottom–left panel shows a non–decreasing function: a function that is partly
increasing and partly constant;

• The bottom–central panel shows a non–increasing function: a function that is partly
decreasing and partly constant.

Any of the five types of functions described above is called a monotonic function. In loose
words, a monotonic function is a function that does not both increase and decrease. For instance,
the bottom–right panel shows a function that is decreasing first and then becomes increasing;
this is not a monotonic function.

Figure 32: Six functions.

Before defining Spearman's correlation coefficient, we need to define the ranks of the ob-
servations of a variable. The rank of the observation $x_i$ in the population U, denoted by $R(x_i)$
or simply $R_i$, is the position occupied by the observation when the values are sorted from
smallest to largest. This is one of the situations where things are simpler than they sound, as
illustrated by the following example.
Example 45. Consider the population of ten students. The ranks of the number of points in
the assignment (x) and the exam (y) are shown in Table 23.

i 1 2 3 4 5 6 7 8 9 10
xi 0 9 2 24 25 23 4 20 28 5
R(xi ) 1 5 2 8 9 7 3 6 10 4
yi 8 15 5 36 40 30 9 21 32 27
R(yi ) 2 4 1 9 10 7 3 5 8 6
Table 23: Ranks of the points of ten students in an exam in Statistics
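In R, the built-in rank() function returns exactly these positions; a quick check against Table 23:

```r
# rank() gives the position of each value when the values are
# sorted from smallest to largest.
x <- c(0, 9, 2, 24, 25, 23, 4, 20, 28, 5)    # assignment points
y <- c(8, 15, 5, 36, 40, 30, 9, 21, 32, 27)  # exam points
rank(x)  # 1 5 2 8 9 7 3 6 10 4, as in Table 23
rank(y)  # 2 4 1 9 10 7 3 5 8 6
```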

Spearman’s correlation coefficient is simply the correlation coefficient (11) calculated over
the ranks of x and y, instead of the actual values. More formally:
Definition 46. Let $x_i$ and $y_i$ be the values of two variables associated to the ith element
in U ($i = 1, 2, \cdots, N$). Let also $R(x_i)$ and $R(y_i)$ be their corresponding ranks. Spearman's
correlation coefficient between x and y is defined as

$$r^{s}_{xy,U} \equiv \frac{\sum_{U} (R(x_i) - \bar{R}_U)(R(y_i) - \bar{R}_U)}{\left[ \sum_{U} (R(x_i) - \bar{R}_U)^2 \sum_{U} (R(y_i) - \bar{R}_U)^2 \right]^{1/2}} \qquad (12)$$

where $\bar{R}_U = (N+1)/2$. □


Let us illustrate the steps needed to calculate Spearman’s coefficient.

Example 47. Let U be the population of N = 10 students taking a Master course in statistics.
Let $x_i$ and $y_i$ be, respectively, the scores in a home assignment and the final exam of the ith
student ($i = 1, 2, \cdots, N$). The ranks of x and y were shown in Table 23. Using them, let us
find Spearman's correlation coefficient. We have $\bar{R}_U = (N+1)/2 = (10+1)/2 = 5.5$.
Let us calculate the numerator first:

$$\sum_{U} (R(x_i) - \bar{R}_U)(R(y_i) - \bar{R}_U) = (1-5.5)(2-5.5) + (5-5.5)(4-5.5) + \cdots + (4-5.5)(6-5.5) = 15.75 + 0.75 + \cdots + (-0.75) = 75.5.$$

Now we calculate the first term in the denominator:

$$\sum_{U} (R(x_i) - \bar{R}_U)^2 = (1-5.5)^2 + (5-5.5)^2 + \cdots + (4-5.5)^2 = 20.25 + 0.25 + \cdots + 2.25 = 82.5.$$

And the second term in the denominator is

$$\sum_{U} (R(y_i) - \bar{R}_U)^2 = (2-5.5)^2 + (4-5.5)^2 + \cdots + (6-5.5)^2 = 12.25 + 2.25 + \cdots + 0.25 = 82.5.$$

This gives

$$r^{s}_{xy,U} = \frac{\sum_{U} (R(x_i) - \bar{R}_U)(R(y_i) - \bar{R}_U)}{\left[ \sum_{U} (R(x_i) - \bar{R}_U)^2 \sum_{U} (R(y_i) - \bar{R}_U)^2 \right]^{1/2}} = \frac{75.5}{(82.5 \cdot 82.5)^{1/2}} = 0.9152,$$

which means that there is a high monotonic association between the number of points that
students get in the assignment and the exam. □
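The same value can be obtained in R, either by correlating the ranks or by passing method = "spearman" to cor(); a minimal sketch with the data from Table 23:

```r
# Spearman's coefficient is Pearson's coefficient computed on the ranks.
x <- c(0, 9, 2, 24, 25, 23, 4, 20, 28, 5)
y <- c(8, 15, 5, 36, 40, 30, 9, 21, 32, 27)
cor(rank(x), rank(y))           # 0.9152
cor(x, y, method = "spearman")  # same value, computed directly
```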
We have insisted that Spearman's correlation allows us to measure monotonic associations
between two variables. Let us illustrate this with the variables in Table 22 and Figure 31:

• Before, we found that for the perfect linear association shown in the top–left panel,
Pearson's correlation coefficient is equal to -1. This is the same value obtained by
Spearman's correlation coefficient.

• We found that Pearson's correlation coefficient for the increasing association shown in
the top–right panel is 0.6914. Spearman's correlation coefficient is equal to 1, indicating
that there is a perfect monotonic association between x and y: higher values of x are
associated to higher values of y.

• Spearman's correlation for the variables shown in the bottom–left panel equals 0, the
same value as Pearson's correlation. This indicates that there is no monotonic association
between x and y.

• Due to the way the numbers were generated, Pearson's and Spearman's correlation
coefficients also coincide for the variables illustrated in the bottom–right panel: -0.2545,
indicating a weak monotonic association between the two variables.
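These values are easy to verify in R; as before, the Table 22 columns are reconstructed from their apparent generating formulas:

```r
# Spearman's coefficients for the Table 22 variables.
x <- 1:11
cor(x, 12 - x,        method = "spearman")  # -1: perfect decreasing association
cor(x, round(exp(x)), method = "spearman")  #  1: perfect increasing association
cor(x, (x - 6)^2,     method = "spearman")  #  0: no monotonic association
cor(x, c(9, 7, 6, 1, 10, 5, 3, 11, 8, 4, 2),
    method = "spearman")                    # -0.2545, same as Pearson here
```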

5.3 Kendall’s correlation coefficient


Another parameter that allows for measuring the monotonic association between two variables
is Kendall's correlation coefficient. In words, it works as follows: it compares all pairs of
observations and identifies each pair as either concordant or discordant. A pair $(x_i, y_i)$ and
$(x_j, y_j)$ is said to be concordant if the straight line that connects both points has a positive
slope, and discordant if the straight line that connects both points has a negative slope. This
is illustrated in Figure 33 for a few pairs: the three green segments represent three concordant
pairs and the three red segments represent three discordant pairs.

Figure 33: Three concordant (green) and three discordant (red) pairs in the scores of ten students.

Once we have identified the concordant and discordant pairs, we take the total number of
concordant pairs minus the total number of discordant pairs, and divide this difference by the
total number of pairs. More formally:

Definition 48. Let $x_i$ and $y_i$ be the values of two variables associated to the ith element in
U ($i = 1, 2, \cdots, N$). Kendall's correlation coefficient between x and y is defined as

$$r^{k}_{xy,U} \equiv \frac{2}{N(N-1)} \sum_{i<j} \mathrm{sgn}(x_j - x_i)\,\mathrm{sgn}(y_j - y_i) \qquad (13)$$

where

$$\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$

Using R, we find that Kendall's correlation coefficient between the assignment and exam
scores of the ten students is 0.7778.
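A minimal sketch of that computation, together with a direct implementation of definition (13):

```r
# Kendall's coefficient via cor() and via definition (13).
x <- c(0, 9, 2, 24, 25, 23, 4, 20, 28, 5)
y <- c(8, 15, 5, 36, 40, 30, 9, 21, 32, 27)
cor(x, y, method = "kendall")  # 0.7778

# Definition (13) directly: sum the sign products over all pairs i < j.
N <- length(x)
s <- 0
for (i in 1:(N - 1))
  for (j in (i + 1):N)
    s <- s + sign(x[j] - x[i]) * sign(y[j] - y[i])
2 * s / (N * (N - 1))  # 0.7778, as above
```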

5.4 Time-series plots


We close this section by describing the time-series plot, but before doing so, we need to define
what a time-series is. We have already seen a time-series plot in Section 4.
Consider a set of N measurements of the type $(t, y_t)$, where t indicates the time at which $y_t$
occurs. As the variable y is observed over time, the order of the observations is important. As
an example, consider the number of new cases of Covid reported every day during a period of
interest: $y_1$ is the number of cases reported on the first day, $y_2$ is the number of cases reported
on the second day, and so on.
A time-series plot can be seen as a scatter plot of the N pairs $(t, y_t)$. In this case, however,
the dots are joined by a line.
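In R, such a plot can be produced with plot(); a minimal sketch using hypothetical daily counts, not the actual data of Figure 34:

```r
# A time-series plot: the pairs (t, y_t) joined by a line.
set.seed(1)
t  <- 1:60                                      # day index (hypothetical)
yt <- rpois(60, lambda = 50 + 20 * sin(t / 8))  # hypothetical daily counts
plot(t, yt, type = "o", xlab = "day", ylab = "new cases")
```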

Example 49. Figure 34 shows the time-series of the number of new cases of Covid-19 reported
in Sweden every day from January 1, 2021 to December 31, 2021. □

Figure 34: Number of new cases of Covid-19 reported in Sweden during 2021. Source: our-[Link]

Up to this point we have introduced a set of parameters which are useful for describing
different characteristics of one or two variables measured on the individuals of one population
of interest. Note that we have highlighted in italics three words that are crucial for understanding
the theory that we have covered: parameters, variables and population. We measure or observe
variables on the individuals of one population and these observations are used to compute the
parameters of interest. The notation we are using takes these three concepts into account:
we assign different symbols to the different parameters that have been defined, for instance,
we use a bar ( ¯ ) to denote the mean, a breve ( ˘ ) to denote the median, an r to denote
the correlation, and so on. Furthermore, we explicitly indicate which variable or variables are
involved in the calculation of the parameter and what population or set of individuals we are
talking about. For instance, $\bar{x}_U$ means that we are calculating the mean of the variable x for
the individuals of the set U, and $r_{xy,U}$ means that we are calculating the correlation between the
variables x and y for the individuals of the population U. Mastering this notation is a very
important and useful step for understanding the remainder of the course.

6 Simple linear regression: the descriptive approach


Consider the situation where we have measurements of K + 1 variables $x_1, x_2, \cdots, x_K$ and
y on a set s of n elements, i.e. the information available for the ith element is of the type
$(x_{1,i}, x_{2,i}, \cdots, x_{K,i}, y_i)$ for all $i = 1, 2, \cdots, n$. This information is typically stored in a rectangular
array as shown in Table 24.

x_1      x_2      ...   x_K      y
x_{1,1}  x_{2,1}  ...   x_{K,1}  y_1
x_{1,2}  x_{2,2}  ...   x_{K,2}  y_2
...      ...      ...   ...      ...
x_{1,n}  x_{2,n}  ...   x_{K,n}  y_n

Table 24: Array collecting the information in a regression problem

With this information we want to describe y as a function of the x-variables, i.e. we want
to express y as $y = f(x_1, x_2, \cdots, x_K)$. To begin with, in this section we will consider the case
with only one x-variable. Later, in Section 7, we will return to the more general case with K
x-variables.
Consider the situation where we have measurements of two variables x and y on a set s of n
elements, i.e. the information available is of the type $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. With this
information we want to describe y as a function of x, i.e. we want to express y as $y = f(x)$. To
begin with, we will consider the case where $f(x) = b_0 + b_1 x$. Therefore our task is to express
y as a linear function of x as well as possible. Needless to say, if we want to express y as a
linear function of x, it is because we have reasons to believe that there is a linear relationship
between the two variables.
We illustrate the idea with an example.

Example 50. Let us consider a set s of n = 10 companies producing tables; let $x_i$ be the number
of workers in the ith company and $y_i$ the number of tables produced during one particular day
by the ith company. Table 25 shows the values of x and y and Figure 35 shows a scatter plot
of both variables. The scatter plot reveals some association between the two variables: a higher
number of workers is associated with a higher number of tables. Furthermore, this association
can be adequately described by a straight line. □

i xi yi
1 12 20
2 14 21
3 15 27
4 18 30
5 19 32
6 24 50
7 26 54
8 27 57
9 28 61
10 30 60
Table 25: Number of tables y produced by x workers

Figure 35: Scatter plot of number of workers x and number of tables y.
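A minimal R sketch that reproduces a scatter plot like Figure 35 from the data in Table 25:

```r
# Scatter plot of the workers data (Table 25).
x <- c(12, 14, 15, 18, 19, 24, 26, 27, 28, 30)  # number of workers
y <- c(20, 21, 27, 30, 32, 50, 54, 57, 61, 60)  # number of tables
plot(x, y, xlab = "number of workers x", ylab = "number of tables y")
```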

We have said that we want to express y as a function of x. In the previous example we
want to express the number of tables as a function of the number of workers. It makes sense
to believe that the number of tables depends on the number of workers, not the other way
around. For this reason, we call y the dependent variable and x the independent variable. In
some contexts x and y are given different names: for instance, y is also called the output or
the response, and x is also called the input or the explanatory variable.
Once the problem has been identified, i.e. once we have decided that we have a pair of
variables x and y, one of which we would like to express as a linear function of the other
as well as possible, the next natural step is to fit this line. However, there are infinitely many
lines; how do we choose one that is appropriate for our purposes? In other words, how do
we choose the best possible straight line that relates x and y? Some lines are, evidently, not
adequate. For instance, in the top-left panel of Figure 36 we have fitted the line $\hat{y}_1 = 80 - 2x$
to the workers dataset. This line clearly does not describe the relation between the two variables
in an adequate way. In the top-right panel, we have fitted the line $\hat{y}_2 = 20 + 0.25x$, which
does not look too bad for some values but looks quite bad for some others. On the other hand,
in the bottom-left panel, we have fitted the line $\hat{y}_3 = -20 + 3x$ and, in the bottom-right panel,
we have fitted the line $\hat{y}_4 = -13 + 2.5x$. Both these lines seem to adequately describe the
relation between x and y. Which one should we prefer? Can we find any other line that can
be considered to be better?

Figure 36: Four lines fitted to the workers dataset.

First, let us formalize what we mean by a line $\hat{y}$ adequately describing y as a function
of x. Intuitively, we consider a line $\hat{y}$ to be adequate if the distances from it to the points
are small. The lines in the top panels of Figure 36 have large distances to the points; that is
why we immediately consider them to be inadequate. But the lines in the bottom panels have
small distances to the points. Which one should we prefer?
In order to answer this question we need to define a criterion for measuring the distance
from a line $\hat{y} = b_0 + b_1 x$ to the observed points. We will consider the sum of squares error
(SSE):

$$\mathrm{SSE} = \sum_{s} e_i^2 \quad \text{where} \quad e_i = y_i - \hat{y}_i \quad \text{and} \quad \hat{y}_i = b_0 + b_1 x_i.$$

The name sum of squares error can be understood as follows. The value $\hat{y}_i$ is
the approximation to $y_i$ made by the straight line. Therefore the difference $y_i - \hat{y}_i$ can be
interpreted as the error $e_i$ made by the straight line when approximating the ith observation.
Our criterion is simply the sum of the squares of these errors.

Example 51. Let us calculate the sum of squares error (SSE) for each of the four lines $\hat{y}_1$,
$\hat{y}_2$, $\hat{y}_3$ and $\hat{y}_4$ fitted to the workers dataset in Figure 36.
For $\hat{y}_1$ we have the intercept $b_0 = 80$ and the slope $b_1 = -2$, which yields the fitted values
$\hat{y}_{1,i}$ and errors $e_i = y_i - \hat{y}_{1,i}$ shown in the fourth and fifth columns of Table 26, respectively.
Therefore we have

$$\mathrm{SSE}_1 = \sum_{s} e_i^2 = (-36)^2 + (-31)^2 + \cdots + 40^2 = 8012.$$

i xi yi ŷ1,i ei
1 12 20 56 -36
2 14 21 52 -31
3 15 27 50 -23
4 18 30 44 -14
5 19 32 42 -10
6 24 50 32 18
7 26 54 28 26
8 27 57 26 31
9 28 61 24 37
10 30 60 20 40
Table 26: Fitted values and errors for the line $\hat{y}_1 = 80 - 2x$

The sums of squares error for the remaining lines $\hat{y}_2$, $\hat{y}_3$ and $\hat{y}_4$ are found in an analogous
way; we get $\mathrm{SSE}_2 = 10046$, $\mathrm{SSE}_3 = 207$ and $\mathrm{SSE}_4 = 66$. Therefore, according to the SSE
criterion, among these four lines, $\hat{y}_4 = -13 + 2.5x$ is the one that best fits the observations.
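As a quick sketch, the SSE of any candidate line can be computed in R from the data in Table 25; here for $\hat{y}_1$, $\hat{y}_3$ and $\hat{y}_4$, whose values match those reported above up to rounding:

```r
# Sum of squares error of the line b0 + b1 * x for the workers data.
x <- c(12, 14, 15, 18, 19, 24, 26, 27, 28, 30)
y <- c(20, 21, 27, 30, 32, 50, 54, 57, 61, 60)
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)
sse(80, -2)    # 8012, the SSE of the line y-hat_1
sse(-20, 3)    # 207, the SSE of the line y-hat_3
sse(-13, 2.5)  # about 66, the SSE of the line y-hat_4
```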


In Example 51 we used the SSE for deciding which line, among four different options, fits the
observations best. The natural next step is to find the best line according to the SSE
criterion, in other words, to find the values of the intercept $b_0$ and the slope $b_1$ that minimize
the SSE. The solution is known as the least squares regression.

Definition 52. Let $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$ be the observations of two variables x and
y on a set s of n elements. The line that minimizes the SSE is given by the least squares
regression,

$$\hat{y} = b_0 + b_1 x \quad \text{with} \quad b_1 = \frac{S_{xy,s}}{S^2_{x,s}} \quad \text{and} \quad b_0 = \bar{y}_s - b_1 \bar{x}_s \qquad (14)$$

where

$$S_{xy,s} = \frac{1}{n-1} \sum_{s} (x_i - \bar{x}_s)(y_i - \bar{y}_s) \quad \text{and} \quad S^2_{x,s} = \frac{1}{n-1} \sum_{s} (x_i - \bar{x}_s)^2. \; \square$$

Remark: The slope $b_1$ in (14) can alternatively be written as

$$b_1 = r_{xy,s} \frac{S_{y,s}}{S_{x,s}} \quad \text{or} \quad b_1 = \frac{\sum_{s} (x_i - \bar{x}_s)(y_i - \bar{y}_s)}{\sum_{s} (x_i - \bar{x}_s)^2} \quad \text{or} \quad b_1 = \frac{\sum_{s} x_i y_i - n \bar{x}_s \bar{y}_s}{\sum_{s} x_i^2 - n \bar{x}_s^2},$$

where $r_{xy,s}$ is the correlation coefficient between x and y in s, and $S_{x,s}$ and $S_{y,s}$ are the standard
deviations of x and y, respectively.
The Excel functions INTERCEPT() and SLOPE() can be used for obtaining the intercept
$b_0$ and the slope $b_1$ of a least squares regression.
The intercept $b_0$ and, in particular, the slope $b_1$ in a least squares regression have interesting
interpretations. The slope $b_1$ indicates that, on average, a unit change in the independent
variable x is associated to a change of $b_1$ in the dependent variable y. It is worth emphasizing
the need for the expression "on average" in the previous sentence: x and y do not follow a
straight line exactly, so the change is not deterministic; that is why we emphasize
that the slope measures the "average change". The intercept, on the other hand, indicates the
value of the dependent variable y that is expected when the independent variable x is equal
to zero.

Example 53. Let us find the least squares regression for the workers dataset from Examples
50 and 51. We have $\bar{x}_s = 21.3$, $\bar{y}_s = 41.2$, $S^2_{x,s} = 42.01$ and $S_{xy,s} = 106.9$. Therefore

$$b_1 = \frac{S_{xy,s}}{S^2_{x,s}} = \frac{106.9}{42.01} = 2.545 \quad \text{and} \quad b_0 = \bar{y}_s - b_1 \bar{x}_s = 41.2 - 2.545 \cdot 21.3 = -13.02.$$

The fitted regression line is

$$\hat{y} = -13.02 + 2.545\, x,$$

which is shown in Figure 37.

Figure 37: Least squares regression fitted to the workers dataset.

It can be verified that the sum of squares error for the least squares regression is SSE = 56,
which is, in fact, smaller than the SSE of any of the four lines fitted in Example 51.
The intercept $b_0$ is interpreted as: a company with no workers is expected to produce
around -13 tables. Admittedly, in this case, this interpretation makes no sense: it is simply
not possible to produce a negative number of tables. This undesired result is due to the fact
that the value x = 0 is not one of the observed values and is, in fact, quite far from all
of the observed values. One must be careful when extrapolating the results of a regression
to observations that do not belong to the dataset, especially if they are quite different from the
observed values.
The slope $b_1$ is interpreted as: in our set of 10 companies, on average, one extra worker is
associated to 2.5 more tables produced. □
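A minimal sketch of the same fit using R's built-in lm() function:

```r
# Least squares regression for the workers data with lm().
x <- c(12, 14, 15, 18, 19, 24, 26, 27, 28, 30)
y <- c(20, 21, 27, 30, 32, 50, 54, 57, 61, 60)
fit <- lm(y ~ x)
coef(fit)              # intercept about -13.02, slope about 2.545
sum(residuals(fit)^2)  # SSE of the least squares line, about 56
```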
