0% found this document useful (0 votes)
154 views12 pages

Correlation & Regression Analysis Worksheet

Uploaded by

Hei Chit Luk
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
154 views12 pages

Correlation & Regression Analysis Worksheet

Uploaded by

Hei Chit Luk
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Investigation 4 - Correlation and Regression - Worksheet

Topics from GCE AS and A Level Mathematics

 Interpret scatter diagrams and regression lines for bivariate data, including recognition
of scatter diagrams which include distinct sections of the population
 Understand informal interpretation of correlation.

Investigation 4 (a)

Investigate if there is a correlation between daily mean total cloud cover and daily mean
pressure.

The data
Open the Excel workbook Pearson Edexcel GCE AS and AL Mathematics data [Link].
Select the Information worksheet.

1. Read the information in cell A17.


Explain how cloud cover is calculated.

The fraction of the celestial dome covered by cloud. The unit is oktas.

2. Read the information in cell A19. What units are used to measure daily mean pressure?

Pascal

3. What type of variable is daily mean pressure?

Continuous variable

4. What type of variable is daily mean total cloud cover?


Quantitative, discrete variable

The location Leuchars 2015 was randomly selected from:


Camborne 2015 Camborne 1987
Hurn 2015 Hurn 1987
Leuchars 2015 Leuchars 1987
Leeming 2015 Leeming 1987
Heathrow 2015 Heathrow 1987.

Select the worksheet Leuchars May-Oct 2015.

It is difficult to analyse these data as it is presented in the dataset. The headers need to be in
row 1.
Copy the data into a new workbook
Select the whole worksheet
Click on the small blue square in the left hand corner, this will select
the whole worksheet.

1
Investigation 4 - Correlation and Regression - Worksheet

Right click then Copy

Open a new workbook


Select the Office button then New
Then double click on Blank Workbook
Select A1 right click Paste

Rename the worksheet


Double click on the tab Sheet 1 at the bottom of the worksheet
Type Leuchars 2015, then click anywhere on the worksheet

Delete rows 1 – 5
Select rows 1 – 5 right click Delete
Save workbook as Leuchars2015

5. What are bivariate data?

Data on each of two variable, where each value of one of the variables is paired with a value
of the other variable

In bivariate data if one of the variables is controlled (or explains the other variable), it is
known as the independent (or explanatory) variable.

A dependent (or response) variable is a variable whose value depends on the value of another
variable.
The dependent variable is usually plotted on the vertical axis. Note: if you are using a
regression model to predict a value, the variable for the value you wish to predict should be
the Y variable and plotted on the vertical axis of a scatter diagram.

For bivariate data if all the points in a scatter diagram seem to lie near a straight line, there is
a linear correlation between the two variables.

Pearson’s product moment correlation coefficient provides a standardised measure of linear


correlation. Its value lies within −1 and +1.

Process
Plot a scatter diagram to investigate if there is a correlation between daily mean total cloud
cover and daily mean pressure. In this case there is no controlled variable. Later in this
investigation a prediction for daily mean total cloud cover will be calculated based on the
regression equation for daily mean total cloud cover against daily mean pressure.
6. Which variable should be plotted on the vertical axis?

Mean total cloud cover

Plot a scatter diagram in Excel


Open [Link]’.

2
Investigation 4 - Correlation and Regression - Worksheet

When plotting a scatter diagram, Excel plots the variable in left hand column on the x-axis
and the variable in the right hand column on the y-axis.
Copy columns A, I and K into a new worksheet
In the worksheet Leuchars2015 select column A right click Copy
In worksheet Sheet 2 select A1 right click Paste
In the worksheet Leuchars2015 select column K right click Copy
In worksheet Sheet 2 select C1 right click Paste
In the worksheet Leuchars2015 select column I right click Copy
In worksheet Sheet 2 select D1 right click Paste

Rename Sheet 2 CloudPress


Select B1 type Season
In column B code the data for May, June, July and August as
Spring/Summer and code the values for September and October
Autumn. These codes will be used later.

Plot a scatter diagram


In the worksheet CloudPress select columns C and D
Select the Insert tab then Scatter.
Select Scatter with only Markers.

Format the diagram

Delete the legend Daily Mean Total Cloud (oktas)


Click on the legend and delete 9
8

Add a title 7
6
Click on the title and type Daily mean total cloud 5
cover vs Daily mean pressure Leuchars 2015 then 4
Daily Mean Total Cloud
(oktas)
Enter 3
2
1
Add a vertical axis title 0
Click on the chart then select the Layout tab select 980 1000 1020 1040

Axis Titles then Primary Vertical Axis Title then


Vertical Title and type Daily mean total cloud cover (oktas) and Enter

Change the alignment of the text in the vertical title


Right click on the vertical title select Format Axis Title then
Alignment then click on the arrow next to Text direction click
Rotate all text 270o and Close

3
Investigation 4 - Correlation and Regression - Worksheet

Add a horizontal axis title


In the Layout tab select Axis Titles then Primary Horizontal Axis Title then Title Below Axis
and type Daily mean pressure (hPa) and Enter.
Add a chart border
Right click on the chart select Format Plot Area then Border Color select Solid line open up
the Color arrow select Black and Close

Save your work

Daily mean total cloud cover versus Daily mean


pressure Leuchars 2015 May to October
9
Daily mean total cloud cover (oktas)

8
7
6
5
4
3
2
1
0
980 990 1000 1010 1020 1030 1040
Daily mean pressure (hPa)

Figure 1

7. Comment on the correlation between the two variables.

No strong correlation between daily mean total cloud cover and daily mean pressure for Leuchars
from May to October 2015. The points are widely scattered, indicating a weak or non-existent linear
relationship between these variables.

8. Give an interpretation of the correlation between the two variables.

The interpretation is that daily mean total cloud cover and daily mean pressure are not linearly
related.

9. Comment on the scatter.

The scatter plot shows a broad distribution of points without any clear trend. Points are spread
across different cloud cover values for a wide range of pressures, indicating that high or low pressure
does not consistently correspond to high or low cloud cover.

4
Investigation 4 - Correlation and Regression - Worksheet

Find the product moment correlation coefficient in Excel


Select E2 type =corr

Select CORREL then column C, column D Enter

Correlation coefficient = -0.055611

To add a line of regression in Excel (This is called a trend line in


Excel)
Right click on any data point select Add Trendline
Select Linear then Display Equation on Chart and Close
Drag the equation of regression (trend line) so it is clearly visible
on the chart, reduce the number of decimal place values to two
and format the text using the formatting in the Home tab.

Daily mean total cloud cover versus Daily mean


pressure Leuchars 2015 May to October
9
Daily mean total cloud cover (oktas)

8
7
6
5
4
3
y = -0.07x + 77.53
2
1
0
980 990 1000 1010 1020 1030 1040
Daily mean pressure (hPa)

10. Write the line of regression using the names of the variables.

y = -0.07x + 77.53
So Daily mean total cloud cover =-0.07 x daily mean pressure + 77.53

11. Interpret the gradient of the line of regression for daily mean total cloud cover against
daily mean pressure.

-0.0711

12. Does this regression model seem to fit the data?

Yes

5
Investigation 4 - Correlation and Regression - Worksheet

13. Use the regression model (equation of regression) to predict the daily mean total cloud
cover for a daily mean pressure of 1030 hPa.

4.297

Report

Daily mean total cloud cover vs Daily mean pressure Leuchars


2015 May to October
9

8
Daily Mean Total Cloud (oktas)

7 f(x) = − 0.071124045200379 x + 77.5323726443281


6

0
980 990 1000 1010 1020 1030 1040
Daily Mean Pressure (hPa)

Investigation 4 (b)
Plot a scatter diagram to investigate the relationship between daily mean total cloud cover
against daily mean pressure for Leuchars 2015, split by season.

To plot the scatter diagram daily mean total cloud cover against daily mean pressure for
Leuchars 2015, split by season

In the worksheet CloudPress select E1 type Spring/Summer Enter


Select F1 type Autumn Enter (these will be used as series names)

Plot a scatter diagram for Spring/Summer only, cells C1:D124.


Move the chart to the top of the worksheet. Either drag or Cut and Paste.

6
Investigation 4 - Correlation and Regression - Worksheet

Add a label for Spring/Summer


Right click on the chart select Select Data
Select Edit
Series name select E1 OK OK

Add Autumn data and label


Right click on the chart select Select Data
Select Add
Series name select F1
Series X values select C125:C185
Series Y values select D125:D185
(delete ={y})
OK OK

Format the chart by adding a title and labels.

Daily mean total cloud cover versus Daily mean


pressure Leuchars 2015, by season
9
Daily mean total cloud cover (oktas)

8
7
6
5
4 Spring/summer
3 Autumn
2
1
0
980 990 1000 1010 1020 1030 1040
Daily mean pressure (hPa)

Report

7
Investigation 4 - Correlation and Regression - Worksheet

Daily mean total cloud cover vs Daily mean


pressure Leuchars 2015, by season
Daily Mean Total Cloud (oktas)

9
8
7
6
5
4
3
2
1
0
980 990 1000 1010 1020 1030 1040 1050
Daily Mean Pressure (hPa)

Spring/summer Autumn

14. Comment of the split of the data between Spring/Summer and Autumn.

In spring/summer (blue points), data points are more evenly spread across a range of pressures
(from about 980 hPa to 1030 hPa). The cloud cover values in this period also show a wide range
(from 2 to 9 oktas).
In autumn (orange points), data points are more clustered towards higher pressures (around 1000 to
1030 hPa). The cloud cover values are slightly more varied with a noticeable presence of lower cloud
cover values (2 to 8 oktas).
In spring/summer, there seems to be a higher variability in cloud cover at a given pressure. For
instance, at around 1000 hPa, cloud cover ranges from 4 to 9 oktas.
In autumn, the data suggests a slight trend where higher pressures correspond to lower cloud cover.
For example, at pressures above 1020 hPa, cloud cover is often lower (2 to 6 oktas).

Investigation 4 (c)

Investigate if there is a correlation between daily mean visibility and daily mean temperature.
Use the same random sample i.e. Leuchars 2015.

The data
In the Excel workbook Pearson Edexcel GCE AS and AL Mathematics data [Link].
Select the Information worksheet.

1. Read the information in cell A12.


Explain how daily mean temperature is measured.
Air temperatures are recorded by thermometers in a louvered screen 1.25 metres above short
grass, except at some Weather Centre’s and Climate Data Logger stations, where
observations are made from a non-standard roof top exposure.

8
Investigation 4 - Correlation and Regression - Worksheet

2. Read the information in cell A18.


Define daily mean visibility.

Visibility is defined as the greatest distance at which an object can be seen and recognized in
daylight, or at night could be seen and recognized if the general illumination were raised to
daylight level.

3. What type of variable is daily mean temperature?

Independent

Process
Plot a scatter diagram of daily mean visibility against daily mean temperature.
4. Which variable should be plotted on the vertical axis?

daily mean visibility

5. Comment on the correlation between the two variables.

The scatter plot does not indicate a strong linear correlation between daily mean visibility and daily
mean temperature. The data points are quite widely dispersed.

There is a slight trend suggesting that higher temperatures may be associated with higher visibility,
as higher visibility values tend to be more frequent at higher temperatures (above 10°C).

However, there are still many points with high visibility at lower temperatures and points with low
visibility at higher temperatures, indicating a significant amount of variability.

6. Explain why no line of regression should be calculated or fitted to the data.

The scatter plot indicates a weak or non-existent linear relationship between the two variables. The
data points are widely scattered with no clear linear trend. Fitting a regression line would not provide
a meaningful or useful model because it would not accurately represent the underlying relationship.

9
Investigation 4 - Correlation and Regression - Worksheet

Daily Mean Visibility (dam) VSDaily Mean


Temperature (0900-0900) (°C)

6000

5000
Daily Mean Visibility (dam)

4000

3000

2000

1000

0
0.0 5.0 10.0 15.0 20.0 25.0
Daily Mean Temperature (0900-0900) (°C)

Investigation 4 (d)

Investigate if there is a correlation between daily mean air temperature and daily mean
pressure in Beijing May to October 2015.

The data
The data are provided in the Excel workbook [Link].

Copy the data for Beijing 2015 into a new workbook


Rename the workbook Beijing2015
Delete rows 1 – 5
Save workbook as Beijing2015

Process
Plot a scatter diagram to investigate if there is a correlation between daily mean air
temperature and daily mean pressure in Beijing, May to October 2015. Daily mean pressure
is the explanatory variable.

1. Which variable should be plotted on the vertical axis?

daily mean air temperature

2. Comment on the correlation between the two variables.

Strong correlation between temp and pressure

3. Give an interpretation of the correlation between the two variables.

10
Investigation 4 - Correlation and Regression - Worksheet

4. Find the product moment correlation coefficient in Excel.

Correlation coefficient = -1.0164

5. Add a line of regression to the plot.

6. Write the line of regression using the names of the variables.

7. Interpret the gradient of the line of regression for daily mean air temperature against daily
mean pressure.

Gradient is -1.0164 so if temperature decreases by 1.0164 c when the pressure decresses by 1


c

8. Interpret the intercept of the line of regression for daily mean air temperature against
daily mean pressure.

Gradient is -0.9839 so if pressure decreases by 0.9839 c when the tempearature decresses by


1c

9. Does this regression model seem to fit the data? Give a reason for your answer.

Yes, as it has a strong collreation

10. Use the regression model (line of regression) to predict the daily mean air temperature for
a daily mean pressure of 1005 hPa.

27

11. Comment on the accuracy of the predicted daily mean air temperature in question 10.

11
Investigation 4 - Correlation and Regression - Worksheet

Using y = -1.0164x + 1032.9

The temperature is 27.4c

12. Plot scatter diagram to show the relationship between daily mean air temperature against
daily mean pressure for Beijing, May to October 2015, split by season.

Report

Daily mean air temperature VS daily mean pressure in


Beijing, May to October 2015, by season

1040

1030
daily mean pressure (hPa)

1020 Spring/Summer
Autumn
1010

1000

990

980

970
5.0 10.0 15.0 20.0 25.0 30.0 35.0
Daily mean air temperature (C)

12

You might also like