
Statistical Analysis With Software Applications BSA PDF

This document provides instructional materials for a statistics course. It outlines 7 modules that cover basic concepts of statistics, presentation of data, measures of central tendency and variability, the normal distribution, correlation and regression analysis, and hypothesis testing. Module 1 defines descriptive and inferential statistics, and discusses population and samples, parameters and statistics. It also covers sampling techniques such as simple random sampling, stratified random sampling, and cluster sampling. The module presents formulas for determining sample size and provides examples.

POLYTECHNIC UNIVERSITY OF THE PHILIPPINES

LOPEZ, QUEZON BRANCH

INSTRUCTIONAL
MATERIALS
FOR
STAT 20053
Statistical Analysis with
Software Applications

Compiled by:

THELMA D. OLAIVAR
Associate Professor 3
STAT 20053
Statistical Analysis with Software Applications
BSA 2

CONTENTS

Module 0 Overview

Module 1 Basic Concepts

Module 2 Presentation of Data

Module 3 Measures of Central Tendency

Module 4 Measures of Variability

Module 5 The Normal Distribution

Module 6 Correlation and Regression Analysis

Module 7 Hypothesis Testing


MODULE 1
Elementary Statistics
and Probability
Basic Concepts

Objectives of the Module

At the end of this module, the students should be able to

• Differentiate descriptive from inferential statistics.
• Differentiate population from sample.
• Differentiate parameter from statistic.
• Compute the sample size using Slovin's Formula.
• Recognize the difference between probability and non-probability sampling.
• Recognize the different methods of gathering data.
PRE-TEST

Using $n = \frac{N}{1 + Ne^2}$, fill in the following table:

N       e       n
1000    0.05
1000    0.01
2000    0.02
2000    0.03
500     0.10
500             250
5000            1000
5000            500
1.0 Types of Statistics

Descriptive Statistics describes or summarizes the important characteristics of a set of data. The data found in a university's Socio-Economic Profile of Freshmen, for example, make use of descriptive statistics.

Inferential Statistics draws conclusions about a population through the use of a representative sample. Using the concept of probability, inferential statistics deals with generalizations from samples to populations, hypothesis testing, determining relationships among variables, and prediction.

A population is the complete and entire collection of elements under study. When the population is very large, a sample, which is a representative subset of the population, is used instead to economize on time, money, and effort.

A parameter is a numerical measurement describing a population.

A statistic is a numerical measurement describing a sample.

1.1 Sampling and Sampling Techniques

When gathering data, a census is attractive if the population is small because it eliminates sampling error and provides data on all the members or elements of the population.

If the population is too big to handle, a sample of substantial size is acceptable instead.

Determining the sample size is truly important: if the sample is too large, time, money, and effort may be wasted, while if it is too small, the results may be inaccurate. It is better to use a formula. The simplest formula to use is Slovin's Formula:

$$n = \frac{N}{1 + Ne^2}$$

Where:
n = sample size
N = population size
e = margin of error

(Sampling error is the difference between the survey results obtained from the sample and those of the population.)

Illustrations:
Example 1: Find the sample size if the population size is 20,000 and the margin of error is 5%.
$$n = \frac{20000}{1 + 20000(0.05)^2} = \frac{20000}{1 + 50} = 392$$

Example 2: Find the sample size if the population size is 20,000 and the margin of error is 1%.
$$n = \frac{20000}{1 + 20000(0.01)^2} = \frac{20000}{1 + 2} = 6667$$
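A minimal Python sketch of Slovin's formula that reproduces Examples 1 and 2. It rounds to the nearest whole number, as the examples do (392.16 becomes 392). Many practitioners round up instead, so that the margin of error is never exceeded. The second helper rearranges the formula to solve for e when N and n are known, $e = \sqrt{(N - n)/(Nn)}$, which is useful for the last three rows of the pre-test. Both function names are hypothetical.

```python
import math

def slovin_sample_size(N: int, e: float) -> int:
    """n = N / (1 + N e^2), rounded to the nearest whole number."""
    n = N / (1 + N * e ** 2)
    return math.floor(n + 0.5)

def slovin_margin_of_error(N: int, n: int) -> float:
    """Slovin's formula solved for e: e = sqrt((N - n) / (N n))."""
    return math.sqrt((N - n) / (N * n))

print(slovin_sample_size(20000, 0.05))  # Example 1 -> 392
print(slovin_sample_size(20000, 0.01))  # Example 2 -> 6667
```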

Sampling Techniques
Probability Sampling Techniques choose samples in such a way that every member of the population has a known, though not necessarily equal, chance of being selected. The samples obtained are unbiased samples.
1. Simple Random Sampling - all members of the population have an equal chance of being selected. Randomization is done through the Lottery Technique, the Fish-Bowl Technique, or a Table of Random Numbers.

2. Stratified Random Sampling - this is used when the population is divided into groups or strata and samples are randomly selected from each stratum. The sampling ratio is used to arrive at proportionate samples for each group or stratum: the sample taken from a stratum is the sampling ratio times the size of that stratum.
   $$\text{sampling ratio} = \frac{n}{N}$$
3. Systematic Sampling with a Random Start - every kth element of the population is selected, where
   $$k = \frac{N}{n}$$
   is called the sampling interval. The starting element is selected randomly.
   [Figure: the letters A to Z as the population; the selected elements are D, H, L, P, T and X, i.e., a random start at D and every 4th letter after it]

4. Cluster Sampling - this is sometimes called area sampling because it is used when the population is very, very large, which requires the population to be grouped into clusters.
   [Figure: a population of N elements grouped into clusters 1 to 8; the sample of n elements consists of clusters 2, 5 and 8]
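The four probability sampling techniques above can be sketched with Python's standard `random` module. This is an illustration only. The function names are hypothetical, and the stratum sizes in the example are made-up numbers. The sampling interval is taken as the whole-number part of N/n, which is an assumption. The systematic example reproduces the A to Z figure by fixing the random start at D. Because of rounding, the stratum samples from the proportional allocation may not always add up exactly to n.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Every member has an equal chance of selection (like drawing from a fish bowl)."""
    return rng.sample(population, n)

def proportional_allocation(stratum_sizes, n):
    """Stratified sampling: each stratum gets (n / N) * N_h members, rounded."""
    N = sum(stratum_sizes.values())
    ratio = n / N                                   # the sampling ratio n/N
    return {name: round(size * ratio) for name, size in stratum_sizes.items()}

def systematic_sample(population, n, start=None, rng=random):
    """Every k-th element after a random start, with sampling interval k = N // n."""
    k = len(population) // n
    if start is None:
        start = rng.randrange(k)                    # random start within the first interval
    return population[start::k][:n]

def cluster_sample(clusters, m, rng=random):
    """Randomly pick m whole clusters; every member of a chosen cluster is included."""
    chosen = rng.sample(sorted(clusters), m)
    return {c: clusters[c] for c in chosen}

letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(systematic_sample(letters, 6, start=3))      # ['D', 'H', 'L', 'P', 'T', 'X']
# Hypothetical strata sizes, for illustration only
print(proportional_allocation({"Freshmen": 500, "Sophomore": 300, "Junior": 200}, 100))
print(simple_random_sample(letters, 5, random.Random(1)))
```

Passing a seeded `random.Random` object makes a draw repeatable, which helps when checking work by hand.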

Non-Probability Sampling Techniques are sampling methods in which the elements of the population do not have a known chance of being included in the sample.
1. Convenience Sampling - elements of the sample are selected on the basis of convenience, that is, because they are easily accessible.
2. Quota Sampling - this is used when there is stratification but the sampling ratio is not used; instead, the one doing the sampling merely decides on the allocation or quota for each stratum.
3. Purposive Sampling - members of the sample are selected on the basis of some predetermined criteria (purpose), so that only qualified elements are included.

1.2 Methods of Data Collection

1. The Direct or Interview Method - there is direct contact with the respondent, so a more accurate response is obtained since clarification from the interviewee can be readily obtained.
2. The Indirect or Questionnaire Method - a lot of money and time is saved because questionnaires can be given to many respondents at the same time.
3. The Registration Method - a method of gathering data that is governed by law.
4. The Experimental Method - a method used to find out cause-and-effect relationships.

POST TEST

Fill in the following table with the advantages and major disadvantages of the data gathering techniques/methods.

Methods/Techniques        Advantages        Disadvantages
MODULE 2
Elementary Statistics
and Probability
Presentation of Data

Objectives of the Module

At the end of this module, the students should be able to

• Recognize the different methods of presenting data.
• Construct a frequency distribution.
• Graph a frequency distribution.
• Construct other types of frequency distribution.
2.0 Methods of Presenting Data

• Textual Presentation of Data - data are presented in paragraphs or sentences, enumerating important characteristics, emphasizing the most significant features and highlighting the most striking attributes of the set of data
  - also known as textual form
• Tabular Presentation of Data - a large number of data items are arranged in a table for clear presentation and comparison. This allows the data to be presented at a level of detail that usually cannot be conveyed in text
• Graphical Presentation of Data - may be in the form of bar graphs, line graphs or pie charts, which help facilitate comparison and interpretation without going through the numerical data

Activity 2a

Consider the following table. Label its parts: table number, table title, column headers, and source note.

Table 2a:
Distribution of BSEd Math Students
According to Year Level

Year Level    No. of Students (Frequency)    Percentage Frequency
Freshmen      50                             0.3182
Sophomore     50                             0.2727
Junior        60                             0.2273
Senior        40                             0.1818
N = 1,100

Source: University Registrar


Activity 2b

In your Activity Notebook, cut and paste examples of the different types of data presentation.

2.1 Frequency Distribution

The most convenient way of organizing data is by constructing a frequency distribution. A frequency distribution is a collection of observations produced by sorting them into classes and showing the frequency of occurrence in each class. There are three types of frequency distribution: categorical, ungrouped, and grouped.

The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data.

Example 1 The following data give the blood types of 40 BSA students.

A O O A B O AB O B AB
O B O A A O O O O A
AB O A O B AB B A O O
A O O O O A O O A B

Blood Type    Tally                        Frequency
A             IIII – IIII                  10
B             IIII – I                     6
O             IIII – IIII – IIII – IIII    20
AB            IIII                         4
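You can check the tally with Python's `collections.Counter`, which counts how often each category occurs. Below is a minimal sketch using the 40 blood types listed in Example 1.

```python
from collections import Counter

# Blood types of the 40 BSA students in Example 1, copied row by row
blood_types = """
A O O A B O AB O B AB
O B O A A O O O O A
AB O A O B AB B A O O
A O O O O A O O A B
""".split()

freq = Counter(blood_types)
for blood_type in ["A", "B", "O", "AB"]:
    print(f"{blood_type:<3}{freq[blood_type]:>4}")
print("Total", sum(freq.values()))   # 40
```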

When observations are sorted into classes of single values, the result is called a
frequency distribution for ungrouped data. When observations are sorted into classes of more
than one value, the result is called a frequency distribution for grouped data.

Weekly Expenses of 80 First Year Students

Weekly Expenses    Number of Students
101 - 300          8
301 - 500          40
501 - 700          11
701 - 900          16
901 - 1100         5

In this table, Weekly Expenses is the variable; 301 - 500 is the 2nd class and 40 is the frequency of the 2nd class; 701 is the lower limit and 900 is the upper limit of the 4th class.


The following are the basic terminologies associated with frequency tables.

Lower class limit – the smallest data value that can be included in the class

Upper class limit – the largest value that can be included in the class

Class boundaries – the values used to separate the classes so that there are no gaps in the frequency distribution.

Class marks – the midpoints of the classes
$$X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$$
Class width – the difference between two consecutive lower class limits

The class width of the preceding distribution is 200 (301− 101 = 200).
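As a quick check of these terms, the sketch below computes the class marks, class boundaries and class width for the weekly expenses table. It sets each boundary 0.5 below the lower limit and 0.5 above the upper limit, which assumes whole-number data. This matches Step 5 of the procedure for constructing a frequency table.

```python
# Class limits of the table "Weekly Expenses of 80 First Year Students"
classes = [(101, 300), (301, 500), (501, 700), (701, 900), (901, 1100)]

width = classes[1][0] - classes[0][0]                      # two consecutive lower limits
marks = [(lower + upper) / 2 for lower, upper in classes]  # class marks X_m
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in classes]

for (lower, upper), xm, (lb, ub) in zip(classes, marks, boundaries):
    print(f"{lower}-{upper}: class mark {xm}, boundaries {lb}-{ub}")
print("class width:", width)   # 200
```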

The following are the steps in constructing a frequency table.

Step 1 Decide on the number of classes your frequency table will have. Usually,
it is between 5 and 20.
Step 2 Find the range. This is the difference between the highest and lowest
scores.
Step 3 Find the class width. Divide the range by the number of classes. The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data.
Step 4 Select a starting point, either the lowest score or the lower class limit. Add the class width to the starting point to get the second lower class limit. Then enter the upper class limits.
Step 5 Find the boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
Step 6 Represent each score by a tally.
Step 7 Count the total frequency for each class.

Note: When constructing frequency tables,

1. The classes must be mutually exclusive; each score must belong to only one class.
2. Include all classes, even if their frequency is zero.
3. Make sure that all classes have the same width.
4. Try to select convenient numbers for class limits.
5. Make sure that the number of classes is between 5 and 20.
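Steps 1 to 7 can be sketched in Python as follows. Three choices here are assumptions rather than part of the procedure above:
- The width is the smallest whole number greater than range ÷ number of classes, so that the classes cover every score.
- That width is increased by 1 if it is even.
- The first lower limit is the lowest score.

The example uses the PUPCET scores of Activity 2c, and the function name is hypothetical.

```python
def frequency_table(scores, num_classes):
    """Grouped frequency distribution following Steps 1-7 (a sketch)."""
    low, high = min(scores), max(scores)
    data_range = high - low                              # Step 2
    width = data_range // num_classes + 1                # Step 3: next whole number above range/k
    if width % 2 == 0:                                   # Step 3: keep the width odd
        width += 1
    rows = []
    lower = low                                          # Step 4: start at the lowest score
    for _ in range(num_classes):
        upper = lower + width - 1
        freq = sum(lower <= s <= upper for s in scores)  # Steps 6-7: tally and count
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq))  # Step 5: boundaries
        lower += width
    return width, rows

# PUPCET scores of students A to Y (Activity 2c)
pupcet = [85, 86, 71, 84, 79, 94, 63, 86, 75, 90, 91, 80, 82,
          84, 87, 95, 67, 82, 87, 90, 89, 78, 80, 84, 90]

width, rows = frequency_table(pupcet, 7)                 # Step 1: 7 classes
print("class width:", width)
for lower, upper, lb, ub, freq in rows:
    print(f"{lower}-{upper}  ({lb}-{ub})  {freq}")
```

The last value in each row is the class frequency. The frequencies should add up to the 25 students.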
Activity 2c

Use the following data to construct


1. A categorical frequency distribution for gender.
2. A frequency distribution using the PUPCET scores and 7 classes.
3. A frequency distribution of Grade 10 average with 5 classes.
4. A frequency distribution of Grade 12 average with 5 classes.

Name    Gender    PUPCET Score    Grade 10 Average    Grade 12 Average
A F 85 87 89
B M 86 89 91
C M 71 85 87
D F 84 87 88
E F 79 90 92
F F 94 93 97
G F 63 84 86
H F 86 90 92
I M 75 87 88
J F 90 95 95
K F 91 93 97
L M 80 85 89
M F 82 89 92
N M 84 87 90
O M 87 91 95
P M 95 95 98
Q F 67 82 87
R M 82 87 87
S F 87 91 93
T F 90 93 95
U M 89 92 95
V F 78 88 89
W F 80 90 91
X M 84 90 90
Y F 90 95 96
Activity 2d

1. Construct a frequency polygon and histogram of a frequency distribution.


2. Construct an ogive of a frequency distribution.
MODULE 3
Elementary Statistics
and Probability
Measures of Central Tendency

Objectives of the Module

At the end of this module, the students should be able to

• Discuss and explain the properties of each measure of central tendency.
• Compute the measures of central tendency for grouped and ungrouped data.
• Realize the importance of the measures of central tendency in describing the characteristics of a data set.
• Compute the measures of location.
3.0 Measures of Central Tendency (Ungrouped Data)

The measures of central tendency are single values that represent a whole set of scores. They are also called averages.
• Mean – computational average
  - every score participates in the computation
  Notation: $\bar{x}$ = sample mean
            $\mu$ = population mean (read "mu")
  Formula (Ungrouped Data):
  $$\bar{x} = \frac{\Sigma x}{N}$$
  where N is the number of scores
Illustrations:
1. Consider the grades of two accounting students in five quizzes in Accounting:
   Petra    75    80    85    90    95
   Juana    100   75    80    70    100
   Petra's mean is $\bar{x} = \frac{\Sigma x}{N} = \frac{425}{5} = 85$
   Juana's mean is $\bar{x} = \frac{\Sigma x}{N} = \frac{425}{5} = 85$
2. The monthly salaries of five BSEd Math graduates a year after graduation are as follows: P15000, P25000, P7000, P100000 and P8000. What is the mean salary?
   $$\bar{x} = \frac{\Sigma x}{N} = \frac{155000}{5} = 31{,}000$$
   Do you think P31,000 is really a very good representative score?
3. Weighted Mean – used when every score is given a weight
   Consider the grades of Anne in four major subjects:
   Subject    Grade (x)    No. of units (w)    wx
              1.0          3                   3.0
              2.0          4                   8.0
              1.5          3                   4.5
              1.5          3                   4.5
                           Σw = 13             Σwx = 20
   $$\bar{x}_w = \frac{\Sigma wx}{\Sigma w} = \frac{20}{13} = 1.538$$
4. Weighted Mean – used when dealing with a Likert Scale or Modified Likert Scale
   Consider the following table showing the attitudes of 20 students towards mathematics, based on their level of agreement with three statements:

   Statement                  1    2    3    4     5    x̄w      Descriptive Interpretation
   I like Math                0    3    5    10    2    3.55    Agree
   I enjoy problem solving    1    3    7    8     1
   I just want to compute     0    2    4    12    2

   For the first statement, where f is the number of students who chose scale point x,
   $$\bar{x}_w = \frac{\Sigma fx}{\Sigma f} = \frac{0(1) + 3(2) + 5(3) + 10(4) + 2(5)}{20} = \frac{71}{20} = 3.55$$
   The descriptive interpretation is based on the range interval:
   $$\text{Range Interval} = \frac{HS - LS}{\text{no. of scale points}} = \frac{5 - 1}{5} = \frac{4}{5} = 0.80$$
   1.00 – 1.80    Strongly Disagree
   1.81 – 2.60    Disagree
   2.61 – 3.40    Uncertain (Neutral)
   3.41 – 4.20    Agree
   4.21 – 5.00    Strongly Agree
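A few lines of Python reproduce the mean, weighted mean and Likert computations above. This is a sketch, and the function names are hypothetical. It treats each upper limit of the interpretation scale (1.80, 2.60, ...) as the cut-off for its label. That choice is an assumption, and it also covers values that fall between, e.g., 1.80 and 1.81.

```python
def mean(scores):
    return sum(scores) / len(scores)

def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Descriptive interpretation for a 5-point scale, using the 0.80 range interval
LIKERT_SCALE = [
    (1.80, "Strongly Disagree"),
    (2.60, "Disagree"),
    (3.40, "Uncertain (Neutral)"),
    (4.20, "Agree"),
    (5.00, "Strongly Agree"),
]

def interpret(mean_score):
    for upper, label in LIKERT_SCALE:
        if mean_score <= upper:
            return label
    raise ValueError("score outside 1.00-5.00")

print(mean([75, 80, 85, 90, 95]), mean([100, 75, 80, 70, 100]))      # 85.0 85.0
print(round(weighted_mean([1.0, 2.0, 1.5, 1.5], [3, 4, 3, 3]), 3))    # Anne: 1.538
like_math = weighted_mean([1, 2, 3, 4, 5], [0, 3, 5, 10, 2])          # "I like Math"
print(like_math, interpret(like_math))                                # 3.55 Agree
```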
• Median – positional average
  - middle score that divides the set of scores into two equal parts
  Notation: $Md$, $\tilde{x}$
  Formula:
  $$\tilde{x} = x_{\frac{n+1}{2}}$$
  Note: Arrange the data in an array first.

  Illustrations:
  1. Consider the grades of Petra and Juana arranged in arrays:
     Petra    75    80    85    90     95     $\tilde{x} = x_{\frac{6}{2}} = x_3 = 85$
     Juana    70    75    80    100    100    $\tilde{x} = 80$
  2. The monthly salaries of five BSEd graduates arranged in an array:
     7000, 8000, 15000, 25000, 100000
     $\tilde{x} = x_3 = 15{,}000$
     Which is the better representative score when there are outliers (in this case P100,000.00)?
  3. What if there is an even number of scores?
     85 90 65 100 90 75 95 85
     Array: 65 75 85 85 90 90 95 100
     $$\tilde{x} = x_{\frac{8+1}{2}} = x_{4.5} = \frac{x_4 + x_5}{2} = \frac{85 + 90}{2} = 87.5$$
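Here is a sketch of the median rule in Python. It sorts the scores, then takes the score at position (n + 1)/2. When that position ends in .5, it averages the two neighboring scores. Python's built-in `statistics.median` follows the same rule, so it can serve as a cross-check.

```python
def median(scores):
    arr = sorted(scores)                  # arrange the data in an array first
    pos = (len(arr) + 1) / 2              # median position (n + 1)/2, counted from 1
    if pos.is_integer():
        return arr[int(pos) - 1]
    # position ends in .5, e.g. 4.5: average the 4th and 5th scores
    return (arr[int(pos) - 1] + arr[int(pos)]) / 2

print(median([75, 80, 85, 90, 95]))                 # Petra: 85
print(median([100, 75, 80, 70, 100]))               # Juana: 80
print(median([15000, 25000, 7000, 100000, 8000]))   # salaries: 15000
print(median([85, 90, 65, 100, 90, 75, 95, 85]))    # even number of scores: 87.5
```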
• Mode – nominal average
  - the most frequently occurring score
  Notation: $Mo$, $\hat{x}$
  There is no formula; just look for the score(s) with the highest frequency.
  Illustrations:
  1. For Petra's grades, there is no mode.
     For Juana's grades, the mode is 100.
  2. For the monthly salaries of five BSEd graduates, there is no mode.
  3. For the even number of scores, there are two modes, 85 and 90 (bimodal).
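The mode can be found by counting frequencies, as sketched below with `collections.Counter`. Following the illustrations, the sketch reports "no mode" as an empty list when every score occurs only once. Otherwise it returns all scores tied for the highest frequency. Note that the standard library's `statistics.multimode` behaves differently in the first case: it returns every score.

```python
from collections import Counter

def modes(scores):
    """All scores with the highest frequency; [] stands for 'no mode'."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []                                # every score occurs only once
    return sorted(x for x, c in counts.items() if c == highest)

print(modes([75, 80, 85, 90, 95]))               # Petra: []
print(modes([100, 75, 80, 70, 100]))             # Juana: [100]
print(modes([85, 90, 65, 100, 90, 75, 95, 85]))  # bimodal: [85, 90]
```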

3.1 Measures of Central Tendency (Grouped Data)

• Mean – computational average

  Properties:
  1. The sum of the deviations of all measurements in a set from the mean is 0.
  2. It can be calculated for any set of numerical data, so it always exists.
  3. A set of numerical data has one and only one mean.
  4. It lends itself to higher statistical treatment.
  5. It is the most reliable since it takes into account every item in the set of data.
  6. It is greatly affected by an outlier (an extreme or deviant value).
  7. It is used only if the data are interval or ratio and normally distributed.
  Formula (Grouped Data):
  $$\bar{x} = \frac{\Sigma fx}{n}$$

  Illustration:
  Use the following table showing the scores of 40 BSEd freshmen in a long quiz on Contemporary World.
  Classes    f     x     fx
  25 – 29    1     27    27
  30 – 34    2     32    64
  35 – 39    8     37    296
  40 – 44    20    42    840
  45 – 49    9     47    423
             n = 40      Σfx = 1650
  Note that each class is represented by x, the class mark or midpoint.
  $$\bar{x} = \frac{\Sigma fx}{n} = \frac{1650}{40} = 41.25$$
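The grouped mean only needs the class marks and frequencies. A minimal sketch in Python, using the quiz-score table above (the function name is hypothetical):

```python
# Scores of 40 BSEd freshmen in a long quiz on Contemporary World
classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [1, 2, 8, 20, 9]

def grouped_mean(classes, freqs):
    marks = [(lower + upper) / 2 for lower, upper in classes]  # class marks x
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, marks)) / n         # sum(fx) / n

print(grouped_mean(classes, freqs))   # 41.25
```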
• Median – positional average

  Properties:
  1. It is the score or class in a distribution below which 50% of the scores fall and above which the other 50% lie.
  2. It is not affected by extreme or deviant values.
  3. It is appropriate to use when there are outliers (extreme or deviant values).
  4. It is used when the data are ordinal.
  5. It exists in both quantitative and qualitative data.

  Formula (Grouped Data):
  $$\tilde{x} = L_B + \left(\frac{\frac{n}{2} - cf}{f_m}\right) i$$
  where $L_B$ = lower boundary of the median class
        $n$ = total frequency
        $cf$ = cumulative frequency before the median class
        $f_m$ = frequency of the median class
        $i$ = class width

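  Classes    f     x     Boundaries     <cf
  25 – 29    1     27    24.5 – 29.5    1
  30 – 34    2     32    29.5 – 34.5    3
  35 – 39    8     37    34.5 – 39.5    11
  40 – 44    20    42    39.5 – 44.5    31    (median class)
  45 – 49    9     47    44.5 – 49.5    40

  The median class is the first class whose <cf reaches n/2 = 20.
  $$\tilde{x} = L_B + \left(\frac{\frac{n}{2} - cf}{f_m}\right) i = 39.5 + \left(\frac{20 - 11}{20}\right)(5) = 39.5 + \left(\frac{9}{20}\right)(5) = 39.5 + 2.25 = 41.75$$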
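The grouped median formula can be written as a short loop. The loop walks through the classes while keeping the cumulative frequency, and stops at the first class whose <cf reaches n/2. A minimal sketch with a hypothetical function name:

```python
classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [1, 2, 8, 20, 9]

def grouped_median(classes, freqs):
    n = sum(freqs)
    i = classes[1][0] - classes[0][0]          # class width
    cf = 0                                      # cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= n / 2:                     # median class: first <cf reaching n/2
            lb = lower - 0.5                    # lower boundary of the median class
            return lb + (n / 2 - cf) / f * i
        cf += f

print(grouped_median(classes, freqs))   # 41.75
```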
• Mode – nominal average

  Properties:
  1. It is used when we want to find the value that occurs most often.
  2. It is a quick approximation of the average.
  3. It is an inspection average.
  4. It is the most unreliable among the three measures of central tendency because it does not exist for some sets of observations.
  5. It exists in both quantitative and qualitative data.

  Formula 1 (Grouped Data):
  $$\hat{x} = L_B + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
  where $L_B$ = lower boundary of the modal class
        $\Delta_1$ = difference between the highest frequency and the frequency of the class just before (lower than) the modal class
        $\Delta_2$ = difference between the highest frequency and the frequency of the class just after (higher than) the modal class
        $i$ = class width
  Formula 2 (an estimate):
  $$\hat{x} = 3\tilde{x} - 2\bar{x}$$

  Illustrations:
  Classes    f     x     Boundaries     <cf
  25 – 29    1     27    24.5 – 29.5    1
  30 – 34    2     32    29.5 – 34.5    3
  35 – 39    8     37    34.5 – 39.5    11
  40 – 44    20    42    39.5 – 44.5    31    (modal class)
  45 – 49    9     47    44.5 – 49.5    40
  Identify the modal class, the class with the highest frequency. Note that
  $$\Delta_1 = 20 - 8 = 12 \qquad \Delta_2 = 20 - 9 = 11$$
  $$\hat{x} = 39.5 + \left(\frac{12}{12 + 11}\right)(5) = 39.5 + 2.609 = 42.109$$
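A sketch of Formula 1 for the grouped mode follows, with the Formula 2 estimate alongside for comparison. A modal class at either end of the table has a missing neighbor, and the sketch treats that neighbor's frequency as 0. This is an assumption not covered above. When several classes tie for the highest frequency, the sketch takes the first one. The function name is hypothetical. With these data, 12/23 × 5 ≈ 2.6087, so the mode rounds to 42.109.

```python
classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [1, 2, 8, 20, 9]

def grouped_mode(classes, freqs):
    i = classes[1][0] - classes[0][0]                    # class width
    m = freqs.index(max(freqs))                          # modal class (first one if tied)
    before = freqs[m - 1] if m > 0 else 0                # class just before the modal class
    after = freqs[m + 1] if m + 1 < len(freqs) else 0    # class just after the modal class
    d1, d2 = freqs[m] - before, freqs[m] - after
    lb = classes[m][0] - 0.5                             # lower boundary of the modal class
    return lb + d1 / (d1 + d2) * i

print(round(grouped_mode(classes, freqs), 3))   # 42.109
print(3 * 41.75 - 2 * 41.25)                    # Formula 2 estimate: 42.75
```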
3.2 Measures of Location

Measures of location are measures that divide the distribution into equal parts. They are also called quantiles.

• Median – measure of location that divides the distribution into two equal parts (50% of the scores below $\tilde{x}$ and 50% above)
• Quartiles – measures of location that divide the distribution into four equal parts (25% each)

  Formulas:
  $$Q_1 = L_{Q_1} + \frac{\left(\frac{N}{4} - cf_{Q_1}\right)(i)}{f_{Q_1}}$$
  where $L_{Q_1}$ = lower boundary of the $Q_1$ class
        $N$ = total frequency
        $cf_{Q_1}$ = cumulative frequency before the $Q_1$ class
        $f_{Q_1}$ = frequency of the $Q_1$ class
  $$Q_2 = L_B + \left(\frac{\frac{n}{2} - cf}{f_m}\right) i$$
  $$Q_3 = L_{Q_3} + \frac{\left(\frac{3N}{4} - cf_{Q_3}\right)(i)}{f_{Q_3}}$$
  where $L_{Q_3}$ = lower boundary of the $Q_3$ class
        $N$ = total frequency
        $cf_{Q_3}$ = cumulative frequency before the $Q_3$ class
        $f_{Q_3}$ = frequency of the $Q_3$ class
• Deciles – measures of location that divide the distribution into ten equal parts ($D_1, D_2, \ldots, D_9$)
• Percentiles – measures of location that divide the distribution into one hundred equal parts ($P_1, \ldots, P_{99}$)
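The quartile formulas share one pattern: the fraction p of N that lies below the point replaces N/4 or 3N/4. Extending the same pattern to deciles (p = k/10) and percentiles (p = k/100) is an assumption made by analogy, since those formulas are not written out above. The sketch below has a hypothetical function name. As a check, p = 0.5 should reproduce the grouped median of 41.75 from 3.1.

```python
classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [1, 2, 8, 20, 9]

def grouped_quantile(classes, freqs, p):
    """Point below which a fraction p of the scores lie (p = 0.25 for Q1, 0.75 for Q3)."""
    n = sum(freqs)
    i = classes[1][0] - classes[0][0]           # class width
    target = p * n                              # N/4 for Q1, 3N/4 for Q3, and so on
    cf = 0                                      # cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:                    # first class whose <cf reaches the target
            return (lower - 0.5) + (target - cf) / f * i
        cf += f
    raise ValueError("p must be between 0 and 1")

print(grouped_quantile(classes, freqs, 0.5))    # Q2, the median: 41.75
```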

Activity 3a
Use the following table.
Classes    f     x     Boundaries     <cf
25 – 29    1     27    24.5 – 29.5    1
30 – 34    2     32    29.5 – 34.5    3
35 – 39    8     37    34.5 – 39.5    11
40 – 44    20    42    39.5 – 44.5    31
45 – 49    9     47    44.5 – 49.5    40

Find the following. Write the formulas first before the computation.

1. $Q_1$      6. $D_9$
2. $Q_2$      7. $P_{25}$
3. $Q_3$      8. $P_{75}$
4. $D_1$      9. $P_{90}$
5. $D_5$      10. $P_{99}$
Activity 3b

1. Solve for the mean, median and mode of the following frequency distribution.
   Classes    f     x     Boundaries     <cf
   25 – 29    1
   30 – 34    2
   35 – 39    8
   40 – 44    20
   45 – 49    9

2. Solve for the mean, median and mode of the following frequency distribution.
   Classes    f     x     Boundaries     <cf
   25 – 29    8
   30 – 34    8
   35 – 39    8
   40 – 44    8
   45 – 49    8
MODULE 4
Elementary Statistics
and Probability
Measures of Variability

Objectives of the Module

At the end of this module, the students should be able to

• Describe sets of data by using measures of variability.
• Compute the range, quartile deviation, variance, standard deviation and coefficient of variation using formulas.
• Find and interpret coefficients of skewness and kurtosis.
• Realize the importance of measures of variability in describing characteristics of sets of data.
PRE-TEST

Analyze and interpret the graphs of the distribution of monthly wages of workers from three companies X, Y and Z.

[Figure: distributions of monthly wages in companies X, Y and Z; horizontal axis marked 3000, 5000, 9000, 11000, 13000, 17000, 19000]

___________1. In what company are wages more uniform?

___________2. In what company is the highest wage received?

___________3. In what company is the lowest wage received?

___________4. What is the mean wage in Company Y?

___________5. In what company are wages more dispersed?


4.0 Measures of Variability (Ungrouped Data)

Measures of variability, or dispersion, are measures that indicate how dispersed or scattered the data are. With measures of variability, the measures of central tendency become more meaningful and the data are better described.
• Range – the simplest measure of variability, obtained by getting the difference between the highest score and the lowest score in the set of data
  Formula: $Range = HS - LS$
• Standard Deviation – the square root of the average of the squared deviations $(x - \bar{x})$ from the mean $\bar{x}$
  Variance is the square of the standard deviation.
  Notation: $s$ for sample standard deviation
            $\sigma$ for population standard deviation
            $s^2$ is sample variance
            $\sigma^2$ is population variance
  Formula 1 (Ungrouped Data):
  $$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}}$$
  Formula 2 (Ungrouped Data):
  $$s = \sqrt{\frac{\Sigma x^2}{n} - (\bar{x})^2}$$

Illustrations:
Given the data
3 5 6 6 7 10 12 15
1. Range
   $Range = 15 - 3 = 12$
2. Variance and standard deviation
   x     x − x̄    (x − x̄)²
   3     −5        25
   5     −3        9
   6     −2        4
   6     −2        4
   7     −1        1
   10    2         4
   12    4         16
   15    7         49
   Σx = 64          Σ(x − x̄)² = 112
   x̄ = 64/8 = 8
   $$s^2 = \frac{\Sigma(x - \bar{x})^2}{n} = \frac{112}{8} = 14$$
   $$s = \sqrt{14} = 3.741657 \approx 3.74$$
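The sketch below computes the range and both variance formulas for the illustration data. Both formulas should agree. As in this module, they divide by n. Python's `statistics.pvariance` and `statistics.pstdev` use the same divisor. By contrast, `statistics.variance` and `statistics.stdev` divide by n − 1, the version many textbooks and software packages call the sample variance, and they give a larger value for these data (16.0 instead of 14). The function name is hypothetical.

```python
import math

def range_variance_sd(scores):
    n = len(scores)
    mean = sum(scores) / n
    data_range = max(scores) - min(scores)                       # HS - LS
    var1 = sum((x - mean) ** 2 for x in scores) / n              # Formula 1, squared
    var2 = sum(x * x for x in scores) / n - mean ** 2            # Formula 2, squared
    return data_range, var1, var2, math.sqrt(var1)

r, v1, v2, s = range_variance_sd([3, 5, 6, 6, 7, 10, 12, 15])
print(r, v1, v2, round(s, 2))    # 12 14.0 14.0 3.74
```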

• Quartile Deviation – the average of the distances of $Q_3$ and $Q_1$ from the median or $Q_2$
  - this is also called the Semi-Interquartile Range
  $$QD = \frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2}$$
  which simplifies to the Formula:
  $$QD = \frac{Q_3 - Q_1}{2}$$
  Illustration:
  Given: 3 5 6 6 7 10 12 15
  $Q_1 = 5.5 \quad Q_2 = 6.5 \quad Q_3 = 11$
  $$QD = \frac{Q_3 - Q_1}{2} = \frac{11 - 5.5}{2} = \frac{5.5}{2} = 2.75$$
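In the illustration, the quartiles match the medians of the lower half and the upper half of the array. The sketch below adopts that method as an assumption. Other conventions exist. For example, the default method of `statistics.quantiles` interpolates differently and gives slightly different quartiles for these data. The function names are hypothetical.

```python
def median(values):
    arr = sorted(values)
    mid = len(arr) // 2
    return arr[mid] if len(arr) % 2 else (arr[mid - 1] + arr[mid]) / 2

def quartile_deviation(values):
    arr = sorted(values)
    n = len(arr)
    lower_half = arr[: n // 2]            # scores below the median
    upper_half = arr[(n + 1) // 2 :]      # scores above the median
    q1, q2, q3 = median(lower_half), median(arr), median(upper_half)
    return q1, q2, q3, (q3 - q1) / 2

print(quartile_deviation([3, 5, 6, 6, 7, 10, 12, 15]))   # (5.5, 6.5, 11.0, 2.75)
```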

4.1 Measures of Variability (Grouped Data)

• Range
  Formula (Grouped Data):
  $$Range = UB_H - LB_L$$
  where $UB_H$ is the upper boundary of the highest class
        $LB_L$ is the lower boundary of the lowest class

• Standard Deviation
  Formula 1 (Grouped Data):
  $$s = \sqrt{\frac{\Sigma f(x - \bar{x})^2}{n}}$$
  Formula 2 (Grouped Data):
  $$s = \sqrt{\frac{\Sigma fx^2}{n} - (\bar{x})^2}$$

Note: Variance is just $s^2$.
$QD$ has the same formula for grouped and ungrouped data.
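The sketch below applies the grouped range and both standard-deviation formulas to the table used in Activity 4a, so a hand computation can be checked against it. The two variance formulas should give the same value. As elsewhere in this module, both divide by n. The function name is hypothetical.

```python
import math

classes = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [1, 2, 8, 20, 9]

def grouped_variance(classes, freqs):
    n = sum(freqs)
    marks = [(lower + upper) / 2 for lower, upper in classes]            # class marks x
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    var1 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n    # Formula 1, squared
    var2 = sum(f * x * x for f, x in zip(freqs, marks)) / n - mean ** 2   # Formula 2, squared
    return var1, var2

grouped_range = (classes[-1][1] + 0.5) - (classes[0][0] - 0.5)   # UB_H - LB_L
var1, var2 = grouped_variance(classes, freqs)
print(grouped_range, var1, var2, math.sqrt(var1))
```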

Activity 4a

Using the table, let us see if you can follow the formulas.

Classes    f     x     (x − x̄)    (x − x̄)²    f(x − x̄)²    x²    fx²
25 – 29    1     27
30 – 34    2     32
35 – 39    8     37
40 – 44    20    42
45 – 49    9     47

Solve for:

1. Range
2. Variance
3. Standard Deviation
4. QD
4.2 Skewness

Skewness is the degree of asymmetry of a distribution. Negative skewness means the distribution is skewed to the left (the tail is on the left) and the mean is less than the mode: $sk < 0$, $\bar{x} < \hat{x}$.

Positive skewness means the tail is on the right and the mean is greater than the mode: $sk > 0$, $\hat{x} < \bar{x}$.

The distribution is symmetrical if $sk = 0$, in which case $\bar{x} = \hat{x}$.

Formula 1 (Ungrouped):
$$sk = \frac{\Sigma(x - \bar{x})^3}{ns^3}$$

Formula 2 (Grouped):
$$sk = \frac{\Sigma f(x - \bar{x})^3}{ns^3}$$
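One function can cover both skewness formulas. Ungrouped data use a frequency of 1 for every score, while grouped data pass the class marks with their frequencies. This is a sketch with a hypothetical function name, and s is computed with the divisor n as in 4.0. For the illustration data from 4.0, the result is positive, so that distribution has its tail on the right.

```python
import math

def skewness(xs, freqs=None):
    """sum f(x - mean)^3 / (n s^3), with s computed by dividing by n as in this module."""
    if freqs is None:
        freqs = [1] * len(xs)        # ungrouped data: each score counted once
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, xs)) / n
    s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, xs)) / n)
    return sum(f * (x - mean) ** 3 for f, x in zip(freqs, xs)) / (n * s ** 3)

print(round(skewness([3, 5, 6, 6, 7, 10, 12, 15]), 3))   # 0.587: skewed to the right
```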
4.3 Kurtosis

Kurtosis is the degree of flatness or peakedness of a distribution. Curves may be described as platykurtic, mesokurtic or leptokurtic:
• platykurtic (flat): $k < 3$
• mesokurtic: $k = 3$
• leptokurtic (peaked): $k > 3$

Formula 1 (Ungrouped):
$$k = \frac{\Sigma(x - \bar{x})^4}{ns^4}$$

Formula 2 (Grouped):
$$k = \frac{\Sigma f(x - \bar{x})^4}{ns^4}$$
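The kurtosis formulas follow the same pattern as skewness, with the fourth power in place of the third. The sketch below compares the result with 3, the value for a normal curve. Some software reports "excess" kurtosis (k − 3) instead. For example, SciPy's `scipy.stats.kurtosis` does this by default, so compare such output with 0 rather than 3. The function names are hypothetical, and the data are the 4.0 illustration.

```python
import math

def kurtosis(xs, freqs=None):
    """sum f(x - mean)^4 / (n s^4), with s computed by dividing by n."""
    if freqs is None:
        freqs = [1] * len(xs)
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, xs)) / n
    s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, xs)) / n)
    return sum(f * (x - mean) ** 4 for f, x in zip(freqs, xs)) / (n * s ** 4)

def describe(k, tol=1e-9):
    if abs(k - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

k = kurtosis([3, 5, 6, 6, 7, 10, 12, 15])
print(round(k, 3), describe(k))   # 2.176 platykurtic
```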
Activity 4b

Use the following tables to compute the skewness and kurtosis.

1.
Classes    f     x     (x − x̄)    (x − x̄)²    f(x − x̄)²    x²    fx²
25 – 29    1     27
30 – 34    2     32
35 – 39    8     37
40 – 44    20    42
45 – 49    9     47

2.
Classes    f     x     (x − x̄)    (x − x̄)²    f(x − x̄)²    x²    fx²
25 – 29    8     27
30 – 34    8     32
35 – 39    8     37
40 – 44    8     42
45 – 49    8     47
MODULE 5
Elementary Statistics
and Probability
The Normal Distribution

Objectives of the Module

At the end of this module, the students should be able to

• Give the properties of a normal distribution.
• Find and interpret z-scores.
• Find the area under the normal curve.
• Develop critical thinking by solving real-life problems using areas under the normal curve.
5.0 The Central Limit Theorem & the Normal Distribution

The Central Limit Theorem:
If n (the sample size) is large, the theoretical sampling distribution of the mean can be approximated closely with a normal distribution.

Among the many continuous distributions used in statistics, the normal distribution is by far the most important. It has become important because many characteristics in life approximate the normal distribution. Its study dates back to 18th century investigations into the nature of experimental errors. The Central Limit Theorem shows that as the sample size increases to a considerable number, the shape of the distribution of sample means approaches the normal curve. The normal curve is also known as the Gaussian curve, named after Carl Friedrich Gauss, one of the greatest mathematicians of all time, who used it to describe errors of measurement.
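A small simulation makes the theorem concrete. The sketch below assumes a population of fair-die rolls, which is flat rather than bell-shaped, and draws 10,000 samples of size 30. It then checks that the sample means center on 3.5, with a spread close to σ/√n. That spread is a standard companion result of the theorem rather than something stated above. A histogram of `means` would look approximately normal. The function name and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(num_samples, sample_size, rng):
    """Means of repeated samples of fair-die rolls (a flat, non-normal population)."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(sample_size))
            for _ in range(num_samples)]

rng = random.Random(2020)                      # fixed seed so the run is repeatable
means = sample_means(10_000, 30, rng)
sigma = statistics.pstdev(range(1, 7))         # SD of a single roll, about 1.708

print(statistics.fmean(means))                 # close to the population mean 3.5
print(statistics.pstdev(means), sigma / 30 ** 0.5)   # both close to 0.31
```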
The following are the properties of a normal distribution:
1. The mean = median = mode.
2. It is symmetrical about the mean ($sk = 0$, $k = 3$).
3. The tails or ends are asymptotic relative to the horizontal axis.
4. The total area under the normal curve is 1.0 or 100%.
5. The normal curve area may be subdivided into standard deviations, at least 3 to the left and 3 to the right of the mean.

5.1 From Normal Distribution to Standard Normal Distribution

In a study about IQ, the IQ scores of people approximate the normal distribution with $\mu = 100$ and $\sigma = 15$.

[Figure: normal curve of IQ scores with the horizontal axis marked 55, 70, 85, 100, 115, 130, 145]

In terms of standard deviations, the normal distribution is converted to the standard normal distribution with $\mu = 0$ and $\sigma = 1$.

[Figure: standard normal curve with the horizontal axis marked −3, −2, −1, 0, 1, 2, 3]

We note that 85 is nothing but $\mu - 1\sigma$, while 115 $= \mu + 1\sigma$.

This conversion from the normal distribution to the standard normal distribution has been found useful and important in research, in business, economics, education, health, the sciences and social sciences, and in many real-world problems and situations.

So from the raw score x comes the standard score z:
$$z = \frac{x - \mu}{\sigma}$$
Illustration:
Using $\mu = 100$, $\sigma = 15$ and $x = 115$,
$$z = \frac{115 - 100}{15} = 1$$
and we see that the standard score z is the (raw) score expressed in terms of standard deviations.
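The z-score conversion is a one-line formula. The sketch below applies it to the IQ values marked on the axis of the first figure, which map onto the axis of the standard normal curve. The function name is hypothetical.

```python
def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

for iq in [55, 70, 85, 100, 115, 130, 145]:
    print(iq, z_score(iq, mu=100, sigma=15))   # z runs from -3.0 to 3.0
```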

5.2 Areas under the Normal Curve

We learn many things from the normal curve areas.

[Figure: standard normal curve with about 68% of the area between z = −1 and 1, 95% between −2 and 2, and 99.7% between −3 and 3]

If Area (z = −1 to z = 1) = 68%, then since the standard normal distribution is symmetrical,
Area (z = 0 to z = 1) = 34% and Area (z = −1 to z = 0) = 34%.
If Area (z = −2 to z = 2) = 95%, then
Area (z = 0 to z = 2) = .4750 and Area (z = −2 to z = 0) = .4750,
but these are just estimates. We use the Table of Normal Curve Areas.

Illustrations:
Find the following areas.

1. Area (z = 0 to z = 1.56)
   Locate the row 1.5 and the column 0.06 in the table.
   Area = 0.4406
2. Area (z = −2 to z = 1)
   Area (z = −2 to z = 0) = 0.4772
   Area (z = 0 to z = 1) = 0.3413
   Area = 0.4772 + 0.3413 = 0.8185
3. Area (z > 2)
   Area (z = 0 to z = 2) = 0.4772
   Area = 0.5000 − 0.4772 = 0.0228
4. Area (z < −2.33)
   Area (z = −2.33 to z = 0) = 0.4901
   Area = 0.5000 − 0.4901 = 0.0099
5. Area (z = 1.56 to z = 2.33)
   Area (z = 0 to z = 2.33) = 0.4901
   Area (z = 0 to z = 1.56) = 0.4406
   Area = Area (z = 0 to z = 2.33) − Area (z = 0 to z = 1.56) = 0.4901 − 0.4406 = 0.0495
6. Area (z > −1.96)
   Area (z = −1.96 to z = 0) = 0.4750
   Area = 0.4750 + 0.5000 = 0.9750
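Instead of reading the Table of Normal Curve Areas, the areas can be computed from the cumulative distribution of the standard normal curve. Python's standard library provides this as `statistics.NormalDist().cdf`. The sketch below reproduces the six illustrations (the helper name is hypothetical).

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)        # the standard normal distribution

def area_between(z1, z2):
    """Area under the standard normal curve from z1 to z2."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(area_between(0, 1.56), 4))       # 1. z = 0 to 1.56
print(round(area_between(-2, 1), 4))         # 2. z = -2 to 1
print(round(1 - Z.cdf(2), 4))                # 3. z > 2
print(round(Z.cdf(-2.33), 4))                # 4. z < -2.33
print(round(area_between(1.56, 2.33), 4))    # 5. z = 1.56 to 2.33
print(round(1 - Z.cdf(-1.96), 4))            # 6. z > -1.96
```

The printed values agree with the table results to within 0.0001. In illustration 2 the sketch prints 0.8186 rather than 0.8185, because the hand computation adds two table entries that were each rounded to four decimals.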
MODULE 6
Elementary Statistics
and Probability
Correlation & Regression Analysis

Objectives of the Module

At the end of this module, the students should be able to

• Find Pearson's r.
• Find Spearman's ρ.
• Find the coefficient of determination $r^2$.
• Interpret r and $r^2$.
• Use the regression line and regression equation in making predictions.
6.0 Correlation

Correlation is a measure of the relationship between two variables x and y, usually with x as the independent variable and y as the dependent variable. Such a relationship is indicated by the value of r, the coefficient of correlation, where
$$-1 \le r \le 1$$
The strength of the relationship is indicated by the magnitude $|r|$, disregarding the sign. The closer $|r|$ is to 1, the stronger the relationship.
The direction of the relationship is indicated by the sign. There is a positive relationship if one variable increases as the other variable increases and decreases as the other decreases. There is a negative relationship if one variable decreases as the other increases. Refer to the scattergrams.

[Figure: scattergrams illustrating r > 0, r < 0 and r = 0]

Activity 6a

Write + , − , 0 to indicate possible relationship between the pair of variables.

________1. income and expenditure

________2. price and demand for a product

________3. IQ and

________4. height and weight of a person

________5. number of hours spent in studying and score in exams

________6. number of absences and semestral GPA

________7. price of gas and number of cars on the road

________8. shoe size and income

________9. number of persons to do a work and time spent in completing the work

________10. life span and number of sticks of cigarettes consumed

6.1 Spearman's ρ and Pearson's r

• Spearman's ρ is an estimate of correlation. It makes use of the ranks of the values of the variables.
  $$\rho = 1 - \frac{6\Sigma D^2}{n(n^2 - 1)}$$
  where $D^2$ is the square of the difference in ranks and n is the number of pairs

Use the following table to interpret.

Table 6a: Interpretation of "r"

±0.80 to ±0.99    high correlation
±0.60 to ±0.79    moderately high correlation
±0.40 to ±0.59    moderate correlation
±0.20 to ±0.39    low correlation
±0.01 to ±0.19    negligible correlation
Illustration

Consider the following data on the number of hours spent in studying (x) and the grades received (y) by 10 students. The ranks $R_x$ and $R_y$ are assigned from the highest value (rank 1) down; tied values receive the average of the ranks they occupy.

x    y     Rx     Ry      D       D²
3    72    6.5    6.0     0.5     0.25
6    89    1.5    1.0     0.5     0.25
2    57    9.0    10.0    −1.0    1.00
3    69    6.5    8.0     −1.5    2.25
2    63    9.0    9.0     0       0
4    75    5.0    4.0     1.0     1.00
5    73    3.5    5.0     −1.5    2.25
2    70    9.0    7.0     2.0     4.00
6    82    1.5    3.0     −1.5    2.25
5    84    3.5    2.0     1.5     2.25
                                  ΣD² = 15.5

$$\rho = 1 - \frac{6\Sigma D^2}{n(n^2 - 1)} = 1 - \frac{6(15.5)}{10(99)} = 1 - \frac{93}{990} = 1 - 0.09394 = 0.90606 \approx 0.91$$
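The ranking rule and the formula above can be sketched as follows. Rank 1 goes to the highest value, and tied values share the average of their positions. The function names are hypothetical. Note that library routines such as SciPy's `scipy.stats.spearmanr` compute Pearson's r on the ranks instead. With tied ranks, as here, that can differ slightly from the 6ΣD² formula.

```python
def ranks_descending(values):
    """Rank 1 for the highest value; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    rx, ry = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of D^2
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

hours  = [3, 6, 2, 3, 2, 4, 5, 2, 6, 5]
grades = [72, 89, 57, 69, 63, 75, 73, 70, 82, 84]
print(ranks_descending(hours))                 # matches the Rx column
print(round(spearman_rho(hours, grades), 2))   # 0.91
```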

• Pearson's r, or Pearson's Product Moment Coefficient of Correlation r, gives a more accurate computation of the coefficient of correlation:
  $$r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$$
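Pearson's formula uses only five sums, which a short sketch can compute directly. The two toy lists below are made up to show the extreme values +1 and −1. Passing the x and y columns of Activity 6b lets you check a hand computation. On Python 3.10 or later, `statistics.correlation` should return the same value. The function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's product moment coefficient of correlation, computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0, a perfect positive relationship
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0, a perfect negative relationship
```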
Activity 6b

Solve for Pearson’s 𝑟

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
3 72
6 89
2 57
3 69
2 63
4 75
5 73
2 70
6 82
5 84

6.2 Introduction to Regression Analysis

If two variables are correlated, that is, if r, the coefficient of correlation, is significant, then it is possible to predict or estimate the value of the dependent variable from the independent variable. This is sometimes called causal forecasting.
Regression analysis is also used for another type of problem. Values of a variable corresponding to years are given, and the value of the variable is predicted several years ahead. This is sometimes called forecasting and is related to time-series analysis.
For these types of problems concerning linear regression, the so-called Method of Least Squares is used, where the "line of best fit" $y = a + bx$ becomes the equation model.
Regression Equation: $y = a + bx$

where x is the predictor variable
      y is the predictand variable
      b is the slope of the line (slope m)
      a is the constant value (y-intercept)
and the following must be computed:
$$b = \frac{n\Sigma xy - \Sigma x\Sigma y}{n\Sigma x^2 - (\Sigma x)^2}$$
$$a = \frac{\Sigma y}{n} - b\frac{\Sigma x}{n}$$
Activity 6c

Find the equation of the line that best fits the data.

Hours spent studying (𝑥)   Grades received (𝑦)   𝑥𝑦   𝑥²
3 72
6 89
2 57
3 69
2 63
4 75
5 73
2 70
6 82
5 84

We find 𝑦 = 53.31 + 5.29𝑥


Predicting the grades received:

1. 𝑥 = 7 hours:
   𝑦 = 53.31 + 5.29(7) = 90.34 ≈ 90
2. 𝑥 = 30 minutes = 0.5 hour:
   𝑦 = 53.31 + 5.29(0.5) = 53.31 + 2.645 = 55.96 ≈ 56
3. 𝑥 = 1 hour:
   𝑦 = 53.31 + 5.29(1) = 53.31 + 5.29 = 58.60 ≈ 59
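The least-squares formulas for 𝑏 and 𝑎 can be checked in Python on the Activity 6c data (copied from the table above). This is a sketch; `least_squares` is a hypothetical helper that implements exactly the two formulas given for 𝑏 and 𝑎, and the predictions reuse the rounded coefficients 53.31 and 5.29 as the worked examples do.

```python
def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x using the least-squares formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sx2 = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


hours = [3, 6, 2, 3, 2, 4, 5, 2, 6, 5]
grades = [72, 89, 57, 69, 63, 75, 73, 70, 82, 84]
a, b = least_squares(hours, grades)
print(f"y = {a:.2f} + {b:.2f}x")   # y = 53.31 + 5.29x

# Predictions with the rounded coefficients, as in the worked examples.
for x_new in (7, 0.5, 1):
    print(f"x = {x_new}: y = {53.31 + 5.29 * x_new:.3f}")
```

The three printed predictions match the hand computations 90.34, 55.955 and 58.60 before rounding to whole grades.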

6.3 Time Series Analysis

Problems on forecasting production, sales, income, profits, enrollment and many other quantities that are collected at regular intervals of time can be handled by time-series analysis. The independent variable 𝑥 represents the time period, taken at regular intervals, and 𝑦 is the dependent variable to be forecast.
Illustration

Consider the following data on the enrollment of a kindergarten school that began operating in 2015. Forecast the enrollment in 2020.

Year   𝑥   Enrollment (𝑦)   𝑥𝑦    𝑥²
2015   1        25           25     1
2016   2        40           80     4
2017   3        60          180     9
2018   4        70          280    16
2019   5        95          475    25
Σ𝑥 = 15   Σ𝑦 = 290   Σ𝑥𝑦 = 1040   Σ𝑥² = 55

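$$
\begin{aligned}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
   = \frac{5(1040) - (15)(290)}{5(55) - (15)^2}
   = \frac{5200 - 4350}{275 - 225}
   = \frac{850}{50} = 17 \\
a &= \frac{\sum y}{n} - b\,\frac{\sum x}{n} = 58 - (17)(3) = 58 - 51 = 7
\end{aligned}
$$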

𝑦 = 7 + 17𝑥
For year 2020, 𝑥 = 6
𝑦 = 7 + (17)(6)
= 7 + 102
= 109
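The same least-squares computation can be reproduced in Python with the years coded as 𝑥 = 1, 2, 3, 4, 5. This is a sketch; `least_squares` is a hypothetical helper implementing the formulas for 𝑏 and 𝑎 above.

```python
def least_squares(x, y):
    """Return (a, b) for y = a + b*x using the least-squares formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sx2 = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = sy / n - b * sx / n
    return a, b


enrollment = [25, 40, 60, 70, 95]   # 2015 to 2019
x = [1, 2, 3, 4, 5]                 # 2015 is coded as x = 1
a, b = least_squares(x, enrollment)
print(a, b)                         # 7.0 17.0
print(a + b * 6)                    # forecast for 2020 (x = 6): 109.0
```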
A shortcut method can be used by coding the years so that Σ𝑥 = 0 (2015 to 2019 become 𝑥 = −2, −1, 0, 1, 2):

𝑥     𝑦     𝑥𝑦    𝑥²
-2    25   -50     4
-1    40   -40     1
 0    60     0     0
 1    70    70     1
 2    95   190     4
Σ𝑥 = 0   Σ𝑦 = 290   Σ𝑥𝑦 = 170   Σ𝑥² = 10

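Because Σ𝑥 = 0, the formulas simplify:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{\sum xy}{\sum x^2} = \frac{170}{10} = 17 \qquad a = \frac{\sum y}{n} = \frac{290}{5} = 58$$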
𝑦 = 58 + 17𝑥
For 2020, 𝑥 = 3
𝑦 = 58 + 51 = 109
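The shortcut method yields the same result. What if 𝑛 is even? Use the same data with the 2020 enrollment equal to 96. With six years there is no middle year, so the years are coded 𝑥 = −5, −3, −1, 1, 3, 5; Σ𝑥 is still 0, and consecutive years now differ by 2 units of 𝑥.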

𝑥     𝑦     𝑥𝑦    𝑥²
-5    25   -125    25
-3    40   -120     9
-1    60    -60     1
 1    70     70     1
 3    95    285     9
 5    96    480    25
Σ𝑥 = 0   Σ𝑦 = 386   Σ𝑥𝑦 = 530   Σ𝑥² = 70

$$b = \frac{\sum xy}{\sum x^2} = \frac{530}{70} = 7.57 \qquad a = \frac{\sum y}{n} = \frac{386}{6} = 64.33$$

𝑦 = 64.33 + 7.57𝑥
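Both coding schemes of the shortcut method can be sketched in Python. The helper `shortcut_fit` is hypothetical; it refuses coded values that do not sum to zero, because the simplification 𝑏 = Σ𝑥𝑦/Σ𝑥², 𝑎 = Σ𝑦/𝑛 only holds when Σ𝑥 = 0.

```python
def shortcut_fit(coded_x, y):
    """Least squares with coded x (sum of x = 0): b = sum(xy)/sum(x^2), a = sum(y)/n."""
    if sum(coded_x) != 0:
        raise ValueError("the shortcut requires the coded x values to sum to 0")
    b = sum(p * q for p, q in zip(coded_x, y)) / sum(p * p for p in coded_x)
    a = sum(y) / len(y)
    return a, b


# Odd n: 2015 to 2019 coded -2 .. 2 (one unit = one year); 2020 is x = 3.
a, b = shortcut_fit([-2, -1, 0, 1, 2], [25, 40, 60, 70, 95])
print(a, b, a + b * 3)              # 58.0 17.0 109.0

# Even n: 2015 to 2020 coded -5, -3, ..., 5 (one year = 2 units).
a, b = shortcut_fit([-5, -3, -1, 1, 3, 5], [25, 40, 60, 70, 95, 96])
print(round(a, 2), round(b, 2))     # 64.33 7.57
```

For the even case the code computes 𝑎 = Σ𝑦/𝑛 = 386/6 directly, which rounds to 64.33.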
Activity 6d

Why do we consider forecasting models as a form of control system?


MODULE 7
Statistical Analysis with Software Applications
Hypothesis Testing

Objectives of the Module

At the end of this module, the students should be able to:
• Give the meaning of a hypothesis.
• Explain why hypotheses are needed.
• Define the important terms in hypothesis testing.
• Formulate null and alternative hypotheses.
• Apply the z-test and t-test.


7.0 Hypothesis Testing

The methods of inference used to support or reject claims based on sample data are known as tests of significance. Testing the significance of differences is not enough, however: the hypotheses being tested must first be stated explicitly. This is the process of hypothesis testing.
What is a hypothesis?
• An educated guess about a population parameter
• An assumption about a population parameter

Hypothesis Testing: the process of making an inference or generalization about population parameters based on the results of a study of samples.
Statistical Hypothesis: a guess or prediction made by the researcher about the possible outcome of the study.
Null Hypothesis (𝐻0): the hypothesis the researcher usually hopes to reject; it typically states that there is no difference or no effect. It always contains the “=” sign.
Alternative Hypothesis (𝐻𝑎): challenges 𝐻0. It never contains the “=” sign; it uses “<”, “>” or “≠”. It generally represents the idea that the researcher wants to prove.
Hypothesis Testing (as a procedure): a procedure for deciding whether the null hypothesis should be rejected in favor of the alternative hypothesis or not rejected.

Types of Hypothesis Test: One-Tail (Left/Right, Directional) and Two-Tail (Non-directional)

The type of hypothesis test is determined by the alternative hypothesis 𝐻𝑎. An alternative hypothesis may be one-sided (one-tail, left or right, directional) or two-sided (two-tail, non-directional), depending on the problem. A one-tail left (or right) test is used if 𝐻𝑎 claims that a parameter is smaller (or greater) than the value given by the null hypothesis. A two-tail test is used if 𝐻𝑎 claims only that the parameter is not equal to the value given by the null hypothesis, so the direction does not matter. In short, if 𝐻𝑎 uses < or >, the test is a one-tail left or right directional test, respectively; if 𝐻𝑎 uses ≠, the test is a two-tail non-directional test.
Illustration:
An Evaluation of the Effectiveness of Online Learning
Problem: The researcher wants to know whether online learning has significantly increased the average GPA (Grade Point Average) of students in XYZ College from the known average of 80. The average GPA of 200 randomly selected students was found to be 83.
𝐻0: 𝜇 = 80; The average GPA of students in XYZ College is equal to 80, or online learning has not significantly increased the average GPA of students in XYZ College.
𝐻𝑎: 𝜇 > 80; The average GPA of students in XYZ College is greater than 80, or online learning has significantly increased the average GPA of students in XYZ College.
Because 𝐻𝑎 uses “>”, this is a one-tail right test.
Level of Significance, Errors and Rejection/Acceptance Region:
The level of significance, denoted by alpha (α), is set by the researcher at the beginning of the research. Typical values of α are 0.05 and 0.01. In decision-theoretic terms, α is the probability of rejecting the null hypothesis 𝐻0 when 𝐻0 is actually true; this mistake is called a Type I error. Failing to reject 𝐻0 when it is actually false is called a Type II error, and its probability is denoted β. Note that β is not equal to 1 − α: it depends on the true value of the parameter and on the sample size, and 1 − β is called the power of the test.
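The claim that α is the probability of a Type I error can be checked by simulation: draw many samples from a population in which 𝐻0 is true and count how often a right-tailed z-test rejects it. This is a sketch under hypothetical settings (μ0 = 80 as in the GPA illustration, with an assumed σ = 10 and 𝑛 = 30); `simulated_type_i_error` is a hypothetical name. It uses only the standard library (`random.Random.gauss`, `statistics.NormalDist.inv_cdf`).

```python
import random
from statistics import NormalDist


def simulated_type_i_error(alpha=0.05, mu0=80.0, sigma=10.0, n=30, trials=20_000, seed=1):
    """Fraction of simulated samples in which a true H0 (mu = mu0) is rejected."""
    rng = random.Random(seed)                  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) * n ** 0.5 / sigma
        if z >= z_crit:
            rejections += 1                    # a Type I error
    return rejections / trials


print(simulated_type_i_error())             # close to 0.05
print(simulated_type_i_error(alpha=0.01))   # close to 0.01
```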

Meaning of α = 0.05 and α = 0.01 in Hypothesis Testing

Usually, researchers use either the 0.05 level, sometimes called the 5% level and written α = 0.05, or the 0.01 level (1% level), written α = 0.01, although the choice of level is largely subjective. The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is more conservative than the 0.05 level.

α = 0.05 means that, when 𝐻0 is actually true, there is a 5% probability of wrongly rejecting it (a Type I error) and a 95% probability of correctly not rejecting it.

7.1 Hypothesis Testing Approaches

There are two approaches to hypothesis testing, the Critical Value Approach and the p-value Approach, and each can be written as a 5-step solution.

The Critical Value Approach
One way of deciding whether to reject 𝐻0 is to compare the computed value of the test statistic with the critical value. The critical value is the value that the test statistic (𝑍 or 𝑇) must reach or exceed for the null hypothesis 𝐻0 to be rejected. In a two-tail test, we reject 𝐻0 if the absolute value of the computed 𝑍 or 𝑇 is greater than or equal to the absolute value of the critical value; in a one-tail test, the computed value must also lie on the side named by 𝐻𝑎 (to the right for >, to the left for <).
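Critical z values come from the inverse of the standard normal CDF. A minimal Python sketch using the standard library's `statistics.NormalDist().inv_cdf`; the function names `z_critical` and `reject_by_critical_value` are hypothetical. Note that the exact one-tail value at α = 0.05 is 1.645, which this module rounds to 1.65.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: str = "two") -> float:
    """Critical z value; tails is 'left', 'right' or 'two'."""
    std = NormalDist()
    if tails == "two":
        return std.inv_cdf(1 - alpha / 2)   # the rejection regions are beyond +/- this value
    if tails == "right":
        return std.inv_cdf(1 - alpha)
    if tails == "left":
        return std.inv_cdf(alpha)           # a negative value
    raise ValueError("tails must be 'left', 'right' or 'two'")


def reject_by_critical_value(z: float, alpha: float, tails: str = "two") -> bool:
    """Apply the decision rule: is the computed z in the rejection region?"""
    crit = z_critical(alpha, tails)
    if tails == "two":
        return abs(z) >= crit
    if tails == "right":
        return z >= crit
    return z <= crit


print(round(z_critical(0.05, "right"), 3))  # 1.645
print(round(z_critical(0.05, "two"), 2))    # 1.96
```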

The p-Value Approach

The p-value is now widely used as a tool in decision-making. It provides an alternative, equivalent way of conducting tests of significance: instead of comparing the test statistic with a critical value, we compare the p-value with the level of significance α, which makes the work simpler. The p-value is often called the observed level of significance of the test. It is the probability, assuming 𝐻0 is true, of obtaining a test statistic at least as extreme as the one actually observed. If the hypothesis is tested at α = 0.05, the area of the rejection region is 0.05. We reject 𝐻0 if the p-value ≤ α; otherwise, 𝐻0 cannot be rejected.

The decision rule in this process is:
“Reject 𝐻0 if the p-value is less than or equal to α.”
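For a z statistic, the p-value is a tail area of the standard normal curve. A minimal Python sketch using `statistics.NormalDist().cdf`; the name `z_p_value` is hypothetical. The sample value 𝑧 = 2.53 is the computed statistic from the College Algebra example in this module.

```python
from statistics import NormalDist


def z_p_value(z: float, tails: str = "two") -> float:
    """p-value of a z statistic under H0 (standard normal distribution)."""
    cdf = NormalDist().cdf
    if tails == "right":
        return 1 - cdf(z)                 # area to the right of z
    if tails == "left":
        return cdf(z)                     # area to the left of z
    if tails == "two":
        return 2 * (1 - cdf(abs(z)))      # both tails beyond |z|
    raise ValueError("tails must be 'left', 'right' or 'two'")


p = z_p_value(2.53, "right")
print(round(p, 4))        # 0.0057
print(p <= 0.05)          # True, so reject H0
```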
Approaches in Hypothesis Testing
• Critical Value Approach: compare the computed value with the critical value.
• p-value Approach: compare the p-value with α.

5-step solution (Critical Value Approach)
1. 𝐻0: _____________
   𝐻𝑎: _____________
2. α = _____; Critical value = _____; 1-T/2-T
3. Decision rule: Reject 𝐻0 if computed value ≥ critical value
4. Decision: Reject/Do not reject 𝐻0 because ...
5. Conclusion:

5-step solution (p-value Approach)
1. 𝐻0: _____________
   𝐻𝑎: _____________
2. α = _____; p-value = _____; 1-T/2-T
3. Decision rule: Reject 𝐻0 if p-value ≤ α
4. Decision: Reject/Do not reject 𝐻0 because ...
5. Conclusion:

7.2 Testing the Significance of Difference Between/Among Means

The three commonly used statistical hypothesis tests for means are the t-test, the z-test, and the F-test or ANOVA. Please see the illustrations below.

Statistical Hypothesis Tests: Testing the Significance of Difference Between/Among Means
• Z-test: 𝜎 is known; 𝑛 ≥ 30; 2 means; normal (Z) distribution
• t-test: 𝜎 is unknown; 𝑛 < 30 or 𝑛 ≥ 30; 2 means; Student's t-distribution
• F-test (ANOVA): 2 or more means; F-distribution

Definition:
Degrees of Freedom: the number of values in a calculation that are free to vary.
The Z-test is used to test the significance of the difference between:
• the population (hypothesized) mean and a sample mean (population mean vs. sample mean);
• two sample means when one population standard deviation 𝜎 is known (sample mean 1 vs. sample mean 2);
• two sample means when two population standard deviations 𝜎1 and 𝜎2 are known (sample mean 1 vs. sample mean 2).

Testing the Significance of Difference between Means
Z-test: used when 𝑛 is large (𝑛 ≥ 30) and 𝜎 is known

• Hypothesized/population mean vs. sample mean, population standard deviation known:

$$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$$

where 𝑥̄ is the sample mean, 𝜇 is the population mean, 𝑛 is the sample size, and 𝜎 is the population standard deviation.

• Sample mean 1 vs. sample mean 2, population standard deviation known:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where 𝑥̄1 and 𝑥̄2 are the means of samples 1 and 2, 𝑛1 and 𝑛2 are the sample sizes, and 𝜎 is the population standard deviation.

• Sample mean 1 vs. sample mean 2, two population standard deviations known:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where 𝑥̄1 and 𝑥̄2 are the means of samples 1 and 2, 𝑛1 and 𝑛2 are the sample sizes, and 𝜎1 and 𝜎2 are the population standard deviations.
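The three z statistics can be written as one-line Python functions. This is a sketch; the function names are hypothetical, and the inputs in the usage lines are made-up numbers chosen only so the arithmetic is easy to check by hand (they are not data from this module).

```python
from math import sqrt


def z_one_sample(xbar, mu, sigma, n):
    """Sample mean vs hypothesized mean, sigma known: z = (xbar - mu) * sqrt(n) / sigma."""
    return (xbar - mu) * sqrt(n) / sigma


def z_two_means_common_sigma(xbar1, xbar2, sigma, n1, n2):
    """Two sample means, one known population standard deviation sigma."""
    return (xbar1 - xbar2) / (sigma * sqrt(1 / n1 + 1 / n2))


def z_two_means_two_sigmas(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Two sample means, known population standard deviations sigma1 and sigma2."""
    return (xbar1 - xbar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Hypothetical inputs chosen so the arithmetic can be checked by hand.
print(z_one_sample(xbar=82, mu=80, sigma=10, n=25))   # (2)(5)/10 = 1.0
# With sigma1 = sigma2 = sigma, the last two formulas give the same z (about 2.121).
print(z_two_means_common_sigma(85, 82, 6, 36, 36))
print(z_two_means_two_sigmas(85, 82, 6, 6, 36, 36))
```

The last two lines illustrate that the two-σ formula reduces to the common-σ formula when 𝜎1 = 𝜎2.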
Critical Value Approach
1. 𝐻0: 𝜇 = 80; This year’s batch is as good as the previous batches in College Algebra.
   𝐻𝑎: 𝜇 > 80; This year’s batch is better in College Algebra than the previous batches.
2. α = 0.05; 1-T right; Z-computed = 2.53; Z-critical = 1.65
3. Decision rule: Reject 𝐻0 if |Z-computed (2.53)| ≥ |Z-critical (1.65)|, that is, if 2.53 ≥ 1.65.
4. Decision: Reject 𝐻0, because 2.53 > 1.65.
5. Conclusion: I therefore conclude that this year’s batch is better in College Algebra than the previous batches.

p-value Approach
1. 𝐻0: 𝜇 = 80; This year’s batch is as good as the previous batches in College Algebra.
   𝐻𝑎: 𝜇 > 80; This year’s batch is better in College Algebra than the previous batches.
2. α = 0.05; p-value = 0.0057; 1-T right
3. Decision rule: Reject 𝐻0 if p-value (0.0057) ≤ α (0.05).
4. Decision: Reject 𝐻0, because p-value (0.0057) < α (0.05).
5. Conclusion: I therefore conclude that this year’s batch is better in College Algebra than the previous batches.
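Steps 2 to 4 of both solutions can be run side by side in Python to confirm that the two approaches reach the same decision for 𝑧 = 2.53 at α = 0.05. They always agree for a right-tailed test because the normal CDF is increasing, so 𝑧 ≥ 𝑧 critical exactly when the right-tail p-value ≤ α. This is a sketch; `five_step_z_right_tail` is a hypothetical name.

```python
from statistics import NormalDist


def five_step_z_right_tail(z: float, alpha: float = 0.05) -> dict:
    """Steps 2 to 4 of the 5-step solution for a right-tailed z-test, done both ways."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)    # critical value approach
    p_value = 1 - std.cdf(z)           # p-value approach
    return {
        "z_critical": round(z_crit, 3),
        "p_value": round(p_value, 4),
        "reject_by_critical_value": z >= z_crit,
        "reject_by_p_value": p_value <= alpha,
    }


print(five_step_z_right_tail(2.53))
# {'z_critical': 1.645, 'p_value': 0.0057, 'reject_by_critical_value': True, 'reject_by_p_value': True}
```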

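Testing the Significance of Difference between Means
t-test: used when the population standard deviation 𝜎 is unknown

• Hypothesized/population mean vs. sample mean:

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \qquad df = n - 1$$

where 𝑠 is the sample standard deviation.

• Two independent sample means:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$

• Dependent or correlated sample means:

$$t = \frac{\bar{d}\sqrt{n}}{s_d}, \qquad df = n - 1$$

where 𝑑̄ is the mean and 𝑠𝑑 the standard deviation of the paired differences, and 𝑛 is the number of pairs.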
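The three t statistics above can be sketched in Python with the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation, dividing by 𝑛 − 1). The function names are hypothetical, and the small data sets in the usage lines are made up so that each answer can be verified by hand; they are not data from this module. In `t_paired`, each difference is taken as after minus before.

```python
from math import sqrt
from statistics import mean, stdev


def t_one_sample(data, mu):
    """t = (xbar - mu) * sqrt(n) / s, with df = n - 1."""
    n = len(data)
    return (mean(data) - mu) * sqrt(n) / stdev(data), n - 1


def t_independent(sample1, sample2):
    """Pooled-variance t for two independent samples, df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * stdev(sample1) ** 2 + (n2 - 1) * stdev(sample2) ** 2) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_paired(before, after):
    """t = dbar * sqrt(n) / s_d on the paired differences (after - before), df = n - 1."""
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    return mean(d) * sqrt(n) / stdev(d), n - 1


# Small hypothetical data sets whose answers can be checked by hand:
print(t_one_sample([1, 2, 3, 4, 5], mu=2))     # t = sqrt(2), about 1.414; df = 4
print(t_independent([1, 2, 3], [4, 5, 6]))     # t about -3.674; df = 4
print(t_paired([1, 2, 3], [2, 4, 6]))          # t about 3.464; df = 2
```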
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
LOPEZ BRANCH
Lopez, Quezon

Bachelor of Science in Accountancy

Course Title: STATISTICAL ANALYSIS with SOFTWARE APPLICATIONS
Course Code: STAT 20053
Course Credit: 5 units (with laboratory)
Pre-Requisite: NONE
Course Description: This course deals with the study of basic statistical concepts, frequency distribution, collection and presentation of data, measures of central tendency, dispersion, and correlation and regression analysis. It also covers probability theory, probability of events, random variables, probability distributions and sampling distribution.

Institutional Learning Outcomes
1. Creative and Critical Thinking
2. Effective Communication
3. Strong Service Orientation
4. Passion to Life-Long Learning
5. Sense of Nationalism and Global Responsiveness
6. Community Engagement
7. Adeptness in the Responsible Use of Technology
8. High Level of Leadership and Organizational Skills
9. Sense of Personal and Professional Ethics

Program Outcomes
• BSA graduates must demonstrate the ability to review, interpret, and evaluate financial data and systems in compliance with established policies, procedures, guidelines, agreements and/or legislation. He must be able to link data, knowledge and insight together with different sources and decisions.
• He should possess active listening skills and the ability to communicate his point of view effectively, both orally and in writing, at all organizational levels; be able to explain, verbally and/or in writing, financial, statistical and administrative matters, policies, procedures, regulatory matters and audit results at a level appropriate to the audience; and be able to negotiate effectively.
• A BSA graduate should develop an ability to work in groups and possess the skills to participate as a member of a team and/or contribute to group effort; be able to teach others new skills; be able to work to the satisfaction of clients; and negotiate and work with diversity, working well with men and women from diverse backgrounds.
• A BSA graduate must work with the highest standards of professionalism to attain a higher level of performance and, generally, to meet the public interest. He must conform to the ethical standards of the profession, which include integrity, objectivity and independence, professional competence and due care, confidentiality, professional behavior and moral values.
• A BSA graduate should possess general knowledge and understanding of the different cultures in the world and develop an international perspective; thus, he must possess competency in the English language, adaptability to foreign business practices, trainability and good capabilities in dealing with foreign partners.
• A BSA graduate should not only be conversant with IT concepts for business systems but also have sound knowledge of internal control in computer-based systems, development standards and practices for business systems, management of the adoption, implementation and use of IT, evaluation of computer business systems, and management of information security.
• A BSA graduate should possess a broad base of knowledge concerning macro-environmental, economic and industry issues, and business process structures, functions and practices. This includes knowledge in areas such as economics, quantitative models and business statistics, organizational behavior, international business, ethics and corporate governance.

Course Objectives
After completing the course, the students should be able to:
• Acquire a sound knowledge of basic statistical concepts and probability theory.
• Determine the statistical measures to better analyze and interpret statistical data.
• Test hypotheses using the critical value and p-value approaches.
• Determine relationships among variables.
• Perform regression analysis.
• Appreciate the use of statistics in data analysis and research.
COURSE PLAN

(For each week: Topic, Learning Outcomes, Methodology, Resources, Assessment)
Orientation  Understand  Blended Course Module Pre-
Week 1 Introduction the Learning syllabus Test & Post
 Overview and importance of Rubrics Test
Definition of Statistics the course, Basilia Ebora
 Descriptive and the syllabus, Blay,
Inferential Statistics the Elementary
 Sample and Population classroom Statistics
 Parameter and policies and (Revised
Statistics Edition)
the grading
system
 Have an
overview of
statistics
 Define
statistics and
other basic
concepts
Week 2
Topic: A. Collection of Data: Primary and Secondary Data; Interview and the Use of Questionnaire; Census and Sample Survey; Probability and Non-Probability Sampling. B. Presentation of Data: Textual, Tabular and Graphical Presentation. C. Frequency Distribution: Ungrouped Data; Array; Construction of Frequency Distribution; Graphical Presentation of Frequency Distribution; Histogram
Learning Outcomes: Acquire the basic concepts on data collection and presentation; define frequency distribution; construct frequency distribution; graph frequency distribution
Methodology: Paired Activity; Cooperative Learning; Blended/Flexible Learning
Resources: Basilia Ebora Blay, Elementary Statistics (Revised Edition)
Assessment: Module Pre-Test & Post Test; Activity Output
Week 3
Topic: Introduction to Microsoft Office Excel; Data Entry
Learning Outcomes: Open and close Microsoft applications; format cells; explore Microsoft Excel
Methodology: Paired Activity; Cooperative Learning; Blended/Flexible Learning
Resources: Basilia Ebora Blay, Elementary Statistics (Revised Edition)
Assessment: Module Pre-Test & Post Test; Activity Output
Week 4
Topic: A. Measures of Central Tendency: Summation Notation; Arithmetic mean, median and mode for grouped and ungrouped data; Weighted arithmetic mean, geometric mean and harmonic mean; Percentiles, deciles and quartiles; Moments
Learning Outcomes: Apply the rules of summation; solve for the measures of central tendency for grouped and ungrouped data
Methodology: Paired Activity; Cooperative Learning; Blended/Flexible Learning
Resources: Basilia Ebora Blay, Elementary Statistics (Revised Edition)
Assessment: Module Pre-Test & Post Test; Activity Output
B. Measures of  Solve for  Paired Basilia Ebora Module Pre-


Week 5 Dispersion Absolute Activity Blay, Test & Post
1. Absolute Measures Measures of  Cooperativ Elementary Test
 Range Dispersion e Learning Statistics
 Mean Absolute  Blended/ (Revised Activity
Deviation Flexible Edition) Output
 Variance and Standard Learning
Deviation
Week 6
Topic: 2. Relative Measures: Coefficient of Variation; Coefficient of Quartile Deviation; Standard Score. 3. Chebyshev’s Theorem. C. Skewness and Kurtosis: 1. Coefficient of Skewness based on moments and on the mean, median and mode; 2. Coefficient of Kurtosis
Learning Outcomes: Solve for relative measures of dispersion; apply Chebyshev’s Theorem; solve for the coefficients of skewness and kurtosis
Methodology: Paired Activity; Cooperative Learning; Blended/Flexible Learning
Resources: Basilia Ebora Blay, Elementary Statistics (Revised Edition)
Assessment: Module Pre-Test & Post Test; Activity Output
Correlation and  Define  Paired Basilia Ebora Module Pre-
Week 7 Regression: An correlation Activity Blay, Test & Post
Introduction  Relate the  Cooperati Elementary Test
Scatter Diagram coefficient of ve Statistics
Pearson Product-Moment correlation Learning (Revised Activity
Correlation Coefficient with scatter  Blended/ Edition) Output
Spearman’s Rank diagrams Flexible
Correlation Coefficient  Solve for Learning
Simple Linear Regression Spearman’s
Model 𝜌 and
Pearson’s 𝑟
 Solve simple
linear
regression
models
Week 8 MIDTERM EXAMINATION
Weeks 9-11
Topic: Introduction to SPSS; SPSS Windows & Files: Data View; Variable View; Output Viewer; Creating Data; Applications
Learning Outcomes: Use SPSS as software for statistical analysis
Methodology: Blended/Flexible Learning; Video Tutorials
Resources: RStats Institute, Missouri State University
Assessment: Workbook Output
Hypothesis Testing  Give the  Blended/ Basilia Ebora Activity Output
Week Hypothesis meaning of Flexible Blay,
12-13 Statistical Testing hypothesis Learning Elementary
 Null hypothesis  Explain why Statistics
 Alternative hypothesis there is a (Revised
Types of Hypothesis Test need for Edition)
Level of Significance hypothesis
Hypothesis Testing  Define the
Approaches important
 Critical Value terms in
Approach hypothesis
 P-value Approach testing
The 5-Step Solution  Formulate
Testing the Significance null and
of Differences alternative
Z-Test & T -Test hypothesis
 Apply the Z-
Test and T -
Test
Weeks 14-17
Topic: Random Variable and Probability Distributions: Concepts of Random Variable; Probability Mass Function (a. Binomial, b. Hypergeometric, c. Geometric, d. Poisson); Mathematical Expectation and Variance; Probability Density Function (a. Normal Distribution, b. Gamma and Its Related Distribution)
Learning Outcomes: Apply ANOVA, Chi-Square Test and Correlation Analysis
Methodology: Blended/Flexible Learning
Resources: Basilia Ebora Blay, Elementary Statistics (Revised Edition)
Assessment: Activity Output
Week 18 FINAL EXAMINATION

SUGGESTED READINGS AND REFERENCES
• Basilia Ebora Blay, Elementary Statistics (Revised Edition)
• Video Tutorials from RStats Institute, Missouri State University

COURSE GRADING SYSTEM

Class Standing                                   70%
• Quizzes
• Attendance / Active Use of Module / Online Activity
• Recitation
• Group or Paired Output / Assignment / Seatwork
Midterm / Final Examination                      30%
Total                                           100%

FINAL GRADE = (Midterm Grade + Final Term Grade) / 2
Classroom Policy
Classroom policies are based on the use of Blended/Flexible Learning to adjust to the “new normal”, in which rules on “social distancing” must be carefully observed.

CONSULTATION TIME :
Prepared by:
Assoc. Prof. THELMA D. OLAIVAR, MS
Faculty Member, PUP Lopez

Reviewed by:
Assoc. Prof. RUFO N. BUEZA, DPA
Head, Academic Programs

Recommending Approval:
Assoc. Prof. RUFO N. BUEZA, DPA
Branch Director

Approved:
Prof. PASCUALITO B. GATAN, MBA
Vice-President for Academic Affairs

Revised July 2020
