Introduction To Data Analytics: Statistical Inference - II

This document discusses statistical inference and hypothesis testing. It defines Type I and Type II errors in hypothesis testing and provides examples of calculating the probabilities of these errors. The document also discusses one-tailed and two-tailed hypothesis tests, giving examples of defining the rejection regions for different tests. Finally, it presents two case studies and outlines the five steps for testing hypotheses.

Uploaded by

preethi


INTRODUCTION TO

DATA ANALYTICS

Class #11
Statistical Inference - II

Dr. Sreeja S R
Assistant Professor
Indian Institute of Information Technology
IIIT Sri City
IIITS: IDA - M2021 1
QUOTE OF THE DAY

IN THIS PRESENTATION…

• Errors in hypothesis testing

• Case Study 1: Coffee Sale

• Case Study 2: Machine Testing

• Summary of Sampling Distributions in Hypothesis Testing


Calculating $\alpha$
• Assume that we have the results of a random sample. We can then use the
characteristics of the sampling distribution to calculate the probability of making
either a Type I or a Type II error.

Example 6.6:
Suppose the two hypotheses in a statistical test are:

$H_0: \mu = a$
$H_1: \mu \neq a$

Also, assume that the population follows a normal distribution. A
threshold $\delta$ is chosen so that sample means farther than $\delta$ from $a$ are judged significantly different from $a$.

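Calculating $\alpha$
• Here, the shaded region represents the probability that $\bar{X} < a - \delta$ or $\bar{X} > a + \delta$.

[Figure: normal sampling distribution centered at $a$, with the tails beyond $a - \delta$ and $a + \delta$ shaded]

Thus the null hypothesis is to be rejected if the mean value is less than $a - \delta$ or
greater than $a + \delta$.

If $\bar{X}$ denotes the sample mean, then the probability of a Type I error is

$\alpha = P(\bar{X} < a - \delta \text{ or } \bar{X} > a + \delta \mid \mu = a)$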

THE REJECTION REGION
• The rejection region comprises the values of the test statistic for which
1. The probability of occurrence when the null hypothesis is true is less than or equal to the specified $\alpha$.
2. The probability of occurrence when $H_1$ is true is greater than it is under $H_0$.

[Figure: rejection region for $H_0$ for a given value of $\alpha$. Reject $H_0$ ($\mu \neq a$) to the left of $a'$ and to the right of $a''$; do not reject $H_0$ ($\mu = a$) between $a'$ and $a''$.]

Two-Tailed Test
• For a two-tailed hypothesis test, the hypotheses take the form

$H_0: \mu = \mu_{H_0}$
$H_1: \mu \neq \mu_{H_0}$

In other words, the null hypothesis is rejected if the sample mean is significantly greater than or significantly less than $\mu_{H_0}$ at the given $\alpha$.

Thus, in a two-tailed test, there are two rejection regions (also known as critical
regions), one on each tail of the sampling distribution curve.

Two-Tailed Test
[Figure: acceptance and rejection regions in case of a two-tailed test with 5% significance level. The acceptance region (95% of area) is centered at $\mu_{H_0}$; accept $H_0$ if the sample mean falls in this region. Each tail is a rejection region holding 0.025 of the area; reject $H_0$ if the sample mean falls in either of these regions.]
One-Tailed Test
• A one-tailed test is used when we are to test, say, whether the population mean is
either lower or higher than the hypothesized value.

Symbolically,

$H_0: \mu = \mu_{H_0}$
$H_1: \mu < \mu_{H_0}$ (or $\mu > \mu_{H_0}$)

Here there is only one rejection region, on the left tail (or the right tail).

[Figure: left-tailed test and right-tailed test. In each, the rejection region holds .05 of the area in one tail and the remainder is the acceptance region.]
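The boundaries of the one-tailed and two-tailed rejection regions can be sketched numerically. The snippet below is only an illustration: it assumes the test statistic follows a standard normal distribution, and the helper name `z_critical` is hypothetical rather than something taken from the slides.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int) -> float:
    """Positive critical z-value for a one-tailed (1) or two-tailed (2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed: alpha/2 sits in each tail; one-tailed: all of alpha in one tail.
    return NormalDist().inv_cdf(1 - alpha / tails)


two_tailed = z_critical(0.05, tails=2)
one_tailed = z_critical(0.05, tails=1)
print(f"two-tailed, alpha=0.05: reject H0 if |Z| > {two_tailed:.3f}")
print(f"right-tailed, alpha=0.05: reject H0 if Z > {one_tailed:.3f}")
print(f"left-tailed, alpha=0.05: reject H0 if Z < {-one_tailed:.3f}")
```

At the 5% level the two-tailed boundary is about 1.96 and the one-tailed boundary about 1.645, which is why a one-tailed test rejects at less extreme values in its chosen direction.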
EXAMPLE 6.7: CALCULATING $\alpha$

• Consider the following two hypotheses.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

Assume a sample of size 16 with standard deviation 0.2, and that the sample
follows a normal distribution.

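EXAMPLE 6.7: CALCULATING $\alpha$

• We can decide the rejection region as follows.

Suppose the null hypothesis is to be rejected if the mean value is less than 7.9 or greater than 8.1.
If $\bar{X}$ is the sample mean, then the probability of a Type I error is

$\alpha = P(\bar{X} < 7.9 \text{ or } \bar{X} > 8.1 \mid \mu = 8)$

The standard deviation is 0.2 and the distribution is normal, so the standard error of the
mean is $0.2/\sqrt{16} = 0.05$.
Thus,

$P(\bar{X} < 7.9) = P\left(Z < \frac{7.9 - 8}{0.2/\sqrt{16}}\right) = P(Z < -2) = 0.0228$

and

$P(\bar{X} > 8.1) = P\left(Z > \frac{8.1 - 8}{0.2/\sqrt{16}}\right) = P(Z > 2) = 0.0228$

Hence, $\alpha = 0.0228 + 0.0228 = 0.0456$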
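The Type I error probability of Example 6.7 can be checked with a short sketch. It assumes, as the example does, a normal sampling distribution with mean 8 and standard error $0.2/\sqrt{16}$, and rejection limits of 7.9 and 8.1; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 8.0, 0.2, 16      # hypothesized mean, standard deviation, sample size
lower, upper = 7.9, 8.1           # reject H0 outside these limits

# Sampling distribution of the sample mean when H0 is true.
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# alpha is the area in both tails beyond the rejection limits.
alpha = sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))
print(f"alpha = {alpha:.4f}")
```

The computed value is about 0.0455; adding two table values that were each rounded to 0.0228 gives 0.0456.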

Example 6.8: Calculating $\alpha$ and $\beta$

There are two identical-looking boxes of chocolates. Box A contains 60 red and
40 black chocolates, whereas Box B contains 40 red and 60 black chocolates. There
is no label on either box. One box is placed on the table. We are to test the
hypothesis that “Box B is on the table”.

To test the hypothesis, an experiment is planned as follows:

• Draw five chocolates at random from the box.
• Replace each chocolate before selecting a new one.
• The number of red chocolates drawn in an experiment is taken as the sample
statistic.

Note: Since the draws are independent of each other, we can assume that the sample statistic
follows a binomial probability distribution.
Example 6.8: Calculating $\alpha$
• Let us express the population parameter as $p$, the proportion of red chocolates in the box on the table.
The hypotheses of the problem can be stated as:
$H_0: p = 0.4$ // Box B is on the table
$H_1: p = 0.6$ // Box A is on the table
Calculating $\alpha$
In this example, the null hypothesis specifies that the probability of drawing a red chocolate is 0.4.
This means that a lower proportion of red chocolates in the observations favors the null hypothesis.
Conversely, drawing all red chocolates provides sufficient evidence to reject the null
hypothesis. Then, the probability of making a Type I error is the probability of getting five red
chocolates in a sample of five from Box B. That is,

$\alpha = P(X = 5 \mid p = 0.4)$

Using the binomial distribution,

$\alpha = \binom{5}{5}(0.4)^5(0.6)^0 = 0.01024$

Thus, the probability of rejecting a true null hypothesis is about 0.01. That is, there is approximately
a 1% chance that Box B will be mislabeled as Box A.

Example 6.8: Calculating $\beta$
• A Type II error occurs if we fail to reject the null hypothesis when it is not true. For the current
illustration, such a situation occurs if Box A is on the table but we do not get the five red
chocolates required to reject the hypothesis that Box B is on the table.
The probability of a Type II error is then the probability of getting four or fewer red chocolates in a
sample of five from Box A.
That is,

$\beta = P(X \le 4 \mid p = 0.6)$

Using the probability rule:

$P(X \le 4) = 1 - P(X = 5)$

That is, $\beta = 1 - P(X = 5 \mid p = 0.6)$
Now, $P(X = 5 \mid p = 0.6) = \binom{5}{5}(0.6)^5(0.4)^0 = 0.07776$
Hence, $\beta = 1 - 0.07776 = 0.92224$

That is, the probability of making a Type II error is over 92%. This means that, if Box A is on the table, the
probability that we will be unable to detect it is about 0.92.
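Both error probabilities of Example 6.8 follow from the binomial distribution, and the sketch below reproduces them. It assumes the decision rule used in the example (reject "Box B is on the table" only when all five draws are red); `binom_pmf` is a hypothetical helper written for this illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent draws with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_draws = 5
p_box_b = 0.4   # proportion of red chocolates in Box B (H0)
p_box_a = 0.6   # proportion of red chocolates in Box A (H1)

# Type I error: five reds drawn although Box B is on the table.
alpha = binom_pmf(5, n_draws, p_box_b)

# Type II error: four or fewer reds drawn although Box A is on the table.
beta = sum(binom_pmf(k, n_draws, p_box_a) for k in range(5))

print(f"alpha = {alpha:.5f}, beta = {beta:.5f}")
```

With only five draws and such a strict rejection rule, the small $\alpha$ comes at the cost of a very large $\beta$.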
CASE STUDY 1: COFFEE SALE

A coffee vendor near Kharagpur railway station has been averaging
sales of 500 cups per day. Because of the development of a bus stand nearby, the vendor
expects sales to increase. During the first 12 days after the inauguration of
the bus stand, the daily sales were as under:

550 570 490 615 505 580 570 460 600 580 530 526

On the basis of this sample information, can we conclude that the sales of coffee
have increased?

Consider a 5% level of significance.

HYPOTHESIS TESTING : 5 STEPS

• The following five steps are followed when testing a hypothesis:

1. Specify $H_0$ and $H_1$, the null and alternative hypotheses, and an acceptable level of $\alpha$.

2. Determine an appropriate sample-based test statistic and the rejection region for
the specified $H_0$.

3. Collect the sample data and calculate the test statistic.

4. Make a decision to either reject or fail to reject $H_0$.

5. Interpret the result in common language suitable for practitioners.

CASE STUDY 1: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

Let us consider the hypotheses for the given problem as follows.

$H_0: \mu = 500$ cups per day
$H_1: \mu > 500$ cups per day

The null hypothesis is that sales average 500 cups per day and have not
increased.

The alternative hypothesis is that the sales have increased.

The given significance level is $\alpha = 0.05$.

CASE STUDY 1: STEP 2

• Step 2: Sample-based test statistic and the rejection region for the specified $H_0$

Given the sample as

550 570 490 615 505 580 570 460 600 580 530 526

Since the sample size is small and the population standard deviation is not known, we shall
use the t-test, assuming a normal population. The test statistic is

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

To find $\bar{X}$ and $S$, we make the following computations.

$\bar{X} = \frac{\sum X_i}{n} = \frac{6576}{12} = 548$

$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{23978}{11}} \approx 46.69$

Hence,

$t = \frac{548 - 500}{46.69/\sqrt{12}} \approx 3.56$

Note:
The statistical table for t-distributions gives a t-value for given degrees of freedom and a given level of
significance $\alpha$, and vice versa.

Case Study 1: Step 3

• Step 3: Collect the sample data and calculate the test statistic

As $H_1$ is one-tailed, we shall determine the rejection region by applying a one-tailed test in the right
tail (because $H_1$ is of the "more than" type) at the 5% level of significance.

Using the table of the t-distribution for 11 degrees of freedom at the 5% level of significance,

$R: t > 1.796$

Case Study 1: Step 4
• Step 4: Make a decision to either reject or fail to reject $H_0$

The observed value of $t$ is about 3.56, which is in the rejection region ($t > 1.796$), and thus $H_0$ is rejected at the 5% level of
significance.
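The coffee-sale test can be reproduced from the twelve daily sales figures. This is a minimal sketch using only the Python standard library; the critical value 1.796 is the one-tailed t-table entry for 11 degrees of freedom at the 5% level and is entered by hand here as an assumption, not computed.

```python
from math import sqrt
from statistics import mean, stdev

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu0 = 500            # average daily sales under H0
t_critical = 1.796   # t-table value: 11 d.f., right tail, alpha = 0.05 (entered by hand)

n = len(sales)
x_bar = mean(sales)
s = stdev(sales)                      # sample standard deviation, divisor n - 1
t_stat = (x_bar - mu0) / (s / sqrt(n))

reject_h0 = t_stat > t_critical       # right-tailed test
print(f"x_bar = {x_bar}, S = {s:.2f}, t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Because the observed statistic is well beyond the critical value, the decision does not depend on small rounding differences in $S$.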

Case Study 1: Step 5
Step 5: Final comment and interpret the result

We can conclude that the sample data indicate that coffee sales have increased.


CASE STUDY 2: MACHINE TESTING
• A medicine production company packages medicine in tubes of 8 ml with $\sigma = 0.2$. To
maintain control of the amount of medicine in the tubes, the company uses a machine. To
monitor this control, a sample of 16 tubes is taken from the production line at
random time intervals and their contents are measured precisely. The mean amount of
medicine in these 16 tubes is used to test the hypothesis that the machine is
indeed working properly. The given sample has a sample mean of 7.89 and
follows a normal distribution.

CASE STUDY 2: STEP 1

• Step 1: Specification of hypotheses and acceptable level of $\alpha$

The hypotheses are given in terms of the population mean of medicine per tube.

The null hypothesis is $H_0: \mu = 8$

The alternative hypothesis is $H_1: \mu \neq 8$

We assume $\alpha$, the significance level in our hypothesis testing, to be 0.05.

(This means that the probability of adjusting the machine when it is in fact working properly is at most 5%.)

CASE STUDY 2: STEP 2

• Step 2: Sample-based test statistic and the rejection region for the specified $H_0$

Rejection region: Given $\alpha = 0.05$, the rejection region is $|Z| > 1.96$ (obtained from the standard normal distribution for a two-tailed test).

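CASE STUDY 2: STEP 3

• Step 3: Collect the sample data and calculate the test statistic

Sample results: $n = 16$, $\sigma = 0.2$, $\bar{X} = 7.89$

With the sample, the test statistic is

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{7.89 - 8}{0.2/\sqrt{16}} = -2.20$

Hence, $|Z| = 2.20$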

CASE STUDY 2: STEP 4

• Step 4: Make a decision to either reject or fail to reject $H_0$

[Figure: standard normal curve marked at -2.20, -1.96, 0, 1.96 and 2.20]

Since $|Z| = 2.20 > 1.96$, we reject $H_0$

CASE STUDY 2: STEP 5

• Step 5: Final comment and interpret the result

We conclude that the machine is not working properly and recommend that it be adjusted.

CASE STUDY 2: ALTERNATIVE TEST
• Suppose that in our initial setup of the hypothesis test we choose $\alpha = 0.01$ instead of 0.05. Then the
test can be summarized as:

1. $H_0: \mu = 8$, $H_1: \mu \neq 8$, $\alpha = 0.01$

2. Reject $H_0$ if $|Z| > 2.576$

3. Sample result: $n = 16$, $\sigma = 0.2$, $\bar{X} = 7.89$, $Z = -2.20$, $|Z| = 2.20$

4. Since $|Z| = 2.20 < 2.576$, we fail to reject $H_0: \mu = 8$

5. We do not recommend that the machine be readjusted.
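The effect of the significance level on the decision in Case Study 2 can be seen by running the same two-tailed Z-test at both levels. The sketch below assumes a known $\sigma = 0.2$, $n = 16$ and $\bar{X} = 7.89$ as in the case study; the function name `two_tailed_decision` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 8.0, 0.2, 16, 7.89

# Z statistic with known population standard deviation.
z = (x_bar - mu0) / (sigma / sqrt(n))


def two_tailed_decision(z_value: float, alpha: float) -> tuple[bool, float]:
    """Return (reject H0?, critical value) for a two-tailed Z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_value) > z_crit, z_crit


for alpha in (0.05, 0.01):
    reject, z_crit = two_tailed_decision(z, alpha)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha}: |Z| = {abs(z):.2f}, critical = {z_crit:.3f}, {verdict}")
```

The same sample leads to opposite recommendations at the two levels, which is why $\alpha$ has to be fixed before the data are examined.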

Hypothesis Testing Strategies
• Hypothesis testing determines the validity of an assumption (technically
described as the null hypothesis), with a view to choosing between two conflicting
hypotheses about the value of a population parameter.

• There are two types of tests of hypotheses:

• Non-parametric tests (also called distribution-free tests of hypotheses)
• Parametric tests (also called standard tests of hypotheses)

Parametric Tests : Applications
• Usually assume certain properties of the population from
which we draw samples.

• Observations come from a normal population

• Sample size is small

• Assumptions about population parameters like mean, variance, etc. hold good.

• Require measurement equivalent to interval-scaled data.

Parametric Tests
• Important Parametric Tests
The widely used sampling distributions for parametric tests are

• Z-test
• t-test
• $\chi^2$-test
• F-test

Note:
All these tests are based on the assumption of normality (i.e., the source of data is
considered to be normally distributed).

Parametric Tests : Z-test
• Z-test: This is the most frequently used test in statistical analysis.

• It is based on the normal probability distribution.

• It is used for judging the significance of several statistical measures, particularly
the mean.

• It is used even when the binomial distribution or the t-distribution is applicable, on the condition that such a distribution
tends to the normal distribution when n becomes large.

• Typically it is used for comparing the mean of a sample to some
hypothesized mean for the population in the case of a large sample, or when the
population variance is known.
Parametric Tests : t-test
• t-test: It is based on the t-distribution.

• It is considered an appropriate test for judging the significance of a sample
mean, or for judging the significance of the difference between the means of two
samples, in the case of

• small sample(s)

• unknown population variance (in this case, we use the variance of the sample as an
estimate of the population variance)

Parametric Tests : $\chi^2$-test

• $\chi^2$-test: It is based on the Chi-squared distribution.

• It is used for comparing a sample variance to a theoretical population
variance.

Parametric Tests : F-test

• F-test: It is based on the F-distribution.

• It is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for
judging the significance of more than two sample means.

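Hypothesis Testing : Assumptions
• Case 1: Population normal, population infinite, sample size may be large or small, variance
of the population is known.

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Case 2: Population normal, population finite, sample size may be large or small, variance
of the population is known.

$Z = \frac{\bar{X} - \mu}{(\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}}$

Case 3: Population normal, population infinite, sample size is small and variance of the
population is unknown.

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ with $n - 1$ degrees of freedom

and

$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$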

Hypothesis Testing : Assumptions
• Case 4: Population normal, population finite, sample size is small and variance of the
population is unknown.

Note: If the variance of the population is known, replace $S$ by $\sigma$.
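The difference between the infinite-population and finite-population cases with known variance can be illustrated with a small helper. This sketch assumes the standard textbook finite population correction $\sqrt{(N - n)/(N - 1)}$ applied to the standard error; the function name `z_statistic` and the population size of 100 are hypothetical and chosen only for illustration.

```python
from math import sqrt
from typing import Optional


def z_statistic(x_bar: float, mu0: float, sigma: float, n: int,
                population_size: Optional[int] = None) -> float:
    """Z statistic for a sample mean when the population variance is known.

    If population_size is given, the standard error is multiplied by the
    finite population correction sqrt((N - n) / (N - 1)).
    """
    std_error = sigma / sqrt(n)
    if population_size is not None:
        if not n < population_size:
            raise ValueError("sample size must be smaller than the population size")
        std_error *= sqrt((population_size - n) / (population_size - 1))
    return (x_bar - mu0) / std_error


# Case Study 2 figures; N = 100 is a made-up population size for illustration.
z_infinite = z_statistic(7.89, 8.0, 0.2, 16)
z_finite = z_statistic(7.89, 8.0, 0.2, 16, population_size=100)
print(f"infinite population: Z = {z_infinite:.2f}")
print(f"finite population (N = 100): Z = {z_finite:.2f}")
```

The correction shrinks the standard error, so the same sample mean yields a more extreme statistic when the sample is a sizeable fraction of a finite population.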

Hypothesis Testing : Non-Parametric Test

• Non-parametric tests
Do not depend on any assumption about the population distribution
Assume only nominal or ordinal data

Note: Non-parametric tests need the entire population (or a very large sample size)

Any questions?
