CSE315: Introduction to Data Science
WEEK-5
Inference and Estimation
• The word inference means an informed guess or conclusion. When population parameters are estimated by statistical methods from data obtained from a sample, this is called statistical inference.
• For example, suppose you collect sample data on the monthly income of 200 bus drivers, from which you can find their average monthly income. Using this information to estimate the average monthly income of all bus drivers in Bangladesh is statistical inference: a method of estimating population information from sample information.
Inference and Estimation
• Information obtained from a sample is called a sample statistic. Statistics are denoted by English (Latin) letters: for example, the sample mean is denoted by x̄ and the sample standard deviation by s.
• Information about a population is called a population parameter. Parameters are denoted by Greek letters: for example, the population mean μ and the population standard deviation σ.
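To make the notation concrete, here is a minimal sketch that computes the sample statistics x̄ and s with Python's standard `statistics` module. The income figures are hypothetical values chosen for illustration, not data from the slides.

```python
import statistics

# Hypothetical monthly incomes (in taka) of a small sample of bus drivers;
# illustrative numbers only, not data from the slides.
incomes = [18000, 22000, 20000, 25000, 15000]

x_bar = statistics.mean(incomes)   # sample mean x̄
s = statistics.stdev(incomes)      # sample standard deviation s (divides by n - 1)

print("x-bar =", x_bar)
print("s =", round(s, 2))
```

These sample statistics would then serve as estimates of the unknown parameters μ and σ.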
Types of Inference
• Estimation: a method of obtaining numerical values of population parameters from sample data.
• Types of Estimation: a) Point Estimation, b) Interval Estimation
• Hypothesis Testing: simply, a hypothesis is a claim about a population. However, this claim must be mathematically verifiable.
• Types: Null Hypothesis and Alternative Hypothesis
Point Estimation
• In point estimation, a single value of the population parameter is found. In this case, the sample statistic is used as the estimate of the population parameter.
• Methods: Method of Moments and Maximum Likelihood
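As an illustration, here is a small sketch assuming the sample comes from a normal population. In that case the method of moments and maximum likelihood give the same point estimates: the sample mean for μ, and the average squared deviation (dividing by n, not n − 1) for σ². The sample values are hypothetical.

```python
import statistics

# Hypothetical sample, assumed to come from a normal population.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]

# For a normal population, method of moments and maximum likelihood agree:
#   mu_hat     = sample mean
#   sigma2_hat = mean squared deviation from the sample mean (divide by n)
mu_hat = statistics.mean(sample)
sigma2_hat = statistics.pvariance(sample)  # pvariance divides by n

print("estimated mu =", mu_hat)            # 5.0
print("estimated sigma^2 =", sigma2_hat)   # 2.0
```

For other distributions the two methods can give different estimates.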
Interval Estimation
• Interval estimation estimates the value of a population parameter within a certain range, allowing for some error in the point estimate. An interval estimate comes with a confidence level, such as 90%, 95%, or 99%.
• Interval estimate = Point estimate ± margin of error
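The formula "point estimate ± margin of error" can be sketched for the case where σ is known. In that case the margin is z · σ / √n, and z is the critical value that leaves (1 − confidence)/2 in each tail. The sketch uses `statistics.NormalDist` from the standard library. The numbers (x̄ = 50, σ = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known:
    x_bar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)         # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical numbers: x̄ = 50, σ = 10, n = 25
low, high = z_interval(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))          # 46.08 53.92
```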
Two Ways of Interval Estimation
• Interval estimation is mainly done in two ways: the z distribution is used when the population standard deviation σ is known, and the t distribution is used when σ is unknown.
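When σ is unknown, the interval uses the sample standard deviation s and a critical value from the t distribution with n − 1 degrees of freedom. Python's standard library has no t distribution, so this sketch takes the critical value as an argument, as if read from a t-table. The sample is hypothetical, and 2.776 is the standard two-tailed 95% value for df = 4.

```python
import math
import statistics

def t_interval(sample, t_critical):
    """Confidence interval for the mean when sigma is unknown:
    x_bar +/- t * s / sqrt(n), with t taken from a t-table (df = n - 1)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample SD replaces sigma
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample with n = 5, so df = 4; 95% two-tailed t value is 2.776
data = [4.0, 6.0, 5.0, 7.0, 3.0]
low, high = t_interval(data, 2.776)
print(round(low, 2), round(high, 2))
```

The t critical value (2.776) is larger than the z value (about 1.96), so the interval is wider. This reflects the extra uncertainty from estimating σ.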
Hypothesis Testing
• Hypothesis testing is one of the more subtle parts of statistics. Simply, a hypothesis is a claim about a population. However, this claim must be mathematically verifiable.
• Hypotheses can be of two types: Null Hypothesis and Alternative Hypothesis
Null and Alternative Hypothesis
• Null Hypothesis: a claim about a population that is initially taken as true until it is proved false. The null hypothesis is denoted by H0. It may contain the =, ≤, or ≥ sign.
• Alternative Hypothesis: the hypothesis that is considered true if the null hypothesis is proved false. The alternative hypothesis is denoted by HA. It cannot contain the signs used in the null hypothesis (=, ≤, or ≥); instead it uses ≠, >, or <.
Hypothesis Testing
• Hypothesis testing is the process of checking whether a hypothesis is correct. In hypothesis testing, we either reject the null hypothesis or fail to reject it.
• When testing a hypothesis, a correct hypothesis may be judged incorrect, or an incorrect hypothesis may be judged correct. Such a mistake is called a hypothesis testing error.
• This error can be of two types:
• Type I Error: rejecting a true null hypothesis.
• Type II Error: failing to reject a false null hypothesis.
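A Type I error can be seen in a small simulation. In this sketch, samples are drawn from N(0, 1), so H0: μ = 0 is true, and every rejection by a two-tailed z test at α = .05 is a Type I error. The rejection rate should come out close to α. The function name and its parameters are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(trials=10_000, n=30, alpha=0.05, seed=1):
    """Fraction of trials in which a true H0 (mu = 0) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - 0) / (1 / math.sqrt(n))       # sigma = 1 is known
        if abs(z) > z_crit:
            rejections += 1                        # true H0 rejected: Type I error
    return rejections / trials

rate = type_one_error_rate()
print("estimated Type I error rate:", rate)
```

Lowering α reduces Type I errors. The trade-off is that Type II errors become more likely.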
Hypothesis Testing (z test and t test)
• z test and t test: we use the z distribution when the population standard deviation is known; such a test is called a z test.
• On the other hand, when the population standard deviation is unknown, we use the t distribution; such a test is called a t test.
z test and t test
Conditions for z and t tests:
• The sample size should be at least 30.
• If the sample size is less than 30, the population must follow the normal distribution.
• Otherwise, use a non-parametric method.
Flowchart: Is σ known? Yes → z test; No → t test.
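The conditions and the flowchart can be read together as a simple decision rule. The sketch below is one interpretation of them. The function `choose_test` and its parameters are hypothetical names used only for illustration.

```python
def choose_test(n, sigma_known, population_normal):
    """One reading of the slide's rule:
    n >= 30, or n < 30 with a normal population -> z test if sigma is known,
    otherwise t test; n < 30 without normality -> non-parametric method."""
    if n < 30 and not population_normal:
        return "non-parametric"
    return "z test" if sigma_known else "t test"

print(choose_test(20, sigma_known=True, population_normal=True))    # z test
print(choose_test(15, sigma_known=False, population_normal=True))   # t test
print(choose_test(10, sigma_known=True, population_normal=False))   # non-parametric
```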
P-value
• p-value: the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. When testing a hypothesis with the p-value, if the significance level (α) is greater than the p-value, the null hypothesis is rejected; if α is less than the p-value, the null hypothesis is not rejected.
• Steps to test a hypothesis with the p-value method:
• Step-1: Determining Null and Alternative Hypothesis
• Step-2: Determine which distribution to follow
• Step-3: Find the value of p-Value
• Step-4: Making a decision
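The four steps map onto a short function for the z test, using only the standard library's `NormalDist`. In this sketch, the `tail` argument stands in for Step-1: it records which alternative hypothesis is being tested. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha, tail="two"):
    """p-value method for a z test (sigma known).
    Step 1 is encoded by `tail`:
      "two"   -> H0: mu = mu0,  HA: mu != mu0
      "left"  -> H0: mu >= mu0, HA: mu < mu0
      "right" -> H0: mu <= mu0, HA: mu > mu0
    """
    # Step 2: sigma is known, so use the standard normal (z) distribution
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 3: p-value from the standard normal CDF
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p = cdf
    elif tail == "right":
        p = 1 - cdf
    else:
        p = 2 * min(cdf, 1 - cdf)
    # Step 4: decision
    decision = "reject H0" if p <= alpha else "do not reject H0"
    return z, p, decision

# Hypothetical numbers: x̄ = 52, mu0 = 50, σ = 10, n = 100, α = .05
z, p, decision = z_test(52, 50, 10, 100, alpha=0.05)
print(z, round(p, 4), decision)   # 2.0 0.0455 reject H0
```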
One Tail and Two Tail Tests
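• One-Tail and Two-Tail Tests: In hypothesis testing, we check whether the test statistic falls in the rejection region determined by the significance level α (equivalently, whether the p-value is smaller than α). If the test statistic lies within the non-rejection region (white part of the figure), we do not reject the null hypothesis.
• On the other hand, if the test statistic lies beyond the non-rejection region, in the rejection region (red part), we reject the null hypothesis.
• In a one-tailed test (HA contains > or <), the whole rejection region lies in one tail; in a two-tailed test (HA contains ≠), α is split equally between the two tails.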
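The difference between one-tailed and two-tailed tests shows up in the critical values. At α = .05, a one-tailed test puts all of α in one tail, which gives about 1.645. A two-tailed test puts α/2 in each tail, which gives about 1.960. In this sketch, the test statistic 1.8 is a hypothetical value. It is significant in the one-tailed test but not in the two-tailed test.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_one = std_normal.inv_cdf(1 - alpha)       # one-tailed critical value, about 1.645
z_two = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.960

z_stat = 1.8   # hypothetical test statistic
print("one-tailed: reject H0?", z_stat > z_one)        # True
print("two-tailed: reject H0?", abs(z_stat) > z_two)   # False
```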
Mathematical example-1 of z test
• A company claims that it takes employees an average of 90 minutes to learn how to use new machinery. The company's supervisor recently collected a sample of 20 employees to find out whether the learning time for a newly added machine differs from 90 minutes (less or more).
• The average time to learn the new machine in that sample was 75 minutes. From previous information, the learning time follows the normal distribution with a standard deviation of 6 minutes. Now verify the claim at α = .01 (i.e. 99% confidence level).
Mathematical example-1 of z test
Step-1: Determining Null and Alternative Hypothesis
H0: μ = 90, HA: μ ≠ 90 (two-tailed test)
Step-2: Determine which distribution to follow
The sample size is less than 30, but the population is normal and its standard deviation is known, so we follow the normal (z) distribution.
Step-3: Find the value of p-Value
Test statistic: z = (x̄ − μ) / (σ / √n), with μ = 90 from H0.
According to the z-table, the one-tail p-value is .0007; since this is a two-tailed test, the p-value is doubled: 2 × .0007 = .0014.
Step-4: Making a decision
α = .01 is greater than the p-value (.0014), so we reject the null hypothesis. So the average time new employees need to learn the machinery is not 90 minutes; it is either less or more than 90 minutes.
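As a check, the sketch below plugs the numbers stated in the problem (n = 20, x̄ = 75, σ = 6) into z = (x̄ − μ) / (σ / √n). It gives z ≈ −11.18, which is far more extreme than the value behind the z-table lookup in Step-3, so the two-tailed p-value is essentially 0. Either way the p-value is below α = .01, and the decision to reject H0 is unchanged. Only Python's standard library is used.

```python
import math
from statistics import NormalDist

# Data as stated in example-1
mu0, x_bar, sigma, n, alpha = 90, 75, 6, 20, 0.01

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_two_tailed = 2 * NormalDist().cdf(-abs(z))   # both tails

print("z =", round(z, 2))                      # -11.18
print("reject H0:", p_two_tailed <= alpha)     # True
```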
Mathematical example-2 of z test
• The mayor of a city estimates that the average value of the assets of families living in his city is at least 300,000. To verify this claim, he collected data from a random sample of 25 families. The average wealth of the families in the sample was found to be 26,000. Past research has shown that the population standard deviation of wealth in that city is 60,000. Now verify the mayor's claim at the 2.5% level of significance.
Mathematical example-2 of z test
Step-1: Determining Null and Alternative Hypothesis
H0: μ ≥ 300,000, HA: μ < 300,000 (left-tailed test)
Step-2: Determine which distribution to follow
The sample size is less than 30, but the population standard deviation is known, so we follow the normal (z) distribution.
Step-3: Find the value of p-Value
According to the z-table, the p-value is .2266.
Step-4: Making a decision
α = .025 is less than the p-value (.2266), so we do not reject the null hypothesis. So there is not enough evidence against the mayor's claim.
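With σ = 60,000 and n = 25, the standard error is 12,000. The table p-value of .2266 corresponds to z = −0.75, which implies a sample mean of 291,000. With the mean as printed in the problem (26,000), z would be about −22.8 and H0 would be rejected. The sketch below therefore assumes x̄ = 291,000. This value is inferred from the p-value and is not stated in the example.

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 300_000, 60_000, 25, 0.025
standard_error = sigma / math.sqrt(n)   # 60,000 / 5 = 12,000

# ASSUMPTION: x̄ = 291,000 is inferred from the table p-value (.2266);
# the example text prints 26,000.
x_bar = 291_000
z = (x_bar - mu0) / standard_error      # -0.75
p_left = NormalDist().cdf(z)            # left-tailed p-value

print("z =", z)
print("p =", round(p_left, 4))          # 0.2266
print("reject H0:", p_left <= alpha)    # False
```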
Mathematical example-1 of t test
• An expert claims that the average oxygen intake of people who exercise regularly is higher than that of normal adults. Previous research has shown that the average oxygen intake of normal adults is 36.6 ml/kg. You want to verify this claim, so you randomly select 15 people who exercise regularly. The data collected from them show an average oxygen intake of 40.6 ml/kg, with a sample standard deviation of 8 ml/kg. It is already known that oxygen intake in adults follows the normal distribution. Now verify this claim at the 5% level of significance.
Mathematical example-1 of t test
Step-1: Determining Null and Alternative Hypothesis
H0: μ ≤ 36.6, HA: μ > 36.6 (right-tailed test)
Step-2: Determine which distribution to follow
The sample size is less than 30 and the population standard deviation is unknown, so we follow the t distribution.
Step-3: Find the value of p-Value
Test statistic: t = (x̄ − μ) / (s / √n), with df = n − 1 = 14.
From the t-table with df = 14, the test statistic 2.517 lies between 2.145 and 2.624, whose tail probabilities are .025 and .01, so .01 < p-value < .025.
Step-4: Making a decision
α = .05 is greater than the p-value, so we reject the null hypothesis: the data support the expert's claim.
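The stated numbers give t = (40.6 − 36.6) / (8 / √15) ≈ 1.94, not 2.517. At df = 14, the value 1.94 lies between 1.761 and 2.145, so .025 < p-value < .05. This p-value is still below α = .05, so the decision is the same. Python's standard library has no t distribution, so this sketch compares t with the one-tailed critical value 1.761 (α = .05, df = 14), taken from a standard t-table.

```python
import math

# Data as stated in the t test example
mu0, x_bar, s, n = 36.6, 40.6, 8, 15
df = n - 1                                # 14

t = (x_bar - mu0) / (s / math.sqrt(n))    # about 1.94
t_crit = 1.761   # right-tailed critical value from a t-table (alpha = .05, df = 14)

print("t =", round(t, 3))
print("reject H0:", t > t_crit)           # True
```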
Probability
• Probability is a mathematical measure of how likely an event is to occur; for example, how likely it is that a coin lands heads when tossed.
• Probability = number of favorable outcomes / total number of outcomes
• For example, in the game of Ludu, what is the probability of rolling a 6 in one throw of the die?
• A Ludu die has 6 possible outcomes (1, 2, 3, 4, 5 and 6), and only 1 of them (the 6) is favorable. So the probability of rolling a 6 is 1/6.
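Here is the "favorable / total" formula as a short sketch. It uses `fractions.Fraction` so the result stays exact (1/6 rather than 0.1666…), and it assumes that all outcomes are equally likely. The function name is hypothetical.

```python
from fractions import Fraction

def classical_probability(favorable, total):
    """P(E) = favorable outcomes / total outcomes (equally likely outcomes assumed)."""
    return Fraction(favorable, total)

p_six = classical_probability(1, 6)   # only the face 6 is favorable
print(p_six)                          # 1/6
```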
Probability
• An event that cannot happen is an impossible event; its probability is zero.
• On the other hand, an event that must happen is a certain event. For example, the probability that we will die one day is 1, so it is a certain event.
• On the other hand, the probability that we will live forever is 0, so it is an impossible event. A probability value must always lie between 0 and 1.
Different components of probability

• Experiment: a repeatable process with a set of possible outcomes, such as rolling a die in Ludu.
• Outcome: every possible result of an experiment. A Ludu die can show 1, 2, 3, 4, 5 or 6; these are all outcomes.
• Sample Space: the set of all outcomes together.
• Event: one or more outcomes. For example, getting an even number (2, 4 or 6) on a Ludu die is an event.
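These components map naturally onto Python sets. In the sketch below, the sample space is a set and each event is a subset of it, and P(E) = |E| / |S| for equally likely outcomes. The empty set and the whole sample space give the impossible event (probability 0) and the certain event (probability 1). The variable and function names are illustrative.

```python
from fractions import Fraction

# Experiment: one roll of a Ludu die
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """P(E) = |E| / |S| for equally likely outcomes; only outcomes in S count."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}           # event: an even number comes up
impossible = set()         # event that cannot happen
certain = sample_space     # event that must happen

print(probability(even))        # 1/2
print(probability(impossible))  # 0
print(probability(certain))     # 1
```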
Bayes Theorem
• Bayes' theorem gives the probability of an event that we do not know beforehand, using prior knowledge of related events. For two events A and B, the probability that A occurs given that B has occurred is:
• P(A | B) = P(B | A) × P(A) / P(B)
• P(A) = probability of event A occurring
• P(B) = probability of event B occurring
• P(B | A) = probability of event B occurring given that event A occurs
• P(A | B) = probability of event A occurring given that event B occurs
Example of Bayes Theorem
• Suppose you plan to go on a picnic with your family and friends on a Friday in January. When you wake up on Friday morning, you see that the sky is cloudy, and you start to worry: if it rains, all your plans will be canceled! You have some previous information, and using it you want to know the probability of rain today.
• Previous information
• 40% of days start with a cloudy morning
• 50% of rainy days start with a cloudy morning
• It rarely rains in January: usually on no more than 3 days of the month (about 10%)
Example of Bayes Theorem
So we can write,
• P(Rain) = 10%
• P(Cloud | Rain) = 50%
• P(Cloud) = 40%
• P(Rain | Cloud) = P(Cloud | Rain) × P(Rain) / P(Cloud) = 0.5 × 0.1 / 0.4 = 0.125
• So the probability of rain today is only 12.5%. Even though the sky is cloudy, you can go on the picnic, because the chance of rain is low!
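def bayes(a, b, ba):
    """Return P(A | B) = P(B | A) * P(A) / P(B).
    a = P(A), b = P(B), ba = P(B | A)
    """
    ab = a * ba / b
    print("P(A | B) =", round(ab, 3))
    return ab

rain = 0.1              # P(Rain)
cloud = 0.4             # P(Cloud)
cloud_given_rain = 0.5  # P(Cloud | Rain)
bayes(rain, cloud, cloud_given_rain)  # prints P(A | B) = 0.125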
